--init M9 yields no output SOMETIMES

rdmtinez commented 5 years ago

Greetings Daniel,

I have been creating some GEMs using carve in the following ways:

carve -r -v --dna ./*.fna --fbc2 -o ./output_dir (i.e. no defined medium)

carve -r -v --dna ./*.fna --init M9 --fbc2 -o ./output_dir/

carve -r -v --dna ./*.fna --init OWNMEDIUM --fbc2 -o ./output_dir/

Of the 200+ assemblies I'm using, only 178 generate an output. The discrepancy seems to be that the sequences are simply not found in the database, or that the .fna files contain too few sequences to create anything meaningful. I therefore expect carve to output only 178 drafts, which is no problem for now.

However, the --init M9 run produces only 166 drafts, while all the other --init [x] runs produce the expected 178, as does the run with no defined medium. Note that the different sets of 178 GEMs contain the same files (by file name, not necessarily by content), while the set of 166 contains 1 output that the others do not have, yet is missing 13 that the others do have.
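To pin down exactly which drafts differ between two runs, a directory comparison along these lines may help. This is only a minimal sketch: it assumes each run wrote its drafts to its own output directory and that every draft is an .xml file named after its input assembly. Neither assumption is confirmed in this thread, and the function names are made up for illustration.

```python
from pathlib import Path


def draft_names(output_dir, suffix=".xml"):
    """Return the set of draft names (file stems) found in output_dir.

    Assumption: one draft file per assembly, ending in `suffix`.
    """
    return {path.stem for path in Path(output_dir).glob(f"*{suffix}")}


def compare_runs(reference_dir, other_dir, suffix=".xml"):
    """Compare two output directories by file name only (not by content).

    Returns (missing, extra):
      missing: drafts present in reference_dir but absent from other_dir
      extra:   drafts present in other_dir but absent from reference_dir
    """
    reference = draft_names(reference_dir, suffix)
    other = draft_names(other_dir, suffix)
    missing = sorted(reference - other)
    extra = sorted(other - reference)
    return missing, extra
```

Calling compare_runs on the directory of a 178-draft run and the directory of the --init M9 run would list the 13 missing names and the 1 extra name, which can then be rerun individually.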

I get the following output:

Running diamond...
diamond blastx -d /home/martinez/anaconda3/lib/python3.6/site-packages/carveme/data/input/bigg_proteins.dmnd -q /home/martinez/projects/data/lj_sphere/assemblies/LjNodule209.fna -o /home/martinez/projects/data/lj_sphere/assemblies/LjNodule209.tsv --more-sensitive --top 10
Loading universe model...
Loading media library...
Scoring reactions...
Reconstructing a single model
Done.

I'm having trouble working out what this can be attributed to, especially since the default settings also produce the same 178 GEMs and verbose mode doesn't seem to report any problem. Would you happen to know what might be going awry?

Thanks for your help.

Best regards,

Ricardo Martinez

cdanielmachado commented 5 years ago

Hi Ricardo,

Can you share the .fna file for one of the 13 models that failed? I will try to debug and see what is going on.

rdmtinez commented 5 years ago

Hi Daniel, after a bit more digging it seems that the problem is not with CARVE alone; somehow there is a miscommunication between LSF and carve when the jobs are being processed. There must be an underlying reason for the strange behavior, given that it is always the same 13 files that do not finish processing (along with the 33 other 'small' fna files) when submitted to our 'superheavy' queue (the only queue whose jobs aren't terminated automatically when running carve -r...).

I ran the 13 files through a for loop instead, and this time the output was created using --init M9. I'm going to do the same for the other 33 small fna files first and see if anything changes. I'll let you know how that goes, but I'm beginning to think it's just our cluster not playing nice :-/
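For reference, the one-at-a-time loop can be sketched roughly as below. This is an illustrative sketch rather than the exact loop I ran: it assumes that, without -r, the -o option takes the path of a single output file, and that a missing output file is the only sign of failure. The function names and the .xml naming are hypothetical.

```python
import subprocess
from pathlib import Path


def build_carve_command(genome, output_file, medium=None):
    """Build a single-genome carve call using only flags seen in this thread.

    Assumption: without -r, -o is the path of the output model file.
    """
    cmd = ["carve", "--dna", str(genome)]
    if medium is not None:
        cmd += ["--init", medium]
    cmd += ["--fbc2", "-o", str(output_file)]
    return cmd


def run_one_by_one(genomes, output_dir, medium=None, runner=subprocess.run):
    """Run carve sequentially and return the genomes that produced no output."""
    output_dir = Path(output_dir)
    no_output = []
    for genome in map(Path, genomes):
        output_file = output_dir / f"{genome.stem}.xml"
        runner(build_carve_command(genome, output_file, medium))
        # The symptom in this issue is a silently missing draft, so check
        # for the file itself instead of trusting the exit status.
        if not output_file.exists():
            no_output.append(genome.name)
    return no_output
```

Passing the 13 affected .fna paths with medium="M9" gives back the names of any assemblies that still yield no draft when run outside the -r batch mode.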

cdanielmachado commented 5 years ago

On our cluster, I found that using job arrays is more convenient than using the "-r" option. I think LSF also supports job arrays. Basically, you submit your script as a job array (one reconstruction per job, using the job array index to select the genome file), and the cluster does all the work of managing the parallel processes (instead of that work being done by the Python multiprocessing library when you use the "-r" option).
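A job-array worker along those lines might look like the following. Treat it as a minimal sketch, not a tested recipe: it assumes LSF exposes the array index in the LSB_JOBINDEX environment variable with indices starting at 1, that the genomes are the .fna files of one directory, and that without "-r" the "-o" option takes a single output file. The helper name, the .xml naming and the default medium are only illustrative.

```python
import os
import subprocess
import sys
from pathlib import Path


def genome_for_index(genome_dir, index):
    """Pick the genome for a 1-based job array index.

    The listing is sorted so that every job in the array sees the same order.
    """
    genomes = sorted(Path(genome_dir).glob("*.fna"))
    if not 1 <= index <= len(genomes):
        raise IndexError(f"job index {index} outside 1..{len(genomes)}")
    return genomes[index - 1]


def main(genome_dir, output_dir, medium="M9"):
    # Assumption: LSF sets LSB_JOBINDEX for each element of a job array.
    index = int(os.environ["LSB_JOBINDEX"])
    genome = genome_for_index(genome_dir, index)
    output_file = Path(output_dir) / f"{genome.stem}.xml"
    # One reconstruction per job, so no "-r" here.
    cmd = ["carve", "--dna", str(genome), "--init", medium,
           "--fbc2", "-o", str(output_file)]
    return subprocess.run(cmd).returncode


# Only run when both directories are given on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

Saved under a hypothetical name such as carve_array.py, this could be submitted with something like bsub -J "carve[1-200]" python carve_array.py ./assemblies ./output_dir, where the upper bound of the array should match the number of .fna files.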

rdmtinez commented 5 years ago

I will keep that in mind for the next reconstructions. Thanks!

cdanielmachado / carveme

--init M9 yields no output SOMETIMES #38