ablab / quast

Genome assembly evaluation tool
http://quast.sf.net

Upper limit to reference sequences #7

Closed shaman-narayanasamy closed 8 years ago

shaman-narayanasamy commented 9 years ago

Dear authors,

I would like to know whether there is an upper limit on the number of reference genomes when validating a metagenome. I have 73 genome sequences stored in separate multi-FASTA files. I passed the comma-separated list of genomes to the command, but quast warns that there are no similarities between the query and the references. This is not actually the case: when I reduce the number of references to two, it seems to work.

I issued the command:

SIM_REF=`\ls /mnt/nfs/projects/ecosystem_biology/test_datasets/CelajEtAl/73_species/*.fa | paste -s -d,`

metaquast.py -o /scratch/users/snarayanasamy/IMP_MS_data/quast_simDat -R ${SIM_REF} -t 12 -l IMP,metAmos /scratch/users/snarayanasamy/IMP_MS_data/IMP/simulated_data_output/Assembly/MGMT.assembly.merged.fa /scratch/users/snarayanasamy/IMP_MS_data/metAmosAnalysis/simDat_metAmos/Assemble/out/soapdenovo.31.asm.contig

And obtained the following stderr/stdout:

Partitioning contigs into bins aligned to each reference..
  processing IMP
  processing metAmos

No contigs were aligned to the reference Bacteroides_finegoldii_DSM_17565, skipping..

No contigs were aligned to the reference Eubacterium_siraeum_DSM_15702, skipping..

No contigs were aligned to the reference Bacteroides_ovatus_ATCC_8483, skipping..

No contigs were aligned to the reference Bacteroides_stercoris_ATCC_43183, skipping..

No contigs were aligned to the reference Alistipes_putredinis_DSM_17216, skipping..

No contigs were aligned to the reference Bacteroides_spDOT_4_3_47FAA, skipping..

No contigs were aligned to the reference Collinsella_aerofaciens_ATCC_25986, skipping..

No contigs were aligned to the reference Bacteroides_fragilis_3_1_12, skipping..

No contigs were aligned to the reference Bacteroides_dorei_DSM_17855, skipping..

Starting quast.py for the contigs aligned to Eubacterium_dolichum_DSM_3991
(logging to /scratch/users/snarayanasamy/IMP_MS_data/quast_simDat/Eubacterium_dolichum_DSM_3991_quast_output/quast.log)

No contigs were aligned to the reference Blautia_hydrogenotrophica_DSM_10507, skipping..

Notice that quast runs for the genome "Eubacterium_dolichum_DSM_3991". This happens because I ran the analysis previously and overwrote the output directory, so the nucmer files corresponding to that particular genome were retained; quast is therefore able to access them and perform the analysis.

Is there a workaround or a better way to do this? I am guessing that the list of genomes (absolute paths) is too long for the command. Please let me know if you need more information. I look forward to your response.
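The guess that the list is too long for the command can be checked before involving quast at all. The following is a minimal sketch, not part of QUAST: `ref_list_fits` is a hypothetical helper name, and it only compares the length of the comma-separated list against the operating system's `ARG_MAX` value. On Linux a single argument is additionally capped at a smaller size (commonly 128 KiB), and the check says nothing about any limit inside metaquast itself.

```shell
# Hypothetical helper (not part of QUAST): succeed if the comma-separated
# reference list is shorter than the given limit.
# $1 = the list, $2 = optional limit (defaults to the OS value of ARG_MAX).
ref_list_fits() {
  list=$1
  limit=${2:-$(getconf ARG_MAX)}
  # ${#list} is the length of the list in characters
  [ "${#list}" -lt "$limit" ]
}
```

For example, `ref_list_fits "${SIM_REF}" && echo "list fits"` reports whether the list built above is within the system limit. If it fits comfortably, the command-line length is probably not the cause.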

Update: I tried up to 27 references, and it works. I am slowly increasing it to see where this problem occurs. Still not sure what the issue is...

Update 2: I iteratively ran quast and it seems that it fails when I provide 40 reference files... It works up to 39.

-Shaman-

shaman-narayanasamy commented 9 years ago

Hi,

Is there any news regarding my issue?

I seem to be able to run metaQUAST by splitting the references into smaller subsets; it appears to work this way. Since I already have all the nucmer alignments to the reference genomes, is there a way for me to combine the alignment results into a single summary report?
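The sub-partitioning described above could be scripted roughly as follows. This is only a sketch under stated assumptions: `batch_refs` and `run_batches` are hypothetical helper names, `ASM_IMP` and `ASM_METAMOS` are placeholder variables for the two assembly files, and `METAQUAST` is a placeholder that allows a dry run. The flags (`-o`, `-R`, `-t`, `-l`) are the ones from my original command. It produces one output directory per batch and does not merge the reports.

```shell
# Hypothetical helper: print the given reference files as comma-separated
# batches, one batch per line, with at most $1 files per batch.
batch_refs() {
  size=$1; shift
  batch=""; n=0
  for ref in "$@"; do
    if [ -z "$batch" ]; then batch=$ref; else batch="$batch,$ref"; fi
    n=$((n + 1))
    if [ "$n" -eq "$size" ]; then
      printf '%s\n' "$batch"
      batch=""; n=0
    fi
  done
  # Emit the final, possibly smaller, batch
  if [ -n "$batch" ]; then printf '%s\n' "$batch"; fi
}

# Hypothetical helper: run one metaquast job per batch.
# $1 = output directory prefix, $2 = batch size, remaining args = reference files.
# ASM_IMP and ASM_METAMOS are assumed to hold the two assembly paths.
# Set METAQUAST=echo to print the commands instead of running them.
run_batches() {
  out_prefix=$1; size=$2; shift 2
  i=0
  batch_refs "$size" "$@" | while IFS= read -r refs; do
    i=$((i + 1))
    # stdin is redirected so the job cannot consume the remaining batch lines
    ${METAQUAST:-metaquast.py} -o "${out_prefix}_${i}" -R "$refs" -t 12 \
      -l IMP,metAmos "$ASM_IMP" "$ASM_METAMOS" < /dev/null
  done
}
```

With a batch size of 39 (the largest number that worked for me), a call such as `run_batches quast_simDat 39 /mnt/nfs/projects/ecosystem_biology/test_datasets/CelajEtAl/73_species/*.fa` would give two runs for the 73 genomes.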

Let me know what you think.