ablab / quast

Genome assembly evaluation tool
http://quast.sf.net

BUSCO problem: 100% Missing BUSCOs #139

Open ghost opened 4 years ago

ghost commented 4 years ago

Dear Developers,

We are trying to run a BUSCO analysis on our assembly on a CentOS 7 computer. The QUAST analysis itself runs successfully, but we get 100% missing BUSCOs, even when we run the assessment on the reference genome, which normally scores ~95%-98%. The augustus.log displays the following message multiple times:

/home/mgabriel/.quast/augustus3.2.3/bin/augustus: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /home/mgabriel/.quast/augustus3.2.3/bin/augustus)
/home/mgabriel/.quast/augustus3.2.3/bin/augustus: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /home/mgabriel/.quast/augustus3.2.3/bin/augustus)
/home/mgabriel/.quast/augustus3.2.3/bin/augustus: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /home/mgabriel/.quast/augustus3.2.3/bin/augustus)
...
...
...
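
The messages above mean that the augustus binary asks for symbol versions GLIBCXX_3.4.20 and GLIBCXX_3.4.21, which the /lib64/libstdc++.so.6 on this machine does not provide. A minimal sketch for checking which GLIBCXX version tags a given libstdc++ file contains is shown here; it simply scans the raw bytes of the file for version strings, and the function names are made up for this illustration (they are not part of QUAST, BUSCO, or Augustus).

```python
import re
from pathlib import Path
from typing import Iterable, List

# Version tags are stored in the library as plain ASCII, e.g. b"GLIBCXX_3.4.19".
GLIBCXX_RE = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)+)")


def glibcxx_versions(data: bytes) -> List[str]:
    """Return the de-duplicated GLIBCXX versions found in raw library bytes,
    sorted numerically (so 3.4.9 comes before 3.4.19)."""
    found = {m.group(1).decode("ascii") for m in GLIBCXX_RE.finditer(data)}
    return sorted(found, key=lambda v: tuple(int(p) for p in v.split(".")))


def missing_versions(data: bytes, required: Iterable[str]) -> List[str]:
    """Return the required versions that do not appear in the library bytes."""
    present = set(glibcxx_versions(data))
    return [v for v in required if v not in present]


def missing_in_file(lib_path: str, required: Iterable[str]) -> List[str]:
    """Convenience wrapper (hypothetical helper): read the file, then check it."""
    return missing_versions(Path(lib_path).read_bytes(), required)
```

For the case reported here, one would call `missing_in_file("/lib64/libstdc++.so.6", ["3.4.20", "3.4.21"])`; a non-empty result would confirm that the system library is too old for this augustus build.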

The content of the run_assembly.log from the busco_stats directory, after the end of the assessment is the following:

[hmmsearch]     Error: Failed to open sequence file /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/run_GCA_007989325-1_vir160_genomic/augustus_output/extracted_proteins/EOG09370P9T.faa.2 for reading
[hmmsearch]     Error: Failed to open sequence file /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/run_GCA_007989325-1_vir160_genomic/augustus_output/extracted_proteins/EOG09370ARO.faa.1 for reading
...
...
...
9 of 181 task(s) completed at 04/28/2020 14:27:25
[hmmsearch]     109 of 181 task(s) completed at 04/28/2020 14:27:26
[hmmsearch]     181 of 181 task(s) completed at 04/28/2020 14:27:26
Results:
C:0.0%[S:0.0%,D:0.0%],F:0.0%,M:100.0%,n:303
0 Complete BUSCOs (C)
0 Complete and single-copy BUSCOs (S)
0 Complete and duplicated BUSCOs (D)
0 Fragmented BUSCOs (F)
303 Missing BUSCOs (M)
303 Total BUSCO groups searched
BUSCO did not find any match. Do not forget to check the file /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/run_GCA_007989325-1_vir160_genomic/augustus_output/augustus.log to exclude a problem regarding Augustus
[bash]  rm: cannot remove '/home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/tmp/temp_GCA_007989325-1_vir160_genomic_1730718755': No such file or directory
BUSCO analysis done with WARNING(s). Total running time: 1649.43617296 seconds
Results written in /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/run_GCA_007989325-1_vir160_genomic/

ADS/quast_results/busco_stats/run_GCA_007989325-1_vir160_genomic/augustus_output/extracted_proteins/EOG09370ER5.faa.1 for reading
[hmmsearch]     Error: Failed to open sequence file /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/run_GCA_007989325-1_vir160_genomic/augustus_output/extracted_proteins/EOG09370AJP.faa.2 for reading
[hmmsearch]     Error: Failed to open sequence file /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/run_GCA_007989325-1_vir160_genomic/augustus_output/extracted_proteins/EOG09370VTP.faa.1 for reading
...
...
...

Also the run_assembly.log before its final form:

****************** Start a BUSCO 3.0.2 analysis, current time: 04/28/2020 19:12:11 ******************
Configuration loaded from /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/config.ini
Init tools...
Check dependencies...
Check input file...
To reproduce this run: python /opt/quast-quast_5.0.2/quast.py -i /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/quast_corrected_input/GCA_007989325_1_vir160_genomic.fna -o GCA_007989325-1_vir160_genomic -l /home/mgabriel/.quast/busco/eukaryota/ -m genome -c 16 -t /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/tmp/ -sp fly --augustus_parameters ''''
Mode is: genome
The lineage dataset is: eukaryota_odb9 (eukaryota)
Delete the current result folder and start a new run
Temp directory is /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/tmp/
****** Phase 1 of 2, initial predictions ******
****** Step 1/3, current time: 04/28/2020 19:12:13 ******
Create blast database...
[makeblastdb]   Building a new DB, current time: 04/28/2020 19:12:13
[makeblastdb]   New DB name:   /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/tmp/GCA_007989325-1_vir160_genomic_1658005936
[makeblastdb]   New DB title:  /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/quast_corrected_input/GCA_007989325_1_vir160_genomic.fna
[makeblastdb]   Sequence type: Nucleotide
[makeblastdb]   Keep MBits: T
[makeblastdb]   Maximum file size: 1000000000B
[makeblastdb]   Adding sequences from FASTA; added 27 sequences in 2.76254 seconds.
[makeblastdb]   1 of 1 task(s) completed at 04/28/2020 19:12:16
Running tblastn, writing output to /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/run_GCA_007989325-1_vir160_genomic/blast_output/tblastn_GCA_007989325-1_vir160_genomic.tsv...
[tblastn]       1 of 1 task(s) completed at 04/28/2020 19:14:48
****** Step 2/3, current time: 04/28/2020 19:14:48 ******
Maximum number of candidate contig per BUSCO limited to: 3
Getting coordinates for candidate regions...
Pre-Augustus scaffold extraction...
Running Augustus prediction using fly as species:
Additional parameters for Augustus are '':
[augustus]      Please find all logs related to Augustus errors here: /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/run_GCA_007989325-1_vir160_genomic/augustus_output/augustus.log
[augustus]      135 of 337 task(s) completed at 04/28/2020 19:14:51
[augustus]      337 of 337 task(s) completed at 04/28/2020 19:14:52
Extracting predicted proteins...
****** Step 3/3, current time: 04/28/2020 19:14:57 ******
Running HMMER to confirm orthology of predicted proteins:
Results:
C:0.0%[S:0.0%,D:0.0%],F:0.0%,M:100.0%,n:303
0 Complete BUSCOs (C)
0 Complete and single-copy BUSCOs (S)
0 Complete and duplicated BUSCOs (D)
0 Fragmented BUSCOs (F)
303 Missing BUSCOs (M)
303 Total BUSCO groups searched
****** Phase 2 of 2, predictions using species specific training ******
****** Step 1/3, current time: 04/28/2020 19:14:57 ******
Extracting missing and fragmented buscos from the ancestral_variants file...
Running tblastn, writing output to /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/run_GCA_007989325-1_vir160_genomic/blast_output/tblastn_GCA_007989325-1_vir160_genomic_missing_and_frag_rerun.tsv...

After some research, I found that others ran into the same problem when using BUSCO, and that it is fixed in the new BUSCO version (v4). On Ubuntu, we did not encounter this problem.

Do you have any suggestions on what we can do to fix this? Thank you in advance!

sixvable commented 3 years ago

I have faced the same error. With --debug mode I found that the dynamic library libstdc++.so.6 used by augustus is the system library (located in /usr/lib64/libstdc++.so.6), which is old. I think the code in BUSCO mode should be modified to use the conda-installed library (located in `conda/path/envs/envs-name/lib/libstdc++.so.6.*****`).
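
One way to try this idea without changing any code is to put the conda environment's lib directory first in LD_LIBRARY_PATH when launching the tool. The following is only a sketch under assumptions: `env_with_library_dir` is a hypothetical helper, the directory passed in is whatever lib directory of your conda environment holds the newer libstdc++.so.6, and nobody in this thread has confirmed that this resolves the problem.

```python
import os
from typing import Dict, Mapping, Optional


def env_with_library_dir(
    lib_dir: str, base_env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of the environment with lib_dir prepended to
    LD_LIBRARY_PATH, so the dynamic loader searches it before /lib64."""
    env = dict(os.environ if base_env is None else base_env)
    current = env.get("LD_LIBRARY_PATH", "")
    # Keep any existing entries, but make sure lib_dir is searched first.
    env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + current if current else "")
    return env


# Example use (paths are placeholders, not taken from this thread):
#   import subprocess
#   env = env_with_library_dir("/path/to/conda/envs/your-env/lib")
#   subprocess.run(["/path/to/augustus", "--version"], env=env)
```

The helper returns a new dictionary instead of editing `os.environ`, so the change applies only to the process started with that `env` and not to the rest of the session.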