Closed tkcaccia closed 3 months ago
what does your config look like? are all databases downloaded?
I realized that the script lib_downloader.py did not download all the libraries, so I downloaded the missing ones again.
The output showed that all libraries were found:
UniVec_Core found! using: /scratch/alphafold/MetaPro/univec_core/UniVec_Core.fasta
Adapter found! using: /scratch/alphafold/MetaPro/trimmomatic_adapters/TruSeq3-PE-2.fa
Host found! using: /scratch/alphafold/MetaPro/human_genome/human_genome.fasta
Rfam found! using: /scratch/alphafold/MetaPro/Rfam/Rfam.cm
DNA_DB found! using: /scratch/alphafold/MetaPro/family_group
source_taxa_db no inner section found. using default /project/j/jparkin/Lab_Databases/family_llbs
Prot_DB found! using: /scratch/alphafold/MetaPro/nr/nr
Prot_DB_reads found! using: /scratch/alphafold/MetaPro/nr/nr
accession2taxid found! using: /scratch/alphafold/MetaPro/accession2taxid/accession2taxid
nodes found! using: /scratch/alphafold/MetaPro/WEVOTE_db/nodes_wevote.dmp
names found! using: /scratch/alphafold/MetaPro/WEVOTE_db/names_wevote.dmp
Kaiju_db found! using: /scratch/alphafold/MetaPro/kaiju_db/kaiju_db_nr.fmi
Centrifuge_db found! using: /scratch/alphafold/MetaPro/centrifuge_db/nt
SWISS_PROT found! using: /scratch/alphafold/MetaPro/swiss_prot_db/swiss_prot_db
SWISS_PROT_map found! using: /scratch/alphafold/MetaPro/swiss_prot_db/SwissProt_EC_Mapping.tsv
PriamDB found! using: /scratch/alphafold/MetaPro/PRIAM_db/
DetectDB found! using: /scratch/alphafold/MetaPro/DETECTv2
WEVOTEDB found! using: /scratch/alphafold/MetaPro/WEVOTE_db/
EC_pathway found! using: /scratch/alphafold/MetaPro/EC_pathway/EC_pathway.txt
path_to_superpath found! using: /scratch/alphafold/MetaPro/path_to_superpath/pathway_to_superpathway.csv
MetaGeneMark_model found! using: /pipeline_tools/mgm/MetaGeneMark_v1.mod
enzyme_db no inner section found. using default /pipeline/custom_databases/FREQ_EC_pairs_3_mai_2020.txt
taxid_tree found! using: /scratch/alphafold/MetaPro/taxid_trees/class_tree.tsv
kraken2_db found! using: /scratch/alphafold/MetaPro/kraken2_db
The pipeline stopped at GA_split, but I noticed that the results folder in GA_pre_scan was empty, so I manually removed these folders and removed GA_split and GA_pre_scan from bypass_log.txt.
How can I identify where the problem is?
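The manual reset described above (deleting the step folders and removing their entries from bypass_log.txt) could be scripted roughly as follows. This is only a sketch: the function name `reset_steps` is made up for illustration, and it assumes that bypass_log.txt holds one step name per line and that each step folder sits directly under the sample's output folder. Neither assumption is confirmed in this thread.

```python
import shutil
from pathlib import Path


def reset_steps(output_dir, steps, log_name="bypass_log.txt"):
    """Delete each step's folder and drop its entry from the bypass log.

    Assumption (not confirmed for MetaPro): the log lists one step name
    per line. Returns the log lines that were kept.
    """
    output_dir = Path(output_dir)
    to_remove = set(steps)

    for step in to_remove:
        # ignore_errors lets the call succeed if the folder is already gone
        shutil.rmtree(output_dir / step, ignore_errors=True)

    log_path = output_dir / log_name
    if not log_path.exists():
        return []

    # Keep every line whose stripped content is not one of the reset steps
    kept = [line for line in log_path.read_text().splitlines()
            if line.strip() not in to_remove]
    log_path.write_text("".join(line + "\n" for line in kept))
    return kept
```

For the case in this thread the call would look like `reset_steps("/path/to/sample_output", ["GA_pre_scan", "GA_split"])`, with the path replaced by the real output folder. Back up the folder first, since the deletion cannot be undone.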
if you need to dive into the code, all steps create a shellscript for their specific section. you could run the shellscript for that step manually to see where the system is stalling.
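Running a step's script by hand, as suggested above, could look like the sketch below. It only wraps `subprocess.run` so that the exit code, stdout and stderr are captured together. The helper name `run_step_script` is hypothetical, and where the scripts live is an assumption; the log in this thread mentions paths such as GA_pre_scan/data/jobs/mp_ta_centrifuge_reads, so that folder is a reasonable place to look.

```python
import subprocess


def run_step_script(script_path, shell="bash"):
    """Run one pipeline job script and return (exit code, stdout, stderr).

    The script path must be supplied by the caller; this helper does not
    know where MetaPro writes its job scripts.
    """
    result = subprocess.run(
        [shell, str(script_path)],
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode the output as str rather than bytes
    )
    return result.returncode, result.stdout, result.stderr
```

A non-zero exit code or a message on stderr from one of these scripts is usually a quicker pointer to the failing tool than the combined pipeline log.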
The script does not stall. No FASTA files are produced in GA_pre_scan
so, the config says it can't find your source taxa db. GA_pre_scan relies on these taxid trees we made: https://compsysbio.org/metapro_libs/taxid_trees/ These trees link every taxa found in chocophlan to their higher-order rollups.
Your run is missing these tables.
Hi billytaj, I am having the same issue. At first I only had the class_tsv file, but following your reply above I downloaded the other taxid tree files. However, the pipeline still ended with this error:
~/Outs/GA_pre_scan/final_results
2024-06-18 04:50:47.953054 Error: no fasta files found. BWA only accepts .fasta extensions
empty BWA database
tkcaccia, did you resolve the problem? Thanks
this error is a warning that the pre-scan didn't function properly.
it's supposed to taxa-scan your cleaned reads and populate a customized subset of the chocophlan database.
There are ways to bypass it if you want.
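Before re-running, it may help to confirm by hand whether the pre-scan actually wrote anything. The sketch below lists the .fasta files in a folder such as GA_pre_scan/final_results, which is the folder named in the "BWA DB check" log line. It is not MetaPro's own check, only an approximation of what the error message describes, and the function name `find_fasta_files` is made up for illustration.

```python
from pathlib import Path


def find_fasta_files(db_dir):
    """Return the sorted .fasta files directly inside db_dir.

    Returns an empty list when the folder is missing or holds no .fasta
    files, which is the situation the "no fasta files found" error reports.
    """
    db_dir = Path(db_dir)
    if not db_dir.is_dir():
        return []
    return sorted(p for p in db_dir.iterdir()
                  if p.is_file() and p.suffix == ".fasta")
```

An empty result for the final_results folder means the pre-scan did not populate the custom database, so GA_split has nothing to index.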
Could you point to how we can bypass that? Thanks
In your config, under the Databases heading, add:
DNA_DB_override = True
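If you would rather script this change than edit the file by hand, the sketch below sets the key with Python's standard `configparser`. It assumes the MetaPro config is an INI-style file with a `[Databases]` section, which the "no inner section found" messages suggest but this thread does not confirm. The helper name `enable_dna_db_override` is hypothetical.

```python
import configparser


def enable_dna_db_override(config_path):
    """Set DNA_DB_override = True under [Databases] in an INI-style config.

    Assumption: the config can be read by configparser. Note that
    configparser drops comments when it writes the file back.
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key names case-sensitive (e.g. DNA_DB)
    parser.read(config_path)

    if not parser.has_section("Databases"):
        parser.add_section("Databases")
    parser.set("Databases", "DNA_DB_override", "True")

    with open(config_path, "w") as handle:
        parser.write(handle)
    return parser
```

Because comments are lost on rewrite, keep a copy of the original config, or simply add the single line manually as described above.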
I am having trouble completing the pipeline. The GA_pre_scan step does not produce any files in the final_results folder, and the pipeline then stops at the GA_split step. Can you please help me identify the error?
Here is the output:
2024-04-18 09:01:21.514892 continuing from: assemble_contigs
2024-04-18 09:01:21.518869 running: GA_pre_scan
2024-04-18 09:01:21.548006 mp_ta_kraken2_singletons job submitted. mem: 375.48778515625 GB
2024-04-18 09:01:21.560298 mp_ta_kraken2_paired job submitted. mem: 375.4842890625 GB
Kraken2 on singletons
Kraken2 on paired
2024-04-18 09:01:21.573631 mp_ta_kraken2_contigs job submitted. mem: 375.4856484375 GB
GA_pre_scan/data/jobs/mp_ta_centrifuge_reads
Kraken2 on contigs
2024-04-18 09:01:21.600288 mp_ta_centrifuge_reads job submitted. mem: 375.482765625 GB
GA_pre_scan/data/jobs/mp_ta_centrifuge_contigs
centrifuge on reads
Loading database information...Loading database information...Loading database information...
centrifuge on contigs
done. done. done.
15475 sequences (12.36 Mbp) processed in 0.600s (1547.2 Kseq/m, 1235.55 Mbp/m).
15377 sequences classified (99.37%)
98 sequences unclassified (0.63%)
41252 sequences (12.02 Mbp) processed in 0.774s (3195.9 Kseq/m, 931.33 Mbp/m).
40639 sequences classified (98.51%)
613 sequences unclassified (1.49%)
677476 sequences (78.17 Mbp) processed in 0.870s (46697.5 Kseq/m, 5388.09 Mbp/m).
628361 sequences classified (92.75%)
49115 sequences unclassified (7.25%)
report file /scratch/t0065634/Microbiome/output_batch2/LPC0010_S8/GA_pre_scan/data/2_centrifuge/raw_contigs.txt
Number of iterations in EM algorithm: 4
Probability diff. (P - P_prev) in the last iteration: 3.70532e-11
Calculating abundance: 00:00:00
report file /scratch/t0065634/Microbiome/output_batch2/LPC0010_S8/GA_pre_scan/data/2_centrifuge/reads.txt
Number of iterations in EM algorithm: 13
Probability diff. (P - P_prev) in the last iteration: 8.45475e-11
Calculating abundance: 00:00:00
2024-04-18 09:01:21.618062 mp_ta_centrifuge_contigs job submitted. mem: 375.47983984375 GB
2024-04-18 09:01:21.619364 closing down processes: 5
2024-04-18 09:01:21.619401 closed down: 0/5
2024-04-18 09:03:09.809845 closed down: 1/5
2024-04-18 09:03:09.809963 closed down: 2/5
2024-04-18 09:03:09.810030 closed down: 3/5
2024-04-18 09:13:37.616210 closed down: 4/5
merging kraken2 reports
2024-04-18 09:13:37.622425 TA_kraken2_pp job submitted. mem: 375.4827734375 GB
2024-04-18 09:13:37.623675 closing down processes: 1
2024-04-18 09:13:37.623712 closed down: 0/1
combining all centrifuge results
2024-04-18 09:13:37.938608 TA_centrifuge_pp job submitted. mem: 375.48255078125 GB
2024-04-18 09:13:37.940008 closing down processes: 1
2024-04-18 09:13:37.940046 closed down: 0/1
combining classification outputs for wevote
Running WEVOTE
gathering WEVOTE results
2024-04-18 09:13:38.094341 TA_wevote_combine job submitted. mem: 375.48346484375 GB
2024-04-18 09:13:38.095641 running: TA_wevote_combine
2024-04-18 09:13:38.095690 closing down processes: 1
2024-04-18 09:13:38.095718 closed down: 0/1
GA pre-scan get libs
2024-04-18 09:15:58.784956 ga_collect_db job submitted. mem: 375.4834921875 GB
2024-04-18 09:15:58.786435 running: ga_collect_db
2024-04-18 09:15:58.786477 closing down processes: 1
2024-04-18 09:15:58.786506 closed down: 0/1
GA assemble libs
2024-04-18 09:16:08.826043 ga_assemble_db job submitted. mem: 375.48344140625 GB
2024-04-18 09:16:08.827014 running: ga_assemble_db
2024-04-18 09:16:08.827046 closing down processes: 1
2024-04-18 09:16:08.827063 closed down: 0/1
2024-04-18 09:16:08.934087 continuing from: GA_pre_scan
2024-04-18 09:16:08.938664 running: GA_split
2024-04-18 09:16:08.938700 splitting contigs
splitting fasta for contigs
splitting fastq for singletons GA
splitting fastq for pair_1 GA
splitting fastq for pair_2 GA
2024-04-18 09:16:09.008651 closing down processes: 4
2024-04-18 09:16:09.008748 closed down: 0/4
2024-04-18 09:16:11.656524 closed down: 1/4
2024-04-18 09:16:11.656631 closed down: 2/4
2024-04-18 09:16:11.656673 closed down: 3/4
2024-04-18 09:16:13.681369 continuing from: GA_split
2024-04-18 09:16:13.681450 Running GA lib check
2024-04-18 09:16:13.681531 BWA DB check: /scratch/t0065634/Microbiome/output_batch2//LPC0010_S8/GA_pre_scan/final_results
2024-04-18 09:16:13.686604 Error: no fasta files found. BWA only accepts .fasta extensions
empty BWA database