what does your config look like? are all databases downloaded?
I realized that the script lib_downloader.py did not download all the libraries, so I downloaded the missing ones again.
The output showed that all libraries were found:
UniVec_Core found! using: /scratch/alphafold/MetaPro/univec_core/UniVec_Core.fasta
Adapter found! using: /scratch/alphafold/MetaPro/trimmomatic_adapters/TruSeq3-PE-2.fa
Host found! using: /scratch/alphafold/MetaPro/human_genome/human_genome.fasta
Rfam found! using: /scratch/alphafold/MetaPro/Rfam/Rfam.cm
DNA_DB found! using: /scratch/alphafold/MetaPro/family_group
source_taxa_db no inner section found. using default /project/j/jparkin/Lab_Databases/family_llbs
Prot_DB found! using: /scratch/alphafold/MetaPro/nr/nr
Prot_DB_reads found! using: /scratch/alphafold/MetaPro/nr/nr
accession2taxid found! using: /scratch/alphafold/MetaPro/accession2taxid/accession2taxid
nodes found! using: /scratch/alphafold/MetaPro/WEVOTE_db/nodes_wevote.dmp
names found! using: /scratch/alphafold/MetaPro/WEVOTE_db/names_wevote.dmp
Kaiju_db found! using: /scratch/alphafold/MetaPro/kaiju_db/kaiju_db_nr.fmi
Centrifuge_db found! using: /scratch/alphafold/MetaPro/centrifuge_db/nt
SWISS_PROT found! using: /scratch/alphafold/MetaPro/swiss_prot_db/swiss_prot_db
SWISS_PROT_map found! using: /scratch/alphafold/MetaPro/swiss_prot_db/SwissProt_EC_Mapping.tsv
PriamDB found! using: /scratch/alphafold/MetaPro/PRIAM_db/
DetectDB found! using: /scratch/alphafold/MetaPro/DETECTv2
WEVOTEDB found! using: /scratch/alphafold/MetaPro/WEVOTE_db/
EC_pathway found! using: /scratch/alphafold/MetaPro/EC_pathway/EC_pathway.txt
path_to_superpath found! using: /scratch/alphafold/MetaPro/path_to_superpath/pathway_to_superpathway.csv
MetaGeneMark_model found! using: /pipeline_tools/mgm/MetaGeneMark_v1.mod
enzyme_db no inner section found. using default /pipeline/custom_databases/FREQ_EC_pairs_3_mai_2020.txt
taxid_tree found! using: /scratch/alphafold/MetaPro/taxid_trees/class_tree.tsv
kraken2_db found! using: /scratch/alphafold/MetaPro/kraken2_db
The pipeline stopped at GA_split, but I noticed that the results folder in GA_pre_scan was empty, so I manually deleted both step folders and removed GA_split and GA_pre_scan from bypass_log.txt.
How can I identify where the problem is?
if you need to dive into the code, all steps create a shellscript for their specific section. you could run the shellscript for that step manually to see where the system is stalling.
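A minimal sketch of that manual rerun, in Python. It assumes the step's scripts are `*.sh` files in a jobs folder such as `output/GA_pre_scan/data/jobs`; that path and the extension are assumptions, so adjust them to whatever your output folder actually contains.

```python
import subprocess
from pathlib import Path


def run_step_scripts(jobs_dir, shell="sh"):
    """Run every *.sh file in jobs_dir; return {script name: (exit code, stderr)}."""
    results = {}
    for script in sorted(Path(jobs_dir).glob("*.sh")):
        # Run the script on its own so its error output is not buried in the pipeline log.
        proc = subprocess.run([shell, str(script)], capture_output=True, text=True)
        results[script.name] = (proc.returncode, proc.stderr.strip())
    return results


if __name__ == "__main__":
    # Hypothetical location: point this at the jobs folder of the step that failed.
    for name, (code, err) in run_step_scripts("output/GA_pre_scan/data/jobs").items():
        print(f"{name}: exit {code}")
        if err:
            print(err)
```

A non-zero exit code or anything printed on stderr for one of the scripts is the place to start looking.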
The script does not stall. No FASTA files are produced in GA_pre_scan
so, the config output says it can't find your source_taxa_db and is falling back to a default path. GA_pre_scan relies on these taxid trees we made: https://compsysbio.org/metapro_libs/taxid_trees/ These trees link every taxon found in chocophlan to its higher-order rollups.
Your run is missing these tables.
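The startup output marks each setting that fell back to a default with the text `no inner section found. using default`. A small sketch for pulling those lines out of a saved log and flagging default paths that do not exist on the machine; the function names are made up for illustration and are not part of MetaPro.

```python
import os
import re

# Matches lines such as:
#   source_taxa_db no inner section found. using default /some/path
DEFAULT_LINE = re.compile(r"^(\S+) no inner section found\. using default (.*)$")


def find_defaulted_keys(log_text):
    """Return {setting name: default value} for every setting that fell back to a default."""
    found = {}
    for line in log_text.splitlines():
        match = DEFAULT_LINE.match(line.strip())
        if match:
            found[match.group(1)] = match.group(2).strip()
    return found


def missing_default_paths(defaulted):
    """Return the settings whose default looks like an absolute path that does not exist."""
    return sorted(
        key for key, value in defaulted.items()
        if value.startswith("/") and not os.path.exists(value)
    )
```

Running `missing_default_paths(find_defaulted_keys(open("startup.log").read()))` on a saved copy of the output (the file name is hypothetical) lists the database settings that still need an entry in the config.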
Hi billytaj, I am having the same issue. At first I only had the class_tsv file, but after your reply above I got the other taxid tree files. However, the pipeline still ended with this error:
~/Outs/GA_pre_scan/final_results
2024-06-18 04:50:47.953054 Error: no fasta files found. BWA only accepts .fasta extensions
empty BWA database
tkcaccia, did you resolve the problem? Thanks
this error is a warning that the pre-scan didn't function properly.
it's supposed to taxa-scan your cleaned reads and populate a customized subset of the chocophlan database.
There are ways to bypass it if you want.
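To see what the pre-scan actually left behind, the folder named in the error can be inspected directly. This is a sketch that assumes, as the error text suggests, that only files ending in `.fasta` count; the function name is illustrative and not part of MetaPro.

```python
from pathlib import Path


def inspect_final_results(final_results_dir):
    """Return (fasta files, other files) found directly inside final_results_dir."""
    folder = Path(final_results_dir)
    if not folder.is_dir():
        return [], []
    files = sorted(p.name for p in folder.iterdir() if p.is_file())
    fasta = [name for name in files if name.endswith(".fasta")]
    other = [name for name in files if not name.endswith(".fasta")]
    return fasta, other
```

An empty first list means the pre-scan did not populate the custom database. Files in the second list (for example `.fa` or `.fna`) would point to an extension mismatch instead.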
Could you point to how we can bypass that? Thanks
in your config, under the Databases heading, add in
DNA_DB_override = True
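A quick way to confirm the key ended up under the right heading is to parse the file the way Python's `configparser` would. Whether MetaPro reads its config with `configparser` is an assumption here, and the helper name is made up for illustration.

```python
import configparser


def dna_db_override_value(config_text):
    """Return the DNA_DB_override value under [Databases], or None if it is not there."""
    # interpolation=None: leave %(database_path)s placeholders untouched while reading.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section("Databases"):
        return None
    return parser.get("Databases", "DNA_DB_override", fallback=None)
```

A return value of `None` means the line sits under a different heading or is missing, in which case the override would not be seen in the Databases section.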
Hi Billy,
I'm running into the same issue, and setting DNA_DB_override = True did not work.
Here is my config:
[Databases]
DNA_DB_override = True
database_path: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases
UniVec_Core: %(database_path)s/UniVec/UniVec_Core.fasta
Adapter: %(database_path)s/Trimmomatic/Nextseq_PE.fa
Host: %(database_path)s/host-sequence/Mus_musculus.GRCm39.cds.all.fa
Rfam: %(database_path)s/prebuilt/Rfam/Rfam.cm
DNA_DB: %(database_path)s/prebuilt/choco_h3_genus/genus_group
source_taxa_db: %(database_path)s/prebuilt/choco_h3_genus/genus_group\
#YOU NEED THE ABOVE LINE
#DNA_DB: %(database_path)s/chocophlan_h3_chunks
#DNA_DB: /home/billy/choco_h3/choco_h3_group
#DNA_DB_Split: %(database_path)s/ChocoPhlAn/ChocoPhlAn_split/
Prot_DB: %(database_path)s/prebuilt/nr/nr
Prot_DB_reads: %(database_path)s/prebuilt/nr/nr
accession2taxid: %(database_path)s/accession2taxid
nodes: %(database_path)s/prebuilt/WEVOTE_db/nodes_wevote.dmp
names: %(database_path)s/prebuilt/WEVOTE_db/names_wevote.dmp
Kaiju_db: %(database_path)s/prebuilt/kaiju_db/kaiju_db_nr.fmi
Centrifuge_db: %(database_path)s/centrifuge/nt
SWISS_PROT: %(database_path)s/prebuilt/swiss_prot_db
SWISS_PROT_map: %(database_path)s/prebuilt/swiss_prot_db/SwissProt_EC_Mapping.tsv
PriamDB: %(database_path)s/prebuilt/PRIAM_db
DetectDB: %(database_path)s/prebuilt/DETECTv2
WEVOTEDB: %(database_path)s/prebuilt/WEVOTE_db/
EC_pathway: %(database_path)s/prebuilt/EC_pathway/EC_pathway.txt
path_to_superpath: %(database_path)s/prebuilt/pathway_to_superpathway.csv
MetaGeneMark_model: /pipeline_tools/mgm/MetaGeneMark_v1.mod
taxid_tree: %(database_path)s/prebuilt/taxid_trees/genus_tree.tsv
kraken2_db: %(database_path)s/prebuilt/kraken2_db
#[code]
#ga_pre_scan_get_libs = /home/billy/human_flu/30785/ga_pre_scan_get_libs.py
This is the output:
USING CONFIG /scratch/st-vbruce-1/irvinng/G4H-Metatranscriptomics/config.ini
target_rank no inner section found. using default genus
AdapterRemoval_minlength found! using: 30
Show_unclassified found! using: Yes
bypass_log_name no inner section found. using default bypass_log.txt
debug_stop_flag no inner section found. using default none
num_threads no inner section found. using default 32
taxa_existence_cutoff no inner section found. using default 0.1
DNA_DB_mode no inner section found. using default chocophlan
RPKM_cutoff found! using: 0.005
BWA_cigar_cutoff found! using: 90
BLAT_identity_cutoff found! using: 75
BLAT_length_cutoff found! using: 0.5
BLAT_score_cutoff found! using: 60
DIAMOND_identity_cutoff found! using: 75
DIAMOND_length_cutoff found! using: 0.5
DIAMOND_score_cutoff found! using: 60
BWA_mem_footprint no inner section found. using default 5
BLAT_mem_footprint no inner section found. using default 5
DMD_mem_footprint no inner section found. using default 10
BWA_mem_threshold found! using: 75
BLAT_mem_threshold found! using: 75
DIAMOND_mem_threshold found! using: 80
DETECT_mem_threshold found! using: 80
Infernal_mem_threshold found! using: 75
Barrnap_mem_threshold found! using: 75
BWA_pp_mem_threshold found! using: 30
BLAT_pp_mem_threshold found! using: 75
DIAMOND_pp_mem_threshold found! using: 80
GA_final_merge_mem_threshold no inner section found. using default 5
TA_mem_threshold found! using: 80
repop_mem_threshold no inner section found. using default 50
EC_mem_threshold no inner section found. using default 5
BWA_job_limit found! using: 32
BLAT_job_limit found! using: 32
DIAMOND_job_limit found! using: 32
DETECT_job_limit found! using: 32
Infernal_job_limit found! using: 32
Barrnap_job_limit found! using: 32
BWA_pp_job_limit found! using: 32
BLAT_pp_job_limit found! using: 32
DIAMOND_pp_job_limit found! using: 32
GA_final_merge_job_limit no inner section found. using default 24
TA_job_limit no inner section found. using default 24
repop_job_limit no inner section found. using default 1
EC_job_limit no inner section found. using default 24
Infernal_job_delay no inner section found. using default 5
Barrnap_job_delay no inner section found. using default 5
BWA_job_delay found! using: 0.5
BLAT_job_delay found! using: 5
DIAMOND_job_delay found! using: 5
DETECT_job_delay no inner section found. using default 5
BWA_pp_job_delay found! using: 0.01
BLAT_pp_job_delay found! using: 0.05
DIAMOND_pp_job_delay found! using: 5
GA_final_merge_job_delay no inner section found. using default 5
TA_job_delay found! using: 10
repop_job_delay no inner section found. using default 10
EC_job_delay no inner section found. using default 1
keep_all found! using: yes
keep_quality found! using: no
keep_host found! using: no
keep_vector found! using: no
keep_rRNA found! using: no
keep_repop found! using: no
keep_assemble_contigs found! using: yes
keep_GA_BWA found! using: no
keep_GA_BLAT found! using: no
keep_GA_DIAMOND found! using: no
keep_GA_final found! using: no
keep_TA found! using: no
keep_EC found! using: no
keep_outputs found! using: no
filter_stringency found! using: high
GA_chunk_size found! using: 10000
EC_chunk_size found! using: 1000
rRNA_chunk_size found! using: 50000
Labels no section found, using default: quality_filter
Labels no section found, using default: host_filter
Labels no section found, using default: vector_filter
Labels no section found, using default: rRNA_filter
Labels no section found, using default: rRNA_filter_split
Labels no section found, using default: rRNA_filter_convert
Labels no section found, using default: rRNA_filter_barrnap
Labels no section found, using default: rRNA_filter_barrnap_merge
Labels no section found, using default: rRNA_filter_barrnap_pp
Labels no section found, using default: rRNA_filter_infernal
Labels no section found, using default: rRNA_filter_infernal_prep
Labels no section found, using default: rRNA_filter_splitter
Labels no section found, using default: rRNA_filter_post
Labels no section found, using default: duplicate_repopulation
Labels no section found, using default: assemble_contigs
Labels no section found, using default: destroy_contigs
Labels no section found, using default: GA_pre_scan
Labels no section found, using default: GA_split
Labels no section found, using default: GA_BWA
Labels no section found, using default: GA_BWA_pp
Labels no section found, using default: GA_BWA_merge
Labels no section found, using default: GA_BLAT
Labels no section found, using default: GA_BLAT_cleanup
Labels no section found, using default: GA_BLAT_cat
Labels no section found, using default: GA_BLAT_pp
Labels no section found, using default: GA_BLAT_merge
Labels no section found, using default: GA_DMD
Labels no section found, using default: GA_DMD_pp
Labels no section found, using default: GA_final_merge
Labels no section found, using default: taxonomic_annotation
Labels no section found, using default: enzyme_annotation
Labels no section found, using default: enzyme_annotation_detect
Labels no section found, using default: enzyme_annotation_priam
Labels no section found, using default: enzyme_annotation_priam_split
Labels no section found, using default: enzyme_annotation_priam_cat
Labels no section found, using default: enzyme_annotation_DMD
Labels no section found, using default: enzyme_annotation_pp
Labels no section found, using default: outputs
Labels no section found, using default: output_copy_gene_map
Labels no section found, using default: output_clean_ec
Labels no section found, using default: output_copy_taxa
Labels no section found, using default: output_network_generation
Labels no section found, using default: output_unique_hosts_singletons
Labels no section found, using default: output_unique_hosts_pair_1
Labels no section found, using default: output_unique_hosts_pair_2
Labels no section found, using default: output_unique_vectors_singletons
Labels no section found, using default: output_unique_vectors_pair_1
Labels no section found, using default: output_unique_vectors_pair_2
Labels no section found, using default: output_combine_hosts
Labels no section found, using default: output_per_read_scores
Labels no section found, using default: output_contig_stats
Labels no section found, using default: output_ec_heatmap
Labels no section found, using default: output_taxa_groupby
Labels no section found, using default: output_read_count
UniVec_Core found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/UniVec/UniVec_Core.fasta
Adapter found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/Trimmomatic/Nextseq_PE.fa
Host found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/host-sequence/Mus_musculus.GRCm39.cds.all.fa
Rfam found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/Rfam/Rfam.cm
DNA_DB found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/choco_h3_genus/genus_group
source_taxa_db found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/choco_h3_genus/genus_group\
Prot_DB found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/nr/nr
Prot_DB_reads found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/nr/nr
accession2taxid found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/accession2taxid
nodes found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/WEVOTE_db/nodes_wevote.dmp
names found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/WEVOTE_db/names_wevote.dmp
Kaiju_db found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/kaiju_db/kaiju_db_nr.fmi
Centrifuge_db found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/centrifuge/nt
SWISS_PROT found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/swiss_prot_db
SWISS_PROT_map found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/swiss_prot_db/SwissProt_EC_Mapping.tsv
PriamDB found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/PRIAM_db
DetectDB found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/DETECTv2
WEVOTEDB found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/WEVOTE_db/
EC_pathway found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/EC_pathway/EC_pathway.txt
path_to_superpath found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/pathway_to_superpathway.csv
MetaGeneMark_model found! using: /pipeline_tools/mgm/MetaGeneMark_v1.mod
enzyme_db no inner section found. using default /pipeline/custom_databases/FREQ_EC_pairs_3_mai_2020.txt
taxid_tree found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/taxid_trees/genus_tree.tsv
kraken2_db found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/kraken2_db
dir name: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/nr
file name: nr
/arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/nr/nr
2024-11-10 15:47:19.321745 /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/nr/nr exists
2024-11-10 15:47:19.321780 DMD index is ok
Python found! using: python3
Java found! using: java -jar
cdhit_dup found! using: /pipeline_tools/cdhit_dup/cd-hit-dup
AdapterRemoval found! using: /pipeline_tools/adapterremoval/AdapterRemoval
vsearch found! using: /pipeline_tools/vsearch/vsearch
BWA found! using: /pipeline_tools/BWA/bwa
SAMTOOLS found! using: /pipeline_tools/samtools/samtools
BLAT found! using: /pipeline_tools/PBLAT/pblat
DIAMOND found! using: /pipeline_tools/DIAMOND/diamond
Blastp found! using: /pipeline_tools/BLAST_p/blastp
Needle found! using: /pipeline_tools/EMBOSS-6.6.0/emboss/needle
Makeblastdb found! using: /pipeline_tools/BLAST_p/makeblastdb
Barrnap found! using: /pipeline_tools/Barrnap/bin/barrnap
Infernal found! using: /pipeline_tools/infernal/cmscan
Kaiju found! using: /pipeline_tools/kaiju/kaiju
Centrifuge found! using: /pipeline_tools/centrifuge/centrifuge
Priam found! using: /pipeline_tools/PRIAM_search/PRIAM_search.jar
Detect found! using: /pipeline/Scripts/Detect_2.2.9.py
BLAST_dir found! using: /pipeline_tools/BLAST_p
WEVOTE found! using: /pipeline_tools/WEVOTE/WEVOTE
Spades found! using: /pipeline_tools/SPAdes/bin/spades.py
MetaGeneMark found! using: /pipeline_tools/mgm/gmhmmp
kraken2 no inner section found. using default /pipeline_tools/kraken2/kraken2
code no section found, using default: /pipeline/Scripts/read_sam.py
code no section found, using default: /pipeline/Scripts/read_sort.py
code no section found, using default: /pipeline/Scripts/read_repopulation.py
code no section found, using default: /pipeline/Scripts/read_orphan.py
code no section found, using default: /pipeline/Scripts/read_remove_tag.py
code no section found, using default: /pipeline/Scripts/read_BLAT_filter_v3.py
code no section found, using default: /pipeline/Scripts/read_split.py
code no section found, using default: /pipeline/Scripts/read_rRNA_barrnap.py
code no section found, using default: /pipeline/Scripts/read_rRNA_infernal.py
code no section found, using default: /pipeline/Scripts/assembly_make_contig_map.py
code no section found, using default: /pipeline/Scripts/assembly_flush_bad_contigs.py
code no section found, using default: /pipeline/Scripts/assembly_deduplicate.py
code no section found, using default: /pipeline/Scripts/ga_BWA_generic_v2.py
code no section found, using default: /pipeline/Scripts/ga_BLAT_generic_v3.py
code no section found, using default: /pipeline/Scripts/ga_Diamond_generic_v2.py
code no section found, using default: /pipeline/Scripts/ga_Final_merge_v4.py
code no section found, using default: /pipeline/Scripts/ga_merge_fasta.py
code no section found, using default: /pipeline/Scripts/ga_final_merge_fastq.py
code no section found, using default: /pipeline/Scripts/ga_final_merge_proteins.py
code no section found, using default: /pipeline/Scripts/ga_final_merge_map.py
code no section found, using default: /pipeline/Scripts/ea_combine_v5.py
code no section found, using default: /pipeline/Scripts/ta_taxid_v3.py
code no section found, using default: /pipeline/Scripts/ta_constrain_taxonomy_v2.py
code no section found, using default: /pipeline/Scripts/ta_combine_v3.py
code no section found, using default: /pipeline/Scripts/ta_wevote_parser.py
code no section found, using default: /pipeline/Scripts/output_taxa_groupby.py
code no section found, using default: /pipeline/Scripts/output_table_v3.py
code no section found, using default: /pipeline/Scripts/output_reformat_rpkm_table.py
code no section found, using default: /pipeline/Scripts/output_read_counts_v2.py
code no section found, using default: /pipeline/Scripts/output_read_quality_metrics.py
code no section found, using default: /pipeline/Scripts/output_contig_stats.py
code no section found, using default: /pipeline/Scripts/output_EC_metrics.py
code no section found, using default: /pipeline/Scripts/output_data_change_metrics.py
code no section found, using default: /pipeline/Scripts/output_get_host_reads.py
code no section found, using default: /pipeline/Scripts/remove_gaps_in_fasta.py
code no section found, using default: /pipeline/Scripts/output_parse_sam.py
code no section found, using default: /pipeline/Scripts/output_are_you_in_a_contig.py
code no section found, using default: /pipeline/Scripts/output_convert_gene_map_contig_segments.py
code no section found, using default: /pipeline/Scripts/output_filter_taxa.py
code no section found, using default: /pipeline/Scripts/output_filter_ECs.py
code no section found, using default: /pipeline/Scripts/bwa_read_sorter.py
code no section found, using default: /pipeline/Scripts/ta_contig_name_convert.py
code no section found, using default: /pipeline/Scripts/ga_pre_scan_get_libs.py
code no section found, using default: /pipeline/Scripts/ga_pre_scan_assemble_libs.py
MetaPro operating in auto-mode
Forward Reads: /scratch/st-vbruce-1/irvinng/G4H-Metatranscriptomics/data/fwd/1_iMUDI001_S1_L007_R1_001.fastq.gz
Reverse Reads: /scratch/st-vbruce-1/irvinng/G4H-Metatranscriptomics/data/rev/1_iMUDI001_S1_L007_R2_001.fastq.gz
Output filepath: /scratch/st-vbruce-1/irvinng/G4H-Metatranscriptomics/output
job path: /scratch/st-vbruce-1/irvinng/G4H-Metatranscriptomics/output/quality_filter
2024-11-10 15:47:19.324491 bypassing: quality_filter
2024-11-10 15:47:19.324506 skipping job: quality_filter
quality filter: 0.0 s
quality filter cleanup: 0.0 s
2024-11-10 15:47:19.324522 continuing from: quality_filter
job path: /scratch/st-vbruce-1/irvinng/G4H-Metatranscriptomics/output/host_filter
2024-11-10 15:47:19.325503 bypassing: host_filter
2024-11-10 15:47:19.325515 skipping job: host_filter
host filter: 0.0 s
host filter cleanup: 0.0 s
2024-11-10 15:47:19.325526 continuing from: host_filter
job path: /scratch/st-vbruce-1/irvinng/G4H-Metatranscriptomics/output/vector_filter
2024-11-10 15:47:19.326474 bypassing: vector_filter
2024-11-10 15:47:19.326485 skipping job: vector_filter
vector filter: 0.0 s
vector filter cleanup: 0.0 s
2024-11-10 15:47:19.326497 continuing from: vector_filter
2024-11-10 15:47:19.326820 bypassing: rRNA_filter
rRNA filter: 0.0 s
rRNA filter cleanup: 0.0 s
2024-11-10 15:47:19.326839 continuing from: rRNA_filter
2024-11-10 15:47:19.327151 bypassing: duplicate_repopulation
repop: 0.0 s
repop cleanup: 0.0 s
2024-11-10 15:47:19.327169 continuing from: duplicate_repopulation
2024-11-10 15:47:19.327543 bypassing: assemble_contigs
2024-11-10 15:47:19.328153 MGM OK. contigs present
assemble contigs: 0.0 s
assemble contigs cleanup: 0.0 s
2024-11-10 15:47:19.328167 continuing from: assemble_contigs
2024-11-10 15:47:19.328542 bypassing: GA_pre_scan
2024-11-10 15:47:19.328553 continuing from: GA_pre_scan
2024-11-10 15:47:19.328874 running: GA_split
2024-11-10 15:47:19.328884 splitting contigs
splitting fasta for contigs
splitting fastq for singletons GA
splitting fastq for pair_1 GA
splitting fastq for pair_2 GA
2024-11-10 15:47:19.346562 closing down processes: 4
2024-11-10 15:47:19.346613 closed down: 0/4
2024-11-10 15:47:38.511107 closed down: 1/4
2024-11-10 15:47:38.511205 closed down: 2/4
2024-11-10 15:47:38.511229 closed down: 3/4
2024-11-10 15:47:40.516806 continuing from: GA_split
2024-11-10 15:47:40.516834 Running GA lib check
2024-11-10 15:47:40.516862 BWA DB check: /scratch/st-vbruce-1/irvinng/G4H-Metatranscriptomics/output/GA_pre_scan/final_results
2024-11-10 15:47:40.524891 Error: no fasta files found. BWA only accepts .fasta extensions
empty BWA database
Any guidance is appreciated. Thanks!
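One thing worth ruling out with a config like the one above: the source_taxa_db line ends with a backslash, and the startup output echoes that path with the backslash still attached. Whether that is what leaves GA_pre_scan empty is not confirmed in this thread. The sketch below resolves every path under [Databases] and reports entries that end in a backslash or match nothing on disk. It assumes `configparser`-style `%(database_path)s` interpolation and treats entries like `nr/nr` as file prefixes; the function name is illustrative.

```python
import configparser
import glob


def check_database_paths(config_text, section="Databases"):
    """Return {key: problem} for path entries in the section that look wrong."""
    parser = configparser.ConfigParser()  # default interpolation expands %(database_path)s
    parser.optionxform = str              # keep key names exactly as written
    parser.read_string(config_text)
    problems = {}
    for key, value in parser.items(section):
        if not value.startswith("/"):
            continue  # skip non-path flags such as DNA_DB_override
        if value.endswith("\\"):
            problems[key] = "ends with a backslash"
        elif not glob.glob(glob.escape(value.rstrip("/")) + "*"):
            # Some entries are prefixes rather than files, so match prefix* as well.
            problems[key] = "nothing on disk matches this path or prefix"
    return problems
```

Calling `check_database_paths(open("config.ini").read())` inside the same container or environment the pipeline runs in gives the most meaningful result, since paths can resolve differently there.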
I am having trouble completing the pipeline. The GA_pre_scan step does not produce any files in the final_results folder, and the pipeline then stops at the GA_split step. Can you please help me identify the error?
The output is below.
2024-04-18 09:01:21.514892 continuing from: assemble_contigs
2024-04-18 09:01:21.518869 running: GA_pre_scan
2024-04-18 09:01:21.548006 mp_ta_kraken2_singletons job submitted. mem: 375.48778515625 GB
2024-04-18 09:01:21.560298 mp_ta_kraken2_paired job submitted. mem: 375.4842890625 GB
Kraken2 on singletons
Kraken2 on paired
2024-04-18 09:01:21.573631 mp_ta_kraken2_contigs job submitted. mem: 375.4856484375 GB
GA_pre_scan/data/jobs/mp_ta_centrifuge_reads
Kraken2 on contigs
2024-04-18 09:01:21.600288 mp_ta_centrifuge_reads job submitted. mem: 375.482765625 GB
GA_pre_scan/data/jobs/mp_ta_centrifuge_contigs
centrifuge on reads
Loading database information...
Loading database information...
Loading database information...
centrifuge on contigs
done.
done.
done.
15475 sequences (12.36 Mbp) processed in 0.600s (1547.2 Kseq/m, 1235.55 Mbp/m).
15377 sequences classified (99.37%)
98 sequences unclassified (0.63%)
41252 sequences (12.02 Mbp) processed in 0.774s (3195.9 Kseq/m, 931.33 Mbp/m).
40639 sequences classified (98.51%)
613 sequences unclassified (1.49%)
677476 sequences (78.17 Mbp) processed in 0.870s (46697.5 Kseq/m, 5388.09 Mbp/m).
628361 sequences classified (92.75%)
49115 sequences unclassified (7.25%)
report file /scratch/t0065634/Microbiome/output_batch2/LPC0010_S8/GA_pre_scan/data/2_centrifuge/raw_contigs.txt
Number of iterations in EM algorithm: 4
Probability diff. (P - P_prev) in the last iteration: 3.70532e-11
Calculating abundance: 00:00:00
report file /scratch/t0065634/Microbiome/output_batch2/LPC0010_S8/GA_pre_scan/data/2_centrifuge/reads.txt
Number of iterations in EM algorithm: 13
Probability diff. (P - P_prev) in the last iteration: 8.45475e-11
Calculating abundance: 00:00:00
2024-04-18 09:01:21.618062 mp_ta_centrifuge_contigs job submitted. mem: 375.47983984375 GB
2024-04-18 09:01:21.619364 closing down processes: 5
2024-04-18 09:01:21.619401 closed down: 0/5
2024-04-18 09:03:09.809845 closed down: 1/5
2024-04-18 09:03:09.809963 closed down: 2/5
2024-04-18 09:03:09.810030 closed down: 3/5
2024-04-18 09:13:37.616210 closed down: 4/5
merging kraken2 reports
2024-04-18 09:13:37.622425 TA_kraken2_pp job submitted. mem: 375.4827734375 GB
2024-04-18 09:13:37.623675 closing down processes: 1
2024-04-18 09:13:37.623712 closed down: 0/1
combining all centrifuge results
2024-04-18 09:13:37.938608 TA_centrifuge_pp job submitted. mem: 375.48255078125 GB
2024-04-18 09:13:37.940008 closing down processes: 1
2024-04-18 09:13:37.940046 closed down: 0/1
combining classification outputs for wevote
Running WEVOTE
gathering WEVOTE results
2024-04-18 09:13:38.094341 TA_wevote_combine job submitted. mem: 375.48346484375 GB
2024-04-18 09:13:38.095641 running: TA_wevote_combine
2024-04-18 09:13:38.095690 closing down processes: 1
2024-04-18 09:13:38.095718 closed down: 0/1
GA pre-scan get libs
2024-04-18 09:15:58.784956 ga_collect_db job submitted. mem: 375.4834921875 GB
2024-04-18 09:15:58.786435 running: ga_collect_db
2024-04-18 09:15:58.786477 closing down processes: 1
2024-04-18 09:15:58.786506 closed down: 0/1
GA assemble libs
2024-04-18 09:16:08.826043 ga_assemble_db job submitted. mem: 375.48344140625 GB
2024-04-18 09:16:08.827014 running: ga_assemble_db
2024-04-18 09:16:08.827046 closing down processes: 1
2024-04-18 09:16:08.827063 closed down: 0/1
2024-04-18 09:16:08.934087 continuing from: GA_pre_scan
2024-04-18 09:16:08.938664 running: GA_split
2024-04-18 09:16:08.938700 splitting contigs
splitting fasta for contigs
splitting fastq for singletons GA
splitting fastq for pair_1 GA
splitting fastq for pair_2 GA
2024-04-18 09:16:09.008651 closing down processes: 4
2024-04-18 09:16:09.008748 closed down: 0/4
2024-04-18 09:16:11.656524 closed down: 1/4
2024-04-18 09:16:11.656631 closed down: 2/4
2024-04-18 09:16:11.656673 closed down: 3/4
2024-04-18 09:16:13.681369 continuing from: GA_split
2024-04-18 09:16:13.681450 Running GA lib check
2024-04-18 09:16:13.681531 BWA DB check: /scratch/t0065634/Microbiome/output_batch2//LPC0010_S8/GA_pre_scan/final_results
2024-04-18 09:16:13.686604 Error: no fasta files found. BWA only accepts .fasta extensions
empty BWA database