GA_pre_scan results folder empty

tkcaccia commented 7 months ago

I have some issues with completing the pipeline. The GA_pre_scan step does not produce any files in the final_results folder, and then the pipeline stops at the GA_split step. Can you please help me identify the error?

Here is the output.

2024-04-18 09:01:21.514892 continuing from: assemble_contigs
2024-04-18 09:01:21.518869 running: GA_pre_scan
2024-04-18 09:01:21.548006 mp_ta_kraken2_singletons job submitted. mem: 375.48778515625 GB
2024-04-18 09:01:21.560298 mp_ta_kraken2_paired job submitted. mem: 375.4842890625 GB
Kraken2 on singletons
Kraken2 on paired
2024-04-18 09:01:21.573631 mp_ta_kraken2_contigs job submitted. mem: 375.4856484375 GB
GA_pre_scan/data/jobs/mp_ta_centrifuge_reads
Kraken2 on contigs
2024-04-18 09:01:21.600288 mp_ta_centrifuge_reads job submitted. mem: 375.482765625 GB
GA_pre_scan/data/jobs/mp_ta_centrifuge_contigs
centrifuge on reads
Loading database information...Loading database information...Loading database information...centrifuge on contigs done. done. done.
15475 sequences (12.36 Mbp) processed in 0.600s (1547.2 Kseq/m, 1235.55 Mbp/m).
15377 sequences classified (99.37%)
98 sequences unclassified (0.63%)
41252 sequences (12.02 Mbp) processed in 0.774s (3195.9 Kseq/m, 931.33 Mbp/m).
40639 sequences classified (98.51%)
613 sequences unclassified (1.49%)
677476 sequences (78.17 Mbp) processed in 0.870s (46697.5 Kseq/m, 5388.09 Mbp/m).
628361 sequences classified (92.75%)
49115 sequences unclassified (7.25%)
report file /scratch/t0065634/Microbiome/output_batch2/LPC0010_S8/GA_pre_scan/data/2_centrifuge/raw_contigs.txt
Number of iterations in EM algorithm: 4
Probability diff. (P - P_prev) in the last iteration: 3.70532e-11
Calculating abundance: 00:00:00
report file /scratch/t0065634/Microbiome/output_batch2/LPC0010_S8/GA_pre_scan/data/2_centrifuge/reads.txt
Number of iterations in EM algorithm: 13
Probability diff. (P - P_prev) in the last iteration: 8.45475e-11
Calculating abundance: 00:00:00
2024-04-18 09:01:21.618062 mp_ta_centrifuge_contigs job submitted. mem: 375.47983984375 GB
2024-04-18 09:01:21.619364 closing down processes: 5
2024-04-18 09:01:21.619401 closed down: 0/5
2024-04-18 09:03:09.809845 closed down: 1/5
2024-04-18 09:03:09.809963 closed down: 2/5
2024-04-18 09:03:09.810030 closed down: 3/5
2024-04-18 09:13:37.616210 closed down: 4/5
merging kraken2 reports
2024-04-18 09:13:37.622425 TA_kraken2_pp job submitted. mem: 375.4827734375 GB
2024-04-18 09:13:37.623675 closing down processes: 1
2024-04-18 09:13:37.623712 closed down: 0/1
combining all centrifuge results
2024-04-18 09:13:37.938608 TA_centrifuge_pp job submitted. mem: 375.48255078125 GB
2024-04-18 09:13:37.940008 closing down processes: 1
2024-04-18 09:13:37.940046 closed down: 0/1
combining classification outputs for wevote
Running WEVOTE
gathering WEVOTE results
2024-04-18 09:13:38.094341 TA_wevote_combine job submitted. mem: 375.48346484375 GB
2024-04-18 09:13:38.095641 running: TA_wevote_combine
2024-04-18 09:13:38.095690 closing down processes: 1
2024-04-18 09:13:38.095718 closed down: 0/1
GA pre-scan get libs
2024-04-18 09:15:58.784956 ga_collect_db job submitted. mem: 375.4834921875 GB
2024-04-18 09:15:58.786435 running: ga_collect_db
2024-04-18 09:15:58.786477 closing down processes: 1
2024-04-18 09:15:58.786506 closed down: 0/1
GA assemble libs
2024-04-18 09:16:08.826043 ga_assemble_db job submitted. mem: 375.48344140625 GB
2024-04-18 09:16:08.827014 running: ga_assemble_db
2024-04-18 09:16:08.827046 closing down processes: 1
2024-04-18 09:16:08.827063 closed down: 0/1
2024-04-18 09:16:08.934087 continuing from: GA_pre_scan
2024-04-18 09:16:08.938664 running: GA_split
2024-04-18 09:16:08.938700 splitting contigs
splitting fasta for contigs
splitting fastq for singletons GA
splitting fastq for pair_1 GA
splitting fastq for pair_2 GA
2024-04-18 09:16:09.008651 closing down processes: 4
2024-04-18 09:16:09.008748 closed down: 0/4
2024-04-18 09:16:11.656524 closed down: 1/4
2024-04-18 09:16:11.656631 closed down: 2/4
2024-04-18 09:16:11.656673 closed down: 3/4
2024-04-18 09:16:13.681369 continuing from: GA_split
2024-04-18 09:16:13.681450 Running GA lib check
2024-04-18 09:16:13.681531 BWA DB check: /scratch/t0065634/Microbiome/output_batch2//LPC0010_S8/GA_pre_scan/final_results
2024-04-18 09:16:13.686604 Error: no fasta files found. BWA only accepts .fasta extensions
empty BWA database

billytaj commented 7 months ago

What does your config look like? Are all databases downloaded?

tkcaccia commented 7 months ago

I realized the script lib_downloader.py did not download all libraries, so I downloaded the missing one again. The output showed that all libraries were found:

UniVec_Core found! using: /scratch/alphafold/MetaPro/univec_core/UniVec_Core.fasta
Adapter found! using: /scratch/alphafold/MetaPro/trimmomatic_adapters/TruSeq3-PE-2.fa
Host found! using: /scratch/alphafold/MetaPro/human_genome/human_genome.fasta
Rfam found! using: /scratch/alphafold/MetaPro/Rfam/Rfam.cm
DNA_DB found! using: /scratch/alphafold/MetaPro/family_group
source_taxa_db no inner section found. using default /project/j/jparkin/Lab_Databases/family_llbs
Prot_DB found! using: /scratch/alphafold/MetaPro/nr/nr
Prot_DB_reads found! using: /scratch/alphafold/MetaPro/nr/nr
accession2taxid found! using: /scratch/alphafold/MetaPro/accession2taxid/accession2taxid
nodes found! using: /scratch/alphafold/MetaPro/WEVOTE_db/nodes_wevote.dmp
names found! using: /scratch/alphafold/MetaPro/WEVOTE_db/names_wevote.dmp
Kaiju_db found! using: /scratch/alphafold/MetaPro/kaiju_db/kaiju_db_nr.fmi
Centrifuge_db found! using: /scratch/alphafold/MetaPro/centrifuge_db/nt
SWISS_PROT found! using: /scratch/alphafold/MetaPro/swiss_prot_db/swiss_prot_db
SWISS_PROT_map found! using: /scratch/alphafold/MetaPro/swiss_prot_db/SwissProt_EC_Mapping.tsv
PriamDB found! using: /scratch/alphafold/MetaPro/PRIAM_db/
DetectDB found! using: /scratch/alphafold/MetaPro/DETECTv2
WEVOTEDB found! using: /scratch/alphafold/MetaPro/WEVOTE_db/
EC_pathway found! using: /scratch/alphafold/MetaPro/EC_pathway/EC_pathway.txt
path_to_superpath found! using: /scratch/alphafold/MetaPro/path_to_superpath/pathway_to_superpathway.csv
MetaGeneMark_model found! using: /pipeline_tools/mgm/MetaGeneMark_v1.mod
enzyme_db no inner section found. using default /pipeline/custom_databases/FREQ_EC_pairs_3_mai_2020.txt
taxid_tree found! using: /scratch/alphafold/MetaPro/taxid_trees/class_tree.tsv
kraken2_db found! using: /scratch/alphafold/MetaPro/kraken2_db

The pipeline stopped at GA_split, but I noticed that the results folder in GA_pre_scan was empty, so I manually removed these folders and removed GA_split and GA_pre_scan from bypass_log.txt.

How can I identify where the problem is?

billytaj commented 7 months ago

If you need to dive into the code, every step creates a shell script for its specific section. You could run the shell script for that step manually to see where the system is stalling.

tkcaccia commented 7 months ago

The script does not stall. No FASTA files are produced in GA_pre_scan.
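
To confirm what the BWA check sees, a small sketch like the one below counts the files in the folder by extension. It assumes only what the log shows: the check looks in GA_pre_scan/final_results and accepts files ending in .fasta. The helper names summarize_final_results and has_fasta are hypothetical and not part of MetaPro.

```python
from collections import Counter
from pathlib import Path


def summarize_final_results(folder):
    """Count the files in `folder` by extension; a missing folder gives an empty Counter."""
    path = Path(folder)
    if not path.is_dir():
        return Counter()
    return Counter(p.suffix or "(no extension)" for p in path.iterdir() if p.is_file())


def has_fasta(folder):
    """True if at least one file in `folder` ends in .fasta (the extension named in the error)."""
    return summarize_final_results(folder)[".fasta"] > 0
```

Pointing summarize_final_results at the GA_pre_scan/final_results folder of the run separates two cases: an empty result means nothing was written at all, while counts under other extensions such as .fa might indicate a naming mismatch rather than a failed scan.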

billytaj commented 7 months ago

So, your config output shows that source_taxa_db was not found (it fell back to the default path). GA_pre_scan relies on these taxid trees we made:
https://compsysbio.org/metapro_libs/taxid_trees/
These trees link every taxon found in chocophlan to its higher-order rollups.

Your run is missing these tables.
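
A quick way to check this before rerunning is sketched below. It assumes the config is an INI file that Python's configparser can read and that the database entries sit under a [Databases] section. The helper name check_database_entries and the default keys it checks are illustrative, not part of MetaPro.

```python
import configparser
from pathlib import Path


def check_database_entries(config_text, keys=("source_taxa_db", "taxid_tree")):
    """For each key, report whether it is set under [Databases] and whether its path exists."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    report = {}
    for key in keys:
        if not parser.has_option("Databases", key):
            # The key (or the whole section) is absent, so a default would be used.
            report[key] = "missing from config"
            continue
        path = Path(parser.get("Databases", key))
        report[key] = "ok" if path.exists() else f"path not found: {path}"
    return report
```

Reading the config file into a string and passing it to check_database_entries gives one line of status per key, which makes a silently applied default easier to spot than scanning the full startup output.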

Gabe-BioUSD commented 5 months ago

Hi billytaj, I am having the same issue. At first I only had the class_tsv file, but after your reply above I got the other taxid tree files. However, the pipeline still ended with the error:

~/Outs/GA_pre_scan/final_results
2024-06-18 04:50:47.953054 Error: no fasta files found. BWA only accepts .fasta extensions
empty BWA database

tkcaccia, did you resolve the problem? Thanks

billytaj commented 4 months ago

This error is a warning that the pre-scan didn't function properly.
It's supposed to taxa-scan your cleaned reads and populate a customized subset of the chocophlan database. There are ways to bypass it if you want.

Gabe-BioUSD commented 4 months ago

Could you point to how we can bypass that? Thanks

billytaj commented 4 months ago

In your config, under the Databases heading, add DNA_DB_override = True
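
To double-check that the setting landed in the right section, a minimal sketch follows. It assumes the config parses with Python's configparser; how MetaPro itself reads or interprets the value is not shown in this thread, and dna_db_override_enabled is a hypothetical helper.

```python
import configparser


def dna_db_override_enabled(config_text):
    """True only if DNA_DB_override is set to a true value under [Databases]."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # fallback=False covers both a missing section and a missing key.
    return parser.getboolean("Databases", "DNA_DB_override", fallback=False)
```

A False result for a config that does contain the line would suggest it was placed under a different heading.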

irvinng98 commented 1 week ago

Hi Billy,

I'm running into the same issue and DNA_DB_override = True did not work.

Here is my config:

[Databases]
DNA_DB_override = True
database_path: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases
UniVec_Core: %(database_path)s/UniVec/UniVec_Core.fasta
Adapter: %(database_path)s/Trimmomatic/Nextseq_PE.fa
Host: %(database_path)s/host-sequence/Mus_musculus.GRCm39.cds.all.fa
Rfam: %(database_path)s/prebuilt/Rfam/Rfam.cm
DNA_DB: %(database_path)s/prebuilt/choco_h3_genus/genus_group
source_taxa_db: %(database_path)s/prebuilt/choco_h3_genus/genus_group\
#YOU NEED THE ABOVE LINE
#DNA_DB: %(database_path)s/chocophlan_h3_chunks
#DNA_DB: /home/billy/choco_h3/choco_h3_group
#DNA_DB_Split: %(database_path)s/ChocoPhlAn/ChocoPhlAn_split/
Prot_DB: %(database_path)s/prebuilt/nr/nr
Prot_DB_reads: %(database_path)s/prebuilt/nr/nr
accession2taxid: %(database_path)s/accession2taxid
nodes: %(database_path)s/prebuilt/WEVOTE_db/nodes_wevote.dmp
names: %(database_path)s/prebuilt/WEVOTE_db/names_wevote.dmp
Kaiju_db: %(database_path)s/prebuilt/kaiju_db/kaiju_db_nr.fmi
Centrifuge_db: %(database_path)s/centrifuge/nt
SWISS_PROT: %(database_path)s/prebuilt/swiss_prot_db
SWISS_PROT_map: %(database_path)s/prebuilt/swiss_prot_db/SwissProt_EC_Mapping.tsv
PriamDB: %(database_path)s/prebuilt/PRIAM_db
DetectDB: %(database_path)s/prebuilt/DETECTv2
WEVOTEDB: %(database_path)s/prebuilt/WEVOTE_db/
EC_pathway: %(database_path)s/prebuilt/EC_pathway/EC_pathway.txt
path_to_superpath: %(database_path)s/prebuilt/pathway_to_superpathway.csv
MetaGeneMark_model: /pipeline_tools/mgm/MetaGeneMark_v1.mod
taxid_tree: %(database_path)s/prebuilt/taxid_trees/genus_tree.tsv
kraken2_db: %(database_path)s/prebuilt/kraken2_db
#[code]
#ga_pre_scan_get_libs = /home/billy/human_flu/30785/ga_pre_scan_get_libs.py
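
One detail that may be worth ruling out in the config above: the source_taxa_db line ends with a backslash, and the run output echoes the resolved path with that backslash as well. Whether MetaPro tolerates this is not known from this thread. The sketch below, which assumes the config parses with Python's configparser, flags any [Databases] value that ends in a backslash after `%(database_path)s` has been resolved. The helper suspicious_database_values is hypothetical.

```python
import configparser


def suspicious_database_values(config_text):
    """Return {key: resolved value} for [Databases] entries that end in a backslash."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    flagged = {}
    # items() resolves %(name)s references; option names come back lowercased.
    for key, value in parser.items("Databases"):
        if value.endswith("\\"):
            flagged[key] = value
    return flagged
```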

This is the output:

USING CONFIG /scratch/st-vbruce-1/irvinng/G4H-Metatranscriptomics/config.ini
target_rank no inner section found. using default genus
AdapterRemoval_minlength found! using: 30
Show_unclassified found! using: Yes
bypass_log_name no inner section found. using default bypass_log.txt
debug_stop_flag no inner section found. using default none
num_threads no inner section found. using default 32
taxa_existence_cutoff no inner section found. using default 0.1
DNA_DB_mode no inner section found. using default chocophlan
RPKM_cutoff found! using: 0.005
BWA_cigar_cutoff found! using: 90
BLAT_identity_cutoff found! using: 75
BLAT_length_cutoff found! using: 0.5
BLAT_score_cutoff found! using: 60
DIAMOND_identity_cutoff found! using: 75
DIAMOND_length_cutoff found! using: 0.5
DIAMOND_score_cutoff found! using: 60
BWA_mem_footprint no inner section found. using default 5
BLAT_mem_footprint no inner section found. using default 5
DMD_mem_footprint no inner section found. using default 10
BWA_mem_threshold found! using: 75
BLAT_mem_threshold found! using: 75
DIAMOND_mem_threshold found! using: 80
DETECT_mem_threshold found! using: 80
Infernal_mem_threshold found! using: 75
Barrnap_mem_threshold found! using: 75
BWA_pp_mem_threshold found! using: 30
BLAT_pp_mem_threshold found! using: 75
DIAMOND_pp_mem_threshold found! using: 80
GA_final_merge_mem_threshold no inner section found. using default 5
TA_mem_threshold found! using: 80
repop_mem_threshold no inner section found. using default 50
EC_mem_threshold no inner section found. using default 5
BWA_job_limit found! using: 32
BLAT_job_limit found! using: 32
DIAMOND_job_limit found! using: 32
DETECT_job_limit found! using: 32
Infernal_job_limit found! using: 32
Barrnap_job_limit found! using: 32
BWA_pp_job_limit found! using: 32
BLAT_pp_job_limit found! using: 32
DIAMOND_pp_job_limit found! using: 32
GA_final_merge_job_limit no inner section found. using default 24
TA_job_limit no inner section found. using default 24
repop_job_limit no inner section found. using default 1
EC_job_limit no inner section found. using default 24
Infernal_job_delay no inner section found. using default 5
Barrnap_job_delay no inner section found. using default 5
BWA_job_delay found! using: 0.5
BLAT_job_delay found! using: 5
DIAMOND_job_delay found! using: 5
DETECT_job_delay no inner section found. using default 5
BWA_pp_job_delay found! using: 0.01
BLAT_pp_job_delay found! using: 0.05
DIAMOND_pp_job_delay found! using: 5
GA_final_merge_job_delay no inner section found. using default 5
TA_job_delay found! using: 10
repop_job_delay no inner section found. using default 10
EC_job_delay no inner section found. using default 1
keep_all found! using: yes
keep_quality found! using: no
keep_host found! using: no
keep_vector found! using: no
keep_rRNA found! using: no
keep_repop found! using: no
keep_assemble_contigs found! using: yes
keep_GA_BWA found! using: no
keep_GA_BLAT found! using: no
keep_GA_DIAMOND found! using: no
keep_GA_final found! using: no
keep_TA found! using: no
keep_EC found! using: no
keep_outputs found! using: no
filter_stringency found! using: high
GA_chunk_size found! using: 10000
EC_chunk_size found! using: 1000
rRNA_chunk_size found! using: 50000
Labels no section found, using default: quality_filter
Labels no section found, using default: host_filter
Labels no section found, using default: vector_filter
Labels no section found, using default: rRNA_filter
Labels no section found, using default: rRNA_filter_split
Labels no section found, using default: rRNA_filter_convert
Labels no section found, using default: rRNA_filter_barrnap
Labels no section found, using default: rRNA_filter_barrnap_merge
Labels no section found, using default: rRNA_filter_barrnap_pp
Labels no section found, using default: rRNA_filter_infernal
Labels no section found, using default: rRNA_filter_infernal_prep
Labels no section found, using default: rRNA_filter_splitter
Labels no section found, using default: rRNA_filter_post
Labels no section found, using default: duplicate_repopulation
Labels no section found, using default: assemble_contigs
Labels no section found, using default: destroy_contigs
Labels no section found, using default: GA_pre_scan
Labels no section found, using default: GA_split
Labels no section found, using default: GA_BWA
Labels no section found, using default: GA_BWA_pp
Labels no section found, using default: GA_BWA_merge
Labels no section found, using default: GA_BLAT
Labels no section found, using default: GA_BLAT_cleanup
Labels no section found, using default: GA_BLAT_cat
Labels no section found, using default: GA_BLAT_pp
Labels no section found, using default: GA_BLAT_merge
Labels no section found, using default: GA_DMD
Labels no section found, using default: GA_DMD_pp
Labels no section found, using default: GA_final_merge
Labels no section found, using default: taxonomic_annotation
Labels no section found, using default: enzyme_annotation
Labels no section found, using default: enzyme_annotation_detect
Labels no section found, using default: enzyme_annotation_priam
Labels no section found, using default: enzyme_annotation_priam_split
Labels no section found, using default: enzyme_annotation_priam_cat
Labels no section found, using default: enzyme_annotation_DMD
Labels no section found, using default: enzyme_annotation_pp
Labels no section found, using default: outputs
Labels no section found, using default: output_copy_gene_map
Labels no section found, using default: output_clean_ec
Labels no section found, using default: output_copy_taxa
Labels no section found, using default: output_network_generation
Labels no section found, using default: output_unique_hosts_singletons
Labels no section found, using default: output_unique_hosts_pair_1
Labels no section found, using default: output_unique_hosts_pair_2
Labels no section found, using default: output_unique_vectors_singletons
Labels no section found, using default: output_unique_vectors_pair_1
Labels no section found, using default: output_unique_vectors_pair_2
Labels no section found, using default: output_combine_hosts
Labels no section found, using default: output_per_read_scores
Labels no section found, using default: output_contig_stats
Labels no section found, using default: output_ec_heatmap
Labels no section found, using default: output_taxa_groupby
Labels no section found, using default: output_read_count
UniVec_Core found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/UniVec/UniVec_Core.fasta
Adapter found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/Trimmomatic/Nextseq_PE.fa
Host found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/host-sequence/Mus_musculus.GRCm39.cds.all.fa
Rfam found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/Rfam/Rfam.cm
DNA_DB found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/choco_h3_genus/genus_group
source_taxa_db found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/choco_h3_genus/genus_group\
Prot_DB found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/nr/nr
Prot_DB_reads found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/nr/nr
accession2taxid found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/accession2taxid
nodes found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/WEVOTE_db/nodes_wevote.dmp
names found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/WEVOTE_db/names_wevote.dmp
Kaiju_db found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/kaiju_db/kaiju_db_nr.fmi
Centrifuge_db found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/centrifuge/nt
SWISS_PROT found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/swiss_prot_db
SWISS_PROT_map found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/swiss_prot_db/SwissProt_EC_Mapping.tsv
PriamDB found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/PRIAM_db
DetectDB found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/DETECTv2
WEVOTEDB found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/WEVOTE_db/
EC_pathway found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/EC_pathway/EC_pathway.txt
path_to_superpath found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/pathway_to_superpathway.csv
MetaGeneMark_model found! using: /pipeline_tools/mgm/MetaGeneMark_v1.mod
enzyme_db no inner section found. using default /pipeline/custom_databases/FREQ_EC_pairs_3_mai_2020.txt
taxid_tree found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/taxid_trees/genus_tree.tsv
kraken2_db found! using: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/kraken2_db
dir name: /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/nr
file name: nr
/arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/nr/nr
2024-11-10 15:47:19.321745 /arc/project/st-vbruce-1/irvinng/G4H-Metatranscriptomics/databases/prebuilt/nr/nr exists
2024-11-10 15:47:19.321780 DMD index is ok
Python found! using: python3
Java found! using: java -jar
cdhit_dup found! using: /pipeline_tools/cdhit_dup/cd-hit-dup
AdapterRemoval found! using: /pipeline_tools/adapterremoval/AdapterRemoval
vsearch found! using: /pipeline_tools/vsearch/vsearch
BWA found! using: /pipeline_tools/BWA/bwa
SAMTOOLS found! using: /pipeline_tools/samtools/samtools
BLAT found! using: /pipeline_tools/PBLAT/pblat
DIAMOND found! using: /pipeline_tools/DIAMOND/diamond
Blastp found! using: /pipeline_tools/BLAST_p/blastp
Needle found! using: /pipeline_tools/EMBOSS-6.6.0/emboss/needle
Makeblastdb found! using: /pipeline_tools/BLAST_p/makeblastdb
Barrnap found! using: /pipeline_tools/Barrnap/bin/barrnap
Infernal found! using: /pipeline_tools/infernal/cmscan
Kaiju found! using: /pipeline_tools/kaiju/kaiju
Centrifuge found! using: /pipeline_tools/centrifuge/centrifuge
Priam found! using: /pipeline_tools/PRIAM_search/PRIAM_search.jar
Detect found! using: /pipeline/Scripts/Detect_2.2.9.py
BLAST_dir found! using: /pipeline_tools/BLAST_p
WEVOTE found! using: /pipeline_tools/WEVOTE/WEVOTE
Spades found! using: /pipeline_tools/SPAdes/bin/spades.py
MetaGeneMark found! using: /pipeline_tools/mgm/gmhmmp
kraken2 no inner section found. using default /pipeline_tools/kraken2/kraken2
code no section found, using default: /pipeline/Scripts/read_sam.py
code no section found, using default: /pipeline/Scripts/read_sort.py
code no section found, using default: /pipeline/Scripts/read_repopulation.py
code no section found, using default: /pipeline/Scripts/read_orphan.py
code no section found, using default: /pipeline/Scripts/read_remove_tag.py
code no section found, using default: /pipeline/Scripts/read_BLAT_filter_v3.py
code no section found, using default: /pipeline/Scripts/read_split.py
code no section found, using default: /pipeline/Scripts/read_rRNA_barrnap.py
code no section found, using default: /pipeline/Scripts/read_rRNA_infernal.py
code no section found, using default: /pipeline/Scripts/assembly_make_contig_map.py
code no section found, using default: /pipeline/Scripts/assembly_flush_bad_contigs.py
code no section found, using default: /pipeline/Scripts/assembly_deduplicate.py
code no section found, using default: /pipeline/Scripts/ga_BWA_generic_v2.py
code no section found, using default: /pipeline/Scripts/ga_BLAT_generic_v3.py
code no section found, using default: /pipeline/Scripts/ga_Diamond_generic_v2.py
code no section found, using default: /pipeline/Scripts/ga_Final_merge_v4.py
code no section found, using default: /pipeline/Scripts/ga_merge_fasta.py
code no section found, using default: /pipeline/Scripts/ga_final_merge_fastq.py
code no section found, using default: /pipeline/Scripts/ga_final_merge_proteins.py
code no section found, using default: /pipeline/Scripts/ga_final_merge_map.py
code no section found, using default: /pipeline/Scripts/ea_combine_v5.py
code no section found, using default: /pipeline/Scripts/ta_taxid_v3.py
code no section found, using default: /pipeline/Scripts/ta_constrain_taxonomy_v2.py
code no section found, using default: /pipeline/Scripts/ta_combine_v3.py
code no section found, using default: /pipeline/Scripts/ta_wevote_parser.py
code no section found, using default: /pipeline/Scripts/output_taxa_groupby.py
code no section found, using default: /pipeline/Scripts/output_table_v3.py
code no section found, using default: /pipeline/Scripts/output_reformat_rpkm_table.py
code no section found, using default: /pipeline/Scripts/output_read_counts_v2.py
code no section found, using default: /pipeline/Scripts/output_read_quality_metrics.py
code no section found, using default: /pipeline/Scripts/output_contig_stats.py
code no section found, using default: /pipeline/Scripts/output_EC_metrics.py
code no section found, using default: /pipeline/Scripts/output_data_change_metrics.py
code no section found, using default: /pipeline/Scripts/output_get_host_reads.py
code no section found, using default: /pipeline/Scripts/remove_gaps_in_fasta.py
code no section found, using default: /pipeline/Scripts/output_parse_sam.py
code no section found, using default: /pipeline/Scripts/output_are_you_in_a_contig.py
code no section found, using default: /pipeline/Scripts/output_convert_gene_map_contig_segments.py
code no section found, using default: /pipeline/Scripts/output_filter_taxa.py
code no section found, using default: /pipeline/Scripts/output_filter_ECs.py
code no section found, using default: /pipeline/Scripts/bwa_read_sorter.py
code no section found, using default: /pipeline/Scripts/ta_contig_name_convert.py
code no section found, using default: /pipeline/Scripts/ga_pre_scan_get_libs.py
code no section found, using default: /pipeline/Scripts/ga_pre_scan_assemble_libs.py
MetaPro operating in auto-mode
Forward Reads: /scratch/st-vbruce-1/irvinng/G4H-Metatranscriptomics/data/fwd/1_iMUDI001_S1_L007_R1_001.fastq.gz
Reverse Reads: /scratch/st-vbruce-1/irvinng/G4H-Metatranscriptomics/data/rev/1_iMUDI001_S1_L007_R2_001.fastq.gz
Output filepath: /scratch/st-vbruce-1/irvinng/G4H-Metatranscriptomics/output
job path: /scratch/st-vbruce-1/irvinng/G4H-Metatranscriptomics/output/quality_filter
2024-11-10 15:47:19.324491 bypassing: quality_filter
2024-11-10 15:47:19.324506 skipping job: quality_filter
quality filter: 0.0 s
quality filter cleanup: 0.0 s
2024-11-10 15:47:19.324522 continuing from: quality_filter
job path: /scratch/st-vbruce-1/irvinng/G4H-Metatranscriptomics/output/host_filter
2024-11-10 15:47:19.325503 bypassing: host_filter
2024-11-10 15:47:19.325515 skipping job: host_filter
host filter: 0.0 s
host filter cleanup: 0.0 s
2024-11-10 15:47:19.325526 continuing from: host_filter
job path: /scratch/st-vbruce-1/irvinng/G4H-Metatranscriptomics/output/vector_filter
2024-11-10 15:47:19.326474 bypassing: vector_filter
2024-11-10 15:47:19.326485 skipping job: vector_filter
vector filter: 0.0 s
vector filter cleanup: 0.0 s
2024-11-10 15:47:19.326497 continuing from: vector_filter
2024-11-10 15:47:19.326820 bypassing: rRNA_filter
rRNA filter: 0.0 s
rRNA filter cleanup: 0.0 s
2024-11-10 15:47:19.326839 continuing from: rRNA_filter
2024-11-10 15:47:19.327151 bypassing: duplicate_repopulation
repop: 0.0 s
repop cleanup: 0.0 s
2024-11-10 15:47:19.327169 continuing from: duplicate_repopulation
2024-11-10 15:47:19.327543 bypassing: assemble_contigs
2024-11-10 15:47:19.328153 MGM OK. contigs present
assemble contigs: 0.0 s
assemble contigs cleanup: 0.0 s
2024-11-10 15:47:19.328167 continuing from: assemble_contigs
2024-11-10 15:47:19.328542 bypassing: GA_pre_scan
2024-11-10 15:47:19.328553 continuing from: GA_pre_scan
2024-11-10 15:47:19.328874 running: GA_split
2024-11-10 15:47:19.328884 splitting contigs
splitting fasta for contigs
splitting fastq for singletons GA
splitting fastq for pair_1 GA
splitting fastq for pair_2 GA
2024-11-10 15:47:19.346562 closing down processes:  4
2024-11-10 15:47:19.346613 closed down: 0/4            
2024-11-10 15:47:38.511107 closed down: 1/4            
2024-11-10 15:47:38.511205 closed down: 2/4            
2024-11-10 15:47:38.511229 closed down: 3/4            
2024-11-10 15:47:40.516806 continuing from: GA_split
2024-11-10 15:47:40.516834 Running GA lib check
2024-11-10 15:47:40.516862 BWA DB check: /scratch/st-vbruce-1/irvinng/G4H-Metatranscriptomics/output/GA_pre_scan/final_results
2024-11-10 15:47:40.524891 Error: no fasta files found. BWA only accepts .fasta extensions
empty BWA database

Any guidance is appreciated. Thanks!
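
In the output above, GA_pre_scan is reported as "bypassing" and no pre-scan jobs are submitted, so the step appears to have been skipped in this run rather than re-executed. Earlier in this thread, GA_pre_scan and GA_split were removed from bypass_log.txt before rerunning, which may be relevant here. The sketch below pulls the bypassed and executed steps out of a log. It relies only on the "bypassing:" and "running:" line format visible in the output; step_events is a hypothetical helper.

```python
import re

# Matches lines such as "2024-11-10 15:47:19.328542 bypassing: GA_pre_scan".
STEP_EVENT = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ (bypassing|running): (\S+)\s*$"
)


def step_events(log_text):
    """Return (action, step) pairs in the order they appear in the log."""
    events = []
    for line in log_text.splitlines():
        match = STEP_EVENT.match(line)
        if match:
            events.append((match.group(1), match.group(2)))
    return events
```

Applied to the output above, the pair for GA_pre_scan would carry the action "bypassing", while GA_split would carry "running".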

ParkinsonLab / MetaPro

GA_pre_scan results folder empty #24