ParkinsonLab / MetaPro

GNU General Public License v3.0

metapro won't recognize DB files #18

Closed: hughit32 closed this issue 10 months ago

hughit32 commented 10 months ago

The download script seems to run fine, but when I run MetaPro with, e.g.:

`docker run -it -v /home/mitc633/metaproStuff:/temp parkinsonlab/metapro python3 /pipeline/MetaPro.py -c /temp/Config.ini -s /temp/WGGTrinityFiltered.fasta -o /temp/testMetaproOutput`

I get an error. Here is the output:

METAPRO metatranscriptomic analysis pipeline

no-host: False
verbose_mode: quiet
CHECKING CONFIG
USING CONFIG /temp/Config.ini
target_rank no inner section found. using default genus
AdapterRemoval_minlength found! using: 30
Show_unclassified found! using: No
bypass_log_name no inner section found. using default bypass_log.txt
debug_stop_flag no inner section found. using default none
num_threads found! using: 22
taxa_existence_cutoff no inner section found. using default 0.1
DNA_DB_mode no inner section found. using default chocophlan
RPKM_cutoff found! using: 0.01
BWA_cigar_cutoff found! using: 90
BLAT_identity_cutoff found! using: 85
BLAT_length_cutoff found! using: 0.65
BLAT_score_cutoff found! using: 60
DIAMOND_identity_cutoff found! using: 85
DIAMOND_length_cutoff found! using: 0.65
DIAMOND_score_cutoff found! using: 60
BWA_mem_footprint no inner section found. using default 5
BLAT_mem_footprint no inner section found. using default 5
DMD_mem_footprint no inner section found. using default 10
BWA_mem_threshold found! using: 75
BLAT_mem_threshold found! using: 75
DIAMOND_mem_threshold found! using: 80
DETECT_mem_threshold found! using: 80
Infernal_mem_threshold found! using: 75
Barrnap_mem_threshold found! using: 75
BWA_pp_mem_threshold found! using: 30
BLAT_pp_mem_threshold found! using: 75
DIAMOND_pp_mem_threshold found! using: 80
GA_final_merge_mem_threshold no inner section found. using default 5
TA_mem_threshold found! using: 80
repop_mem_threshold no inner section found. using default 50
EC_mem_threshold no inner section found. using default 5
BWA_job_limit found! using: 40
BLAT_job_limit found! using: 40
DIAMOND_job_limit found! using: 40
DETECT_job_limit found! using: 40
Infernal_job_limit found! using: 40
Barrnap_job_limit found! using: 40
BWA_pp_job_limit found! using: 40
BLAT_pp_job_limit found! using: 40
DIAMOND_pp_job_limit found! using: 40
GA_final_merge_job_limit no inner section found. using default 18
TA_job_limit no inner section found. using default 18
repop_job_limit no inner section found. using default 1
EC_job_limit no inner section found. using default 18
Infernal_job_delay no inner section found. using default 5
Barrnap_job_delay no inner section found. using default 5
BWA_job_delay found! using: 0.5
BLAT_job_delay found! using: 5
DIAMOND_job_delay found! using: 5
DETECT_job_delay no inner section found. using default 5
BWA_pp_job_delay found! using: 0.01
BLAT_pp_job_delay found! using: 0.05
DIAMOND_pp_job_delay found! using: 5
GA_final_merge_job_delay no inner section found. using default 5
TA_job_delay found! using: 10
repop_job_delay no inner section found. using default 10
EC_job_delay no inner section found. using default 1
keep_all found! using: yes
keep_quality found! using: no
keep_host found! using: no
keep_vector found! using: no
keep_rRNA found! using: no
keep_repop found! using: no
keep_assemble_contigs found! using: yes
keep_GA_BWA found! using: no
keep_GA_BLAT found! using: no
keep_GA_DIAMOND found! using: no
keep_GA_final found! using: no
keep_TA found! using: no
keep_EC found! using: no
keep_outputs found! using: no
filter_stringency found! using: high
GA_chunk_size found! using: 10000
EC_chunk_size found! using: 1000
rRNA_chunk_size found! using: 50000
Labels no section found, using default: quality_filter
Labels no section found, using default: host_filter
Labels no section found, using default: vector_filter
Labels no section found, using default: rRNA_filter
Labels no section found, using default: rRNA_filter_split
Labels no section found, using default: rRNA_filter_convert
Labels no section found, using default: rRNA_filter_barrnap
Labels no section found, using default: rRNA_filter_barrnap_merge
Labels no section found, using default: rRNA_filter_barrnap_pp
Labels no section found, using default: rRNA_filter_infernal
Labels no section found, using default: rRNA_filter_infernal_prep
Labels no section found, using default: rRNA_filter_splitter
Labels no section found, using default: rRNA_filter_post
Labels no section found, using default: duplicate_repopulation
Labels no section found, using default: assemble_contigs
Labels no section found, using default: destroy_contigs
Labels no section found, using default: GA_pre_scan
Labels no section found, using default: GA_split
Labels no section found, using default: GA_BWA
Labels no section found, using default: GA_BWA_pp
Labels no section found, using default: GA_BWA_merge
Labels no section found, using default: GA_BLAT
Labels no section found, using default: GA_BLAT_cleanup
Labels no section found, using default: GA_BLAT_cat
Labels no section found, using default: GA_BLAT_pp
Labels no section found, using default: GA_BLAT_merge
Labels no section found, using default: GA_DMD
Labels no section found, using default: GA_DMD_pp
Labels no section found, using default: GA_final_merge
Labels no section found, using default: taxonomic_annotation
Labels no section found, using default: enzyme_annotation
Labels no section found, using default: enzyme_annotation_detect
Labels no section found, using default: enzyme_annotation_priam
Labels no section found, using default: enzyme_annotation_priam_split
Labels no section found, using default: enzyme_annotation_priam_cat
Labels no section found, using default: enzyme_annotation_DMD
Labels no section found, using default: enzyme_annotation_pp
Labels no section found, using default: outputs
Labels no section found, using default: output_copy_gene_map
Labels no section found, using default: output_clean_ec
Labels no section found, using default: output_copy_taxa
Labels no section found, using default: output_network_generation
Labels no section found, using default: output_unique_hosts_singletons
Labels no section found, using default: output_unique_hosts_pair_1
Labels no section found, using default: output_unique_hosts_pair_2
Labels no section found, using default: output_unique_vectors_singletons
Labels no section found, using default: output_unique_vectors_pair_1
Labels no section found, using default: output_unique_vectors_pair_2
Labels no section found, using default: output_combine_hosts
Labels no section found, using default: output_per_read_scores
Labels no section found, using default: output_contig_stats
Labels no section found, using default: output_ec_heatmap
Labels no section found, using default: output_taxa_groupby
Labels no section found, using default: output_read_count
UniVec_Core found! using: /home/mitc633/metaproDB/univec_core/UniVec_Core.fasta
Adapter found! using: /home/mitc633/metaproDB/Trimmomatic_adapters/TruSeq3-PE-2.fa
Host found! using: /home/mitc633/metaproDB/human_genome/human_genome.fasta
Rfam found! using: /home/mitc633/metaproDB/Rfam/Rfam.cm
DNA_DB found! using: /home/mitc633/metaproDB/choco_mpro_3/family_group
source_taxa_db no inner section found. using default /project/j/jparkin/Lab_Databases/family_llbs
Prot_DB found! using: /home/mitc633/metaproDB/nr/nr
Prot_DB_reads found! using: /home/mitc633/metaproDB/nr/nr
accession2taxid found! using: /home/mitc633/metaproDB/accession2taxid/accession2taxid
nodes found! using: /home/mitc633/metaproDB/WEVOTE_db/nodes_wevote.dmp
names found! using: /home/mitc633/metaproDB/WEVOTE_db/names_wevote.dmp
Kaiju_db found! using: /home/mitc633/metaproDB/kaiju_db/kaiju_db_nr.fmi
Centrifuge_db found! using: /home/mitc633/metaproDB/centrifuge_db/nt
SWISS_PROT found! using: /home/mitc633/metaproDB/swiss_prot_db/swiss_prot_db
SWISS_PROT_map found! using: /home/mitc633/metaproDB/swiss_prot_db/SwissProt_EC_Mapping.tsv
PriamDB found! using: /home/mitc633/metaproDB/PRIAM_db/
DetectDB found! using: /home/mitc633/metaproDB/DETECTv2
WEVOTEDB found! using: /home/mitc633/metaproDB/WEVOTE_db/
EC_pathway found! using: /home/mitc633/metaproDB/EC_pathway/EC_pathway.txt
path_to_superpath found! using: /home/mitc633/metaproDB/path_to_superpath/pathway_to_superpathway.csv
MetaGeneMark_model found! using: /pipeline_tools/mgm/MetaGeneMark_v1.mod
enzyme_db no inner section found. using default /pipeline/custom_databases/FREQ_EC_pairs_3_mai_2020.txt
taxid_tree found! using: /home/mitc633/metaproDB/taxid_trees/class_tree.tsv
kraken2_db found! using: /home/mitc633/metaproDB/kraken2
dir name: /home/mitc633/metaproDB/nr
file name: nr
/home/mitc633/metaproDB/nr/nr
file does not exists

It seems the nr file isn't being recognized, but it's there. The last time I downloaded the databases, I ran the download script outside of Docker, but I've also tried running it inside Docker. I used the Config.ini file that you provide, only substituting /home/mitc633/metaproDB for the database_path parameter. When I ran the downloader with Docker, I used /metaproDB as the download location and substituted that for database_path instead. It must be something about the database_path I'm using in Config.ini, but I can't figure out what I'm doing wrong. I'm not very familiar with Docker, so I'm probably doing something very dumb. Thanks!
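One way to see which configured paths are actually visible where MetaPro runs is to feed the config log above to a small checker and run that checker inside the container. The sketch below is hypothetical (it is not part of MetaPro). It assumes the log keeps the `<name> found! using: <value>` format shown above, and it only checks values that look like absolute paths.

```python
import os
import re

# Matches log entries such as "Prot_DB found! using: /home/.../nr/nr".
# Assumption: the log keeps this "<name> found! using: <value>" format,
# possibly wrapped across lines.
ENTRY = re.compile(r"(\S+) found!\s+using:\s+(\S+)")


def missing_paths_from_log(log_text):
    """Return (name, path) pairs whose value is an absolute path that
    does not exist on the filesystem this script runs against."""
    missing = []
    for name, value in ENTRY.findall(log_text):
        # Numeric and yes/no settings are skipped; only absolute paths
        # are checked.
        if value.startswith("/") and not os.path.exists(value):
            missing.append((name, value))
    return missing
```

Entries reported as "no inner section found. using default ..." are not matched, so only values set in Config.ini are checked. Run inside the container with the mount from the command above, this check would be expected to flag every /home/mitc633/metaproDB entry, because that host directory is not part of the mount.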

billytaj commented 10 months ago

I think your bind-mount option needs a bit of work. This is the code that's failing for you; it just checks whether the path to the NR database exists.

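```python
import os
import sys
from datetime import datetime as dt  # assumed import behind dt.today()


# Excerpt of a method from MetaPro's path-checking code (class not shown).
def check_dmd_valid(self):
    # if there's a .dmnd file
    # if it's sufficiently big
    dir_name = os.path.dirname(self.Prot_DB)
    basename = os.path.basename(self.Prot_DB)
    print("dir name:", dir_name)
    file_name = basename.split(".")[0]
    print("file name:", file_name)  # , "extension:", extension)
    print(self.Prot_DB)
    # Evaluated inside the container: a path that exists only on the host fails here.
    if not os.path.exists(self.Prot_DB):
        sys.exit("file does not exists")
    else:
        print(dt.today(), self.Prot_DB, "exists")
    # The DIAMOND index is expected next to the database as <name>.dmnd.
    dmd_index_path = os.path.join(dir_name, file_name + ".dmnd")
    db_size = os.path.getsize(self.Prot_DB)
    if os.path.exists(dmd_index_path):
        # Accept the index if it is at least 90% of the database size.
        if os.path.getsize(dmd_index_path) >= db_size * 0.9:
            print(dt.today(), "DMD index is ok")
```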
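The check above runs inside the container, so the container's view of the filesystem is what matters. With `-v /home/mitc633/metaproStuff:/temp`, only the contents of /home/mitc633/metaproStuff are visible, and only under /temp. A host path such as /home/mitc633/metaproDB is not visible inside the container at all. The hypothetical helper below (not part of MetaPro or Docker) sketches that mapping for a single bind mount. It returns None for host paths the container cannot see.

```python
import posixpath


def to_container_path(host_path, host_dir, container_dir):
    """Translate a host path to the path a container sees through a bind
    mount `-v host_dir:container_dir`.

    Returns None when host_path is outside host_dir, i.e. when the
    container cannot see that path through this mount.
    """
    host_path = posixpath.normpath(host_path)
    host_dir = posixpath.normpath(host_dir)
    if host_path == host_dir:
        return container_dir
    # Require a trailing "/" so /home/u/dataX is not treated as inside /home/u/data.
    prefix = host_dir.rstrip("/") + "/"
    if not host_path.startswith(prefix):
        return None
    return posixpath.join(container_dir, host_path[len(prefix):])
```

With the mount from the issue, `to_container_path("/home/mitc633/metaproDB/nr/nr", "/home/mitc633/metaproStuff", "/temp")` returns None. A database moved to /home/mitc633/metaproStuff/metaproDB instead maps to /temp/metaproDB/nr/nr.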
hughit32 commented 10 months ago

OK, thanks for the reply. I figured out my mistake. For the database_path variable in Config.ini, I was using a path on my host machine rather than a path under the container target set up with the -v parameter. By changing database_path from /home/mitc633/metaproDB to /temp/metaproDB in Config.ini, and moving my database files accordingly, I got it to work. Thank you for your help!
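The fix reduces to a simple rule: database_path in Config.ini must point under the container side of the -v mount. The following is a minimal sketch of a pre-flight check for that rule. It is hypothetical (not part of MetaPro) and assumes Config.ini can be parsed by Python's configparser with interpolation disabled. Because the section that holds database_path is not shown in this thread, the sketch scans every named section, and the `[Settings]` section name used in the example is made up.

```python
import configparser
import posixpath


def database_path_issues(config_text, container_mount):
    """Return a list of problems with database_path settings, checking that
    each one lies under container_mount (the container side of -v)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    mount = posixpath.normpath(container_mount)
    issues = []
    found = False
    for section in parser.sections():
        if not parser.has_option(section, "database_path"):
            continue
        found = True
        value = posixpath.normpath(parser.get(section, "database_path"))
        # Accept the mount point itself or anything below it.
        if value != mount and not value.startswith(mount.rstrip("/") + "/"):
            issues.append(
                f"[{section}] database_path={value} is outside the container mount {mount}"
            )
    if not found:
        issues.append("no database_path option found")
    return issues
```

For example, with `database_path = /home/mitc633/metaproDB` and a mount point of /temp, the check reports one issue. After the change to /temp/metaproDB, it returns an empty list.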