AnantharamanLab / METABOLIC

A scalable high-throughput metabolic and biogeochemical functional trait profiler

metaT not producing "All_gene_collections_transcript_coverage.txt" file when using metabolic-c #68

Closed OUMason closed 2 years ago

OUMason commented 2 years ago

Describe the bug: No All_gene_collections_transcript_coverage.txt file is generated when using metatranscriptome (metaT) reads.

To Reproduce, see the error:

[2022-03-29 23:40:53] Drawing element cycling diagrams...
readline() on closed filehandle IN at /data/src/METABOLIC/METABOLIC-C.pl line 1904.
Illegal division by zero at /data/src/METABOLIC/METABOLIC-C.pl line 1944.
readline() on closed filehandle IN at /data/src/METABOLIC/METABOLIC-C.pl line 1904.
Illegal division by zero at /data/src/METABOLIC/METABOLIC-C.pl line 1944.

Lines in code where this error happens:

if ($omic_reads_type eq "metaT"){
    my @Read_seq_numbers = keys %Read_seq_numbers;
    $average_read_seq_number = $average_read_seq_number / (scalar @Read_seq_numbers) ;
}

Because of the error, no "All_gene_collections_transcript_coverage.txt" file is generated.

Expected behavior: the "All_gene_collections_transcript_coverage.txt" file is generated.


ChaoLab commented 2 years ago

It seems that the denominator "@Read_seq_numbers" is empty. Did you provide the reads in a way strictly following this instruction (https://github.com/AnantharamanLab/METABOLIC/wiki/METABOLIC-Usage#all-required-and-optional-flags; see point two at the bottom)?
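To illustrate the failure mode described here: if no read file can be opened, the table of per-sample read counts stays empty and the average is taken over zero samples. The following is a minimal Python sketch of guarding that division. METABOLIC itself is written in Perl, and `average_read_count` and its argument are hypothetical names used only for illustration, not METABOLIC functions.

```python
def average_read_count(read_counts):
    """Return the mean read count across samples.

    read_counts maps a sample name to its number of reads. An empty
    mapping means no read file was parsed, so raise a clear error
    instead of dividing by zero.
    """
    if not read_counts:
        raise ValueError(
            "no read files were parsed; check the paths in the -r file"
        )
    return sum(read_counts.values()) / len(read_counts)
```

A guard like this would turn the "Illegal division by zero" message into one that points at the reads list, which is where the problem usually lies.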

OUMason commented 2 years ago

metaT run: command: perl /data/src/METABOLIC/METABOLIC-C.pl -in-gn /path/to/folder/with/genome/files/ -r /path/to/list/of/paired/reads/MT_reads_locations.txt -o /output/directory/to/be/created/ -rt metaT -t 70

-r file: the -r file (MT_reads_locations.txt) was formatted exactly as shown in point 2.

metaT run log file:

[2022-03-29 21:36:41] The Prodigal annotation is running...
[2022-03-29 21:39:22] The Prodigal annotation is finished
[2022-03-29 21:39:35] The hmmsearch is running with 70 cpu threads...
[2022-03-29 22:15:30] The hmmsearch is finished
[2022-03-29 22:17:36] Generating each hmm faa collection...
[2022-03-29 22:17:38] Each hmm faa collection has been made
[2022-03-29 22:17:38] The KEGG module result is calculating...
[2022-03-29 23:35:23] The KEGG identifier (KO id) result is calculating...
[2022-03-29 23:35:28] The KEGG identifier (KO id) seaching result is finished
[2022-03-29 23:35:28] Searching CAZymes by dbCAN2...
[2022-03-29 23:40:37] dbCAN2 searching is done
[2022-03-29 23:40:37] Searching MEROPS peptidase...
ls: cannot access '/data/home/output/intermediate_files/MEROPS_Files/.MEROPSout.m8': No such file or directory
ls: cannot access '/data/home/output/intermediate_files/MEROPS_Files/.MEROPSout.m8': No such file or directory
[2022-03-29 23:40:49] MEROPS peptidase searching is done
[2022-03-29 23:40:53] METABOLIC table has been generated
[2022-03-29 23:40:53] Drawing element cycling diagrams...
readline() on closed filehandle IN at /data/src/METABOLIC/METABOLIC-C.pl line 1904.
Illegal division by zero at /data/src/METABOLIC/METABOLIC-C.pl line 1944.
readline() on closed filehandle IN at /data/src/METABOLIC/METABOLIC-C.pl line 1904.
Illegal division by zero at /data/src/METABOLIC/METABOLIC-C.pl line 1944.

metaT output:

215775099 All_gene_collections.gene
87236434 All_gene_collections.gene.scaffold.1.bt2
50857040 All_gene_collections.gene.scaffold.2.bt2
1984445 All_gene_collections.gene.scaffold.3.bt2
50857034 All_gene_collections.gene.scaffold.4.bt2
87236434 All_gene_collections.gene.scaffold.rev.1.bt2
50857040 All_gene_collections.gene.scaffold.rev.2.bt2
184320 Each_HMM_Amino_Acid_Sequence
4096 intermediate_files*
12288 KEGG_identifier_result
4096 METABOLIC_Figures
4096 METABOLIC_Figures_Input
1772 METABOLIC_log.log
4096 METABOLIC_result_each_spreadsheet
782006 METABOLIC_result.xlsx
0 tmp_calculate_depth.sh

*MEROPS_Files in intermediate_files is empty

METABOLIC test run log:

[2022-03-10 14:42:08] The Prodigal annotation is running...
[2022-03-10 14:43:46] The Prodigal annotation is finished
[2022-03-10 14:43:46] The hmmsearch is running with 5 cpu threads...
[2022-03-10 15:29:14] The hmmsearch is finished
[2022-03-10 15:29:21] Generating each hmm faa collection...
[2022-03-10 15:29:21] Each hmm faa collection has been made
[2022-03-10 15:29:21] The KEGG module result is calculating...
[2022-03-10 15:35:56] The KEGG identifier (KO id) result is calculating...
[2022-03-10 15:35:56] The KEGG identifier (KO id) seaching result is finished
[2022-03-10 15:35:56] Searching CAZymes by dbCAN2...
[2022-03-10 15:39:44] dbCAN2 searching is done
[2022-03-10 15:39:44] Searching MEROPS peptidase...
ls: cannot access 'METABOLIC_out/intermediate_files/MEROPS_Files/.MEROPSout.m8': No such file or directory
ls: cannot access 'METABOLIC_out/intermediate_files/MEROPS_Files/.MEROPSout.m8': No such file or directory
[2022-03-10 15:39:52] MEROPS peptidase searching is done
[2022-03-10 15:39:54] METABOLIC table has been generated
[2022-03-10 15:39:54] Drawing element cycling diagrams...
[2022-03-10 15:45:42] Drawing element cycling diagrams finished
[2022-03-10 15:45:42] Drawing metabolic handoff diagrams...
[2022-03-10 15:45:47] Drawing metabolic handoff diagrams finished
[2022-03-10 15:45:47] Drawing energy flow chart...
[2022-03-10 15:45:48] INFO: GTDB-Tk v1.6.0
[2022-03-10 15:45:48] INFO: gtdbtk classify_wf --cpus 5 -x fasta --genome_dir /data/src/METABOLIC/METABOLIC_test_files/Guaymas_Basin_genome_files --out_dir METABOLIC_out/intermediate_files/gtdbtk_Genome_files
[2022-03-10 15:45:48] INFO: Using GTDB-Tk reference data version r202: /data/GTDBTK_DB/release202/
[2022-03-10 15:45:48] INFO: Identifying markers in 5 genomes with 5 threads.
[2022-03-10 15:45:48] TASK: Running Prodigal V2.6.3 to identify genes.
[2022-03-10 15:46:32] INFO: Completed 5 genomes in 43.29 seconds (8.66 seconds/genome).
[2022-03-10 15:46:32] TASK: Identifying TIGRFAM protein families.
[2022-03-10 15:46:45] INFO: Completed 5 genomes in 12.98 seconds (2.60 seconds/genome).
[2022-03-10 15:46:45] TASK: Identifying Pfam protein families.
[2022-03-10 15:46:46] INFO: Completed 5 genomes in 1.40 seconds (3.57 genomes/second).
[2022-03-10 15:46:46] INFO: Annotations done using HMMER 3.1b2 (February 2015).
[2022-03-10 15:46:46] TASK: Summarising identified marker genes.
[2022-03-10 15:46:46] INFO: Completed 5 genomes in 0.21 seconds (24.18 genomes/second).
[2022-03-10 15:46:46] INFO: Done.
[2022-03-10 15:46:46] INFO: Aligning markers in 5 genomes with 5 CPUs.
[2022-03-10 15:46:46] INFO: Processing 5 genomes identified as bacterial.
[2022-03-10 15:46:54] INFO: Read concatenated alignment for 45,555 GTDB genomes.
[2022-03-10 15:46:54] TASK: Generating concatenated alignment for each marker.
[2022-03-10 15:46:55] INFO: Completed 5 genomes in 0.15 seconds (33.94 genomes/second).
[2022-03-10 15:46:55] TASK: Aligning 120 identified markers using hmmalign 3.1b2 (February 2015).
[2022-03-10 15:47:00] INFO: Completed 120 markers in 4.10 seconds (29.30 markers/second).
[2022-03-10 15:47:00] TASK: Masking columns of bacterial multiple sequence alignment using canonical mask.
[2022-03-10 15:49:10] INFO: Completed 45,560 sequences in 2.16 minutes (21,056.17 sequences/minute).
[2022-03-10 15:49:10] INFO: Masked bacterial alignment from 41,084 to 5,037 AAs.
[2022-03-10 15:49:10] INFO: 0 bacterial user genomes have amino acids in <10.0% of columns in filtered MSA.
[2022-03-10 15:49:10] INFO: Creating concatenated alignment for 45,560 bacterial GTDB and user genomes.
[2022-03-10 15:49:10] INFO: Creating concatenated alignment for 5 bacterial user genomes.
[2022-03-10 15:49:11] INFO: Done.
[2022-03-10 15:49:11] TASK: Placing 5 bacterial genomes into reference tree with pplacer using 5 CPUs (be patient).
[2022-03-10 15:49:11] INFO: pplacer version: v1.1.alpha19-0-g807f6f3
[2022-03-10 17:00:28] INFO: Calculating RED values based on reference tree.
[2022-03-10 17:00:47] TASK: Traversing tree to determine classification method.
[2022-03-10 17:00:47] INFO: Completed 5 genomes in 0.00 seconds (7,211.66 genomes/second).
[2022-03-10 17:00:47] TASK: Calculating average nucleotide identity using FastANI (v1.32).
[2022-03-10 17:00:53] INFO: Completed 26 comparisons in 5.24 seconds (4.96 comparisons/second).
[2022-03-10 17:00:53] INFO: 3 genome(s) have been classified using FastANI and pplacer.
[2022-03-10 17:00:53] INFO: Done.
[2022-03-10 17:01:05] Drawing energy flow chart finished
[2022-03-10 17:01:05] Calculating MW-score ...
[2022-03-10 17:01:06] Calculating MW-score is done
METABOLIC-C was done, the total running time: 02:18:59 (hh:mm:ss)

Output:

1532871 All_gene_collections_mapped.depth.txt
126976 Each_HMM_Amino_Acid_Sequence
4096 intermediate_files*
4096 KEGG_identifier_result
4096 METABOLIC_Figures
4096 METABOLIC_Figures_Input
4678 METABOLIC_log.log
4096 METABOLIC_result_each_spreadsheet
196770 METABOLIC_result.xlsx
510 METABOLIC_run.log
4096 MW-score_result

*MEROPS_Files in intermediate_files is empty

ChaoLab commented 2 years ago

Hi @OUMason, my first guess is that the MEROPS database is not set up correctly. You can check this by running line 842 of the script (the line containing "hmmscan --domtblout ....") outside of METABOLIC. Also, if it is convenient, could you paste your MT_reads_locations.txt here (masking anything you feel is necessary)?

OUMason commented 2 years ago

You were right that the MEROPS database wasn't set up correctly. That is now fixed, but I am still getting the same original error. Here is the log:

[2022-03-31 13:11:51] The Prodigal annotation is running...
[2022-03-31 13:14:38] The Prodigal annotation is finished
[2022-03-31 13:14:51] The hmmsearch is running with 70 cpu threads...
[2022-03-31 14:28:43] The hmmsearch is finished
[2022-03-31 14:30:42] Generating each hmm faa collection...
[2022-03-31 14:30:43] Each hmm faa collection has been made
[2022-03-31 14:30:43] The KEGG module result is calculating...
[2022-03-31 15:47:45] The KEGG identifier (KO id) result is calculating...
[2022-03-31 15:47:49] The KEGG identifier (KO id) seaching result is finished
[2022-03-31 15:47:49] Searching CAZymes by dbCAN2...
[2022-03-31 15:53:02] dbCAN2 searching is done
[2022-03-31 15:53:02] Searching MEROPS peptidase...
[2022-03-31 15:56:12] MEROPS peptidase searching is done
[2022-03-31 15:56:15] METABOLIC table has been generated
[2022-03-31 15:56:15] Drawing element cycling diagrams...
readline() on closed filehandle IN at /data/src/METABOLIC/METABOLIC-C.pl line 1904.
readline() on closed filehandle IN at /data/src/METABOLIC/METABOLIC-C.pl line 1904.
Illegal division by zero at /data/src/METABOLIC/METABOLIC-C.pl line 1944.
Illegal division by zero at /data/src/METABOLIC/METABOLIC-C.pl line 1944.

Here is the MT_reads_locations.txt file:

/data/home/sample1_RNA_R1.fastq,/data/home/sample1_RNA_R2.fastq
/data/home/sample2_RNA_R1.fastq,/data/home/sample2_RNA_R2.fastq

ChaoLab commented 2 years ago

Hi @OUMason, your "MT_reads_locations.txt" seems to be prepared correctly. However, the line 1904 error indicates that METABOLIC has a problem reading the reads, so something is probably wrong with the fastq files themselves. Are all four fastq files present at those locations, and are they readable (open permissions)? Note that METABOLIC cannot read ".gz" files.
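The checks suggested in this comment (files present, readable, not gzip-compressed) can be scripted before starting a long run. Below is a minimal Python sketch, not part of METABOLIC. It assumes one sample per line with the two paired-end fastq paths separated by a comma, as in the file pasted above. `check_reads_list` is a hypothetical helper name, and skipping lines that start with "#" as comments is also an assumption.

```python
import os


def check_reads_list(list_path):
    """Return a list of problems found in a reads-location file."""
    problems = []
    with open(list_path) as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.strip()
            # Assumption: blank lines and '#' comment lines carry no paths.
            if not line or line.startswith("#"):
                continue
            paths = [p.strip() for p in line.split(",")]
            if len(paths) != 2:
                problems.append(
                    f"line {line_no}: expected 2 comma-separated paths"
                )
                continue
            for path in paths:
                if path.endswith(".gz"):
                    problems.append(
                        f"line {line_no}: {path} is gzip-compressed"
                    )
                elif not os.path.isfile(path):
                    problems.append(f"line {line_no}: {path} does not exist")
                elif not os.access(path, os.R_OK):
                    problems.append(f"line {line_no}: {path} is not readable")
    return problems
```

Calling `check_reads_list("/path/to/list/of/paired/reads/MT_reads_locations.txt")` and printing the result should return an empty list when every listed file passes these three checks.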

OUMason commented 2 years ago

I ran METABOLIC-c with one of my MAGs and the test reads and did not get that error. Therefore, it is probably correct that the issue is related to the fastq files.

First few lines of one of my QC'd R1 fastq files:

@A00178:71:HGT77DSXX:2:1655:31521:28244 1:N:0:ATCGATCG+CGATCGAT
CTGCAGATCCTCCTAGGCCTTTCCAACTTCGTCCAGCTCCGATACGTTCTGCCTTTCGGTGTCTGGTCGCGAGCCTTCTTCCCGCTTTTATTTATCGACGTATCTCTCGTCTATCTCAAGGAATCGCGCATTCCTATCGAGTCACCCACA
+
FFFF::FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF::FFFFFFFFF,FFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFF
@A00178:71:HGT77DSXX:2:2275:25952:29810 1:N:0:ATCGATCG+CGATCGAT
CTGCAGATCCTCCTAGGCCTTTCCAACTTCGTCCAGCTCCGATACGTTCTGCCTTTCGGTGTCTGGTCGCGAGCCTTCTTCCCGCTTTTATTTATCGACGTATCTCTCGTCTATCTCAAGGAATCGCGCATTCCTATCGAGTCACCCACA
+
:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF
@A00178:71:HGT77DSXX:2:1368:29279:33692 1:N:0:ATCGATCG+CGATCGAT
CTGCAGATCCTCCTAGGCCTTTCCAACTTCGTCCAGCTCCGATACGTTCTGCCTTTCGGTGTCTGGTCGCGAGCCTTCTTCCCGCTTTTATTTATCGACGTATCTCTCGTCTATCTCAAGGAATCGCGCATTCCTATCGAGTCACCCACA
+

First few lines of the METABOLIC R1 test read file:

@SRR3577362.106005765 D98XHXP1:198:D291FACXX:5:2114:5036:30542 length=101
CAGATGATACCGAAACGAGCAAACCCGGCGCCAAGAGCCATCATCCCCGAGCGCAGACGCCCACGTGCAACACTCAAGAGCCAGATACCGAAAGCGAGGAT
+
=?;:BD?B<,+)0@B)<)2@?9EBD############################################################################
@SRR3577362.57624042 D98XHXP1:198:D291FACXX:5:1303:8785:29836 length=101
TCCTGGGAAATTTTGATGATTTTTGTAATTTAATAAAGCTTGGTGAAGTGCTTAACTGTGTTCATGCTCATGCCGGGTACCCTGTGGAGCCCTGTGACCTT
+
@?@DADDDBFDHAHGIIGE>CHHEGFHCEGHIJIHDCA3<DGGFGABFCBBBBEHI>GGC;=BB4=CHFGHGIIFB/'7?CE3;ACC=A@BCB5:A@CC
@SRR3577362.33406838 D98XHXP1:198:D291FACXX:5:1205:7860:4686 length=101
CGGTTCGAGGTGTTGTGAATGGTATTCTGGTTTTAATCTTAGGTGTACTCCTTACTGATCTTTCAGTAGTCAATCCCCTGGGGACTATTGTTTTTATTTTT
+
B@C?DFFDHFDFFHIFHBGHCH@HGIICAG:F@H@BDHIGGII4DGGGGIEHIIIIJHIIIIIH:FFHIHHJIIIHHHEBDFD=ACCCDCACDDDBDDEED

My read headers are missing the length field. Could that be causing the problem?

ChaoLab commented 2 years ago

Your read file looks fine (my guess is that the missing length value is not the problem). Could the format of MT_reads_locations.txt be the problem? Is it in UNIX format? Do the two read files of each pair contain the same number of reads? Did you try a metaG run with your metagenomic reads, and did it finish without errors?
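Two of the questions here can be answered mechanically: whether the list file uses UNIX line endings, and whether both mates of a pair hold the same number of reads. The following Python sketch is a rough check under stated assumptions, not METABOLIC code. It assumes plain four-line fastq records with no wrapped sequences, and all three function names are hypothetical.

```python
def has_windows_line_endings(path):
    """True if any line in the file ends with CRLF (not UNIX format)."""
    with open(path, "rb") as handle:
        return any(line.endswith(b"\r\n") for line in handle)


def count_fastq_reads(path):
    """Count reads, assuming exactly 4 lines per fastq record."""
    with open(path, "rb") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def pair_is_balanced(r1_path, r2_path):
    """True if both mate files hold the same number of reads."""
    return count_fastq_reads(r1_path) == count_fastq_reads(r2_path)
```

Reading in binary mode keeps the CRLF check independent of the platform's newline translation.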

OUMason commented 2 years ago

I think I would get a "no such file or directory" error if the MT_reads_locations.txt file weren't properly formatted? The read 1 and read 2 counts are the same for each sample. I did not get the All_gene_collections_mapped.depth.txt file from a metaG run with my fastq files.

ChaoLab commented 2 years ago

Strange. You will probably get no info reported (all silent) if you prepared the file incorrectly. Can you show the log file (including the error reports) for your metaG run?

OUMason commented 2 years ago

[2022-03-31 18:13:12] The Prodigal annotation is running...
[2022-03-31 18:15:52] The Prodigal annotation is finished
[2022-03-31 18:16:05] The hmmsearch is running with 70 cpu threads...
[2022-03-31 18:52:08] The hmmsearch is finished
[2022-03-31 18:54:11] Generating each hmm faa collection...
[2022-03-31 18:54:12] Each hmm faa collection has been made
[2022-03-31 18:54:12] The KEGG module result is calculating...
[2022-03-31 20:00:20] The KEGG identifier (KO id) result is calculating...
[2022-03-31 20:00:25] The KEGG identifier (KO id) seaching result is finished
[2022-03-31 20:00:25] Searching CAZymes by dbCAN2...
[2022-03-31 20:05:29] dbCAN2 searching is done
[2022-03-31 20:05:29] Searching MEROPS peptidase...
[2022-03-31 20:08:36] MEROPS peptidase searching is done
[2022-03-31 20:08:40] METABOLIC table has been generated
[2022-03-31 20:08:40] Drawing element cycling diagrams...
readline() on closed filehandle IN at /data/src/METABOLIC/METABOLIC-C.pl line 1904.
readline() on closed filehandle IN at /data/src/METABOLIC/METABOLIC-C.pl line 1904.
rm: cannot remove '/data/home/.bam': No such file or directory
rm: cannot remove '/data/home/.bam': No such file or directory
rm: cannot remove '/data/home/.sorted.stat': No such file or directory
rm: cannot remove '/data/home/.sorted.stat': No such file or directory
rm: cannot remove '/data/home/.bai': No such file or directory
rm: cannot remove '/data/home/.bai': No such file or directory
[2022-03-31 20:24:41] Drawing element cycling diagrams finished
[2022-03-31 20:24:41] Drawing metabolic handoff diagrams...
[2022-03-31 20:24:48] Drawing metabolic handoff diagrams finished
[2022-03-31 20:24:48] Drawing energy flow chart...
[2022-03-31 20:24:48] INFO: GTDB-Tk v1.6.0
[2022-03-31 20:24:48] INFO: gtdbtk classify_wf --cpus 70 -x fasta --genome_dir /data/home/ --out_dir /data/home/intermediate_files/gtdbtk_Genome_files
[2022-03-31 20:24:48] INFO: Using GTDB-Tk reference data version r202: /data/GTDBTK_DB/release202/
[2022-03-31 20:24:48] INFO: Identifying markers in 55 genomes with 70 threads.
[2022-03-31 20:24:48] TASK: Running Prodigal V2.6.3 to identify genes.
[2022-03-31 20:26:54] INFO: Completed 55 genomes in 2.10 minutes (26.19 genomes/minute).
[2022-03-31 20:26:54] TASK: Identifying TIGRFAM protein families.
[2022-03-31 20:27:16] INFO: Completed 55 genomes in 22.22 seconds (2.48 genomes/second).
[2022-03-31 20:27:16] TASK: Identifying Pfam protein families.
[2022-03-31 20:27:19] INFO: Completed 55 genomes in 2.63 seconds (20.92 genomes/second).
[2022-03-31 20:27:19] INFO: Annotations done using HMMER 3.1b2 (February 2015).
[2022-03-31 20:27:19] TASK: Summarising identified marker genes.
[2022-03-31 20:27:22] INFO: Completed 55 genomes in 2.85 seconds (19.29 genomes/second).
[2022-03-31 20:27:22] INFO: Done.
[2022-03-31 20:27:22] INFO: Aligning markers in 55 genomes with 70 CPUs.
[2022-03-31 20:27:22] INFO: Processing 55 genomes identified as bacterial.
[2022-03-31 20:27:30] INFO: Read concatenated alignment for 45,555 GTDB genomes.
[2022-03-31 20:27:30] TASK: Generating concatenated alignment for each marker.
[2022-03-31 20:27:43] INFO: Completed 55 genomes in 0.12 seconds (470.76 genomes/second).
[2022-03-31 20:27:44] TASK: Aligning 120 identified markers using hmmalign 3.1b2 (February 2015).
[2022-03-31 20:28:06] INFO: Completed 120 markers in 9.45 seconds (12.69 markers/second).
[2022-03-31 20:28:06] TASK: Masking columns of bacterial multiple sequence alignment using canonical mask.
[2022-03-31 20:30:15] INFO: Completed 45,610 sequences in 2.14 minutes (21,311.48 sequences/minute).
[2022-03-31 20:30:15] INFO: Masked bacterial alignment from 41,084 to 5,037 AAs.
[2022-03-31 20:30:15] INFO: 0 bacterial user genomes have amino acids in <10.0% of columns in filtered MSA.
[2022-03-31 20:30:15] INFO: Creating concatenated alignment for 45,610 bacterial GTDB and user genomes.
[2022-03-31 20:30:16] INFO: Creating concatenated alignment for 55 bacterial user genomes.
[2022-03-31 20:30:16] INFO: Done.
[2022-03-31 20:30:17] WARNING: Setting pplacer CPUs to 64, as pplacer is known to hang if >64 are used. You can override this using: --pplacer_cpus
[2022-03-31 20:30:17] TASK: Placing 55 bacterial genomes into reference tree with pplacer using 64 CPUs (be patient).
[2022-03-31 20:30:17] INFO: pplacer version: v1.1.alpha19-0-g807f6f3
[2022-03-31 21:56:39] INFO: Calculating RED values based on reference tree.
[2022-03-31 21:56:58] TASK: Traversing tree to determine classification method.
[2022-03-31 21:56:58] INFO: Completed 55 genomes in 0.01 seconds (10,506.30 genomes/second).
[2022-03-31 21:56:58] TASK: Calculating average nucleotide identity using FastANI (v1.32).
[2022-03-31 21:57:02] INFO: Completed 122 comparisons in 3.61 seconds (33.81 comparisons/second).
[2022-03-31 21:57:02] INFO: 0 genome(s) have been classified using FastANI and pplacer.
[2022-03-31 21:57:02] INFO: Done.
mv: cannot stat '/data/home/Output_energy_flow/Energy_plot/network.plot.pdf': No such file or directory
mv: cannot stat '/data/home/Output_energy_flow/Energy_plot/network.plot.pdf': No such file or directory
[2022-03-31 21:57:07] Drawing energy flow chart finished
[2022-03-31 21:57:07] Calculating MW-score ...
[2022-03-31 21:57:09] Calculating MW-score is done
METABOLIC-C was done, the total running time: 03:43:57 (hh:mm:ss)

ChaoLab commented 2 years ago

It seems to be the same line 1904 error. Are both the metaG and metaT reads in the same folder? Can you send me the two txt files (MT_reads_locations.txt and perhaps MG_reads_locations.txt) as attachments rather than pasting their contents?

OUMason commented 2 years ago

Thank you to Chao for figuring out the problem was that METABOLIC wasn't correctly intaking read files from the directory they were stored in. Moving the fastq files to a different directory and updating the read location file solved the mapping problem.

OUMason commented 1 year ago

This is the command I ran:

perl /data/src/METABOLIC/METABOLIC-C.pl -in-gn /path/to/folder/with/genome/files/ -r /path/to/list/of/paired/reads/MT_reads_locations.txt -o /output/directory/to/be/created/ -rt metaT -t 70

the -r file (MT_reads_locations.txt) was formatted exactly as shown in point 2.

I don't know if this is related, but in the log file it showed that the *.MEROPSout.m8 file wasn't created:

[2022-03-29 21:36:41] The Prodigal annotation is running...
[2022-03-29 21:39:22] The Prodigal annotation is finished
[2022-03-29 21:39:35] The hmmsearch is running with 70 cpu threads...
[2022-03-29 22:15:30] The hmmsearch is finished
[2022-03-29 22:17:36] Generating each hmm faa collection...
[2022-03-29 22:17:38] Each hmm faa collection has been made
[2022-03-29 22:17:38] The KEGG module result is calculating...
[2022-03-29 23:35:23] The KEGG identifier (KO id) result is calculating...
[2022-03-29 23:35:28] The KEGG identifier (KO id) seaching result is finished
[2022-03-29 23:35:28] Searching CAZymes by dbCAN2...
[2022-03-29 23:40:37] dbCAN2 searching is done
[2022-03-29 23:40:37] Searching MEROPS peptidase...
ls: cannot access '/data/home/output/intermediate_files/MEROPS_Files/.MEROPSout.m8': No such file or directory
ls: cannot access '/data/home/output/intermediate_files/MEROPS_Files/.MEROPSout.m8': No such file or directory
[2022-03-29 23:40:49] MEROPS peptidase searching is done
[2022-03-29 23:40:53] METABOLIC table has been generated
[2022-03-29 23:40:53] Drawing element cycling diagrams...
readline() on closed filehandle IN at /data/src/METABOLIC/METABOLIC-C.pl line 1904.
Illegal division by zero at /data/src/METABOLIC/METABOLIC-C.pl line 1944.
readline() on closed filehandle IN at /data/src/METABOLIC/METABOLIC-C.pl line 1904.
Illegal division by zero at /data/src/METABOLIC/METABOLIC-C.pl line 1944.

It might be helpful to see what files were created and the file size:

215775099 All_gene_collections.gene
87236434 All_gene_collections.gene.scaffold.1.bt2
50857040 All_gene_collections.gene.scaffold.2.bt2
1984445 All_gene_collections.gene.scaffold.3.bt2
50857034 All_gene_collections.gene.scaffold.4.bt2
87236434 All_gene_collections.gene.scaffold.rev.1.bt2
50857040 All_gene_collections.gene.scaffold.rev.2.bt2
184320 Each_HMM_Amino_Acid_Sequence
4096 intermediate_files
12288 KEGG_identifier_result
4096 METABOLIC_Figures
4096 METABOLIC_Figures_Input
1772 METABOLIC_log.log
4096 METABOLIC_result_each_spreadsheet
782006 METABOLIC_result.xlsx
0 tmp_calculate_depth.sh

Thank you for your help. METABOLIC is great.

Sincerely, Olivia


ChaoLab commented 1 year ago

Hi, it is hard to identify the problem from the log alone. First, can you check whether 'diamond blastp' can be called within the conda env? The '.MEROPSout.m8' file is generated by diamond blastp. Second, the line 1904 and 1944 errors might be due to missing '.gene' files in '/path/to/folder/with/genome/files/'. Can you check why those 'gene' files were not generated? Normally, after 'Prodigal annotation' is finished, 'Accessory_scripts/gff2fasta_mdf.pl' takes both the 'gff' and 'fasta' files as input to generate the 'gene' file.
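Both checks from this comment can be run quickly from inside the activated conda environment. The Python sketch below is illustrative only and not part of METABOLIC. The helper names are hypothetical, and the assumption that each '<name>.fasta' genome should have a sibling '<name>.gene' file is inferred from the comment above rather than confirmed from the source code.

```python
import glob
import os
import shutil


def tool_on_path(name):
    """True if an executable called `name` is found on the current PATH."""
    return shutil.which(name) is not None


def genomes_missing_gene_file(genome_dir, genome_ext=".fasta"):
    """List genome files that have no matching '.gene' file beside them.

    Assumption: '<name>.fasta' should be accompanied by '<name>.gene'.
    """
    missing = []
    pattern = os.path.join(genome_dir, "*" + genome_ext)
    for fasta in sorted(glob.glob(pattern)):
        stem = fasta[: -len(genome_ext)]
        if not os.path.isfile(stem + ".gene"):
            missing.append(os.path.basename(fasta))
    return missing
```

For example, `tool_on_path("diamond")` returning False would suggest that diamond is not available in the environment, and a non-empty result from `genomes_missing_gene_file("/path/to/folder/with/genome/files/")` would point at the gene-file generation step.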

HAIS-st commented 1 month ago

May I ask if you can tell me specifically how to solve it? I encountered the same problem when using MetaG