AnantharamanLab / METABOLIC

A scalable high-throughput metabolic and biogeochemical functional trait profiler

Bowtie2 error in METABOLIC-C.pl #62

Closed mhyleung closed 2 years ago

mhyleung commented 2 years ago

Dear all

I have just recently installed METABOLIC and wanted to run the METABOLIC-C.pl test run with the recommended command

perl ./METABOLIC-C.pl -test true

Everything seemed fine until the bowtie2 step. It began with an error about no output file being specified, and every later step that depends on bowtie2 appears to have failed:

[2022-03-01 20:34:30] The Prodigal annotation is running...
[2022-03-01 20:35:00] The Prodigal annotation is finished
[2022-03-01 20:35:01] The hmmsearch is running with 5 cpu threads...
[2022-03-01 21:14:06] The hmmsearch is finished
[2022-03-01 21:14:09] Generating each hmm faa collection...
[2022-03-01 21:14:09] Each hmm faa collection has been made
[2022-03-01 21:14:09] The KEGG module result is calculating...
[2022-03-01 21:16:38] The KEGG identifier (KO id) result is calculating...
[2022-03-01 21:16:38] The KEGG identifier (KO id) seaching result is finished
[2022-03-01 21:16:38] Searching CAZymes by dbCAN2...
[2022-03-01 21:18:14] dbCAN2 searching is done
[2022-03-01 21:18:14] Searching MEROPS peptidase...
[2022-03-01 21:18:39] MEROPS peptidase searching is done
[2022-03-01 21:18:40] METABOLIC table has been generated
[2022-03-01 21:18:40] Drawing element cycling diagrams...
No output file specified!
Bowtie 2 version 2.4.5 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea)
Usage: bowtie2-build [options]* <reference_in> <bt2_index_base>
    reference_in            comma-separated list of files with ref sequences
    bt2_index_base          write bt2 data to files with this dir/basename
*** Bowtie 2 indexes will work with Bowtie v1.2.3 and later. ***
Options:
    -f                      reference files are Fasta (default)
    -c                      reference sequences given on cmd line (as
                            <reference_in>)
    --large-index           force generated index to be 'large', even if ref
                            has fewer than 4 billion nucleotides
    --debug                 use the debug binary; slower, assertions enabled
    --sanitized             use sanitized binary; slower, uses ASan and/or UBSan
    --verbose               log the issued command
    -a/--noauto             disable automatic -p/--bmax/--dcv memory-fitting
    -p/--packed             use packed strings internally; slower, less memory
    --bmax <int>            max bucket sz for blockwise suffix-array builder
    --bmaxdivn <int>        max bucket sz as divisor of ref len (default: 4)
    --dcv <int>             diff-cover period for blockwise (default: 1024)
    --nodc                  disable diff-cover (algorithm becomes quadratic)
    -r/--noref              don't build .3/.4 index files
    -3/--justref            just build .3/.4 index files
    -o/--offrate <int>      SA is sampled every 2^<int> BWT chars (default: 5)
    -t/--ftabchars <int>    # of chars consumed in initial lookup (default: 10)
    --threads <int>         # of threads
    --seed <int>            seed for random number generator
    -q/--quiet              verbose output (for debugging)
    --h/--help              print this message and quit
(ERR): "METABOLIC_out/All_gene_collections.gene.scaffold" does not exist or is not a Bowtie 2 index
Exiting now ...
(ERR): "METABOLIC_out/All_gene_collections.gene.scaffold" does not exist or is not a Bowtie 2 index
Exiting now ...
rm: cannot remove ‘METABOLIC_out/All_gene_collections_mapped.1.sam’: No such file or directory
rm: cannot remove ‘METABOLIC_out/All_gene_collections_mapped.1.sam’: No such file or directory
rm: cannot remove ‘METABOLIC_out/*.bt2’: No such file or directory
rm: cannot remove ‘METABOLIC_out/*.bt2’: No such file or directory
rm: cannot remove ‘METABOLIC_out/*.bam’: No such file or directory
rm: cannot remove ‘METABOLIC_out/*.bam’: No such file or directory
rm: cannot remove ‘METABOLIC_out/*.bai’: No such file or directory
rm: cannot remove ‘METABOLIC_out/*.bai’: No such file or directory
[2022-03-01 21:18:43] Drawing element cycling diagrams finished
[2022-03-01 21:18:43] Drawing metabolic handoff diagrams...
mv: cannot stat ‘METABOLIC_out/newdir/Bar_plot/bar_plot_input_1.pdf’: No such file or directory
mv: cannot stat ‘METABOLIC_out/newdir/Bar_plot/bar_plot_input_1.pdf’: No such file or directory
mv: cannot stat ‘METABOLIC_out/newdir/Bar_plot/bar_plot_input_2.pdf’: No such file or directory
mv: cannot stat ‘METABOLIC_out/newdir/Bar_plot/bar_plot_input_2.pdf’: No such file or directory

I noticed that on line 113 of the METABOLIC-C.pl script, $output is simply set to the working directory:

`my $output = `pwd`; # The output folder`

Is there something I need to specify for the output directory? Thanks

Marcus

ChaoLab commented 2 years ago

Hi Marcus, The default output folder is "METABOLIC_out", as set on line 173. Does the file "METABOLIC_out/All_gene_collections.gene" (the mapping reference for Bowtie 2) exist in your folder? I ask because, if the output path is not the problem, something may have gone wrong in the Bowtie 2 index-building step.

Chao

mhyleung commented 2 years ago

Dear Chao. Thanks for the quick response.

The All_gene_collections.gene file is not present in my METABOLIC_out directory. If bowtie2-build requires it, that would explain why it did not run.

The key now is to find out at which step All_gene_collections.gene is generated. I cannot work this out from the Perl script. Thanks

Marcus

ChaoLab commented 2 years ago

Hi Marcus, I made a mistake here: "All_gene_collections.gene" is an intermediate file and is deleted at the end of the run.

I just tested the command on our server and it reported no errors. "All_gene_collections.gene" is the combined collection of all gene files (from "~/METABOLIC_test_files/Guaymas_Basin_genome_files"). You could run Bowtie 2 separately to test whether something is wrong with it. Alternatively, you could test with your own dataset instead of the "test" mode, since small issues with absolute paths, sudo rights, or the command line can cause problems there.

Chao
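
For anyone who wants to follow the suggestion of testing Bowtie 2 on its own, here is a minimal shell sketch. It relies only on the `bowtie2-build <reference_in> <bt2_index_base>` usage printed in the log above and on the `.1.bt2` / `.1.bt2l` file names that bowtie2-build writes. The function names `bt2_index_present` and `check_bt2_inputs` are made up for illustration and are not part of METABOLIC.

```shell
# Hypothetical helper: succeeds if a Bowtie 2 index with this basename exists.
# bowtie2-build writes <base>.1.bt2 (or <base>.1.bt2l for large indexes).
bt2_index_present() {
    [ -s "$1.1.bt2" ] || [ -s "$1.1.bt2l" ]
}

# Hypothetical helper: report what is missing before the mapping step.
#   $1 = FASTA file used as <reference_in>
#   $2 = <bt2_index_base>
check_bt2_inputs() {
    if [ ! -s "$1" ]; then
        echo "reference missing or empty: $1"
        return 1
    fi
    if ! bt2_index_present "$2"; then
        echo "index missing: $2"
        return 2
    fi
    echo "ok"
}

# Manual test inside the METABOLIC environment (paths are placeholders):
#   bowtie2-build reference.fa test_index
#   check_bt2_inputs reference.fa test_index
```

If the reference check already fails, the problem lies upstream of Bowtie 2; if only the index check fails, running bowtie2-build by hand should show the actual error.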

mhyleung commented 2 years ago

Hi Chao

So I made an attempt to run METABOLIC-C on my own Nanopore data using the following command. I am running this on an AWS instance. This time, it appears that the All_gene_collections.gene, the corresponding .bam and .sam files were created.

However, soon afterwards I had trouble accessing the AWS instance; the run seems to have crashed it, and I had to abandon the run. When I restarted it, I noticed that the log file contained a long string of gibberish from the minimap2 step. I am using minimap2 v2.17-r941.

[2022-03-02 15:51:58] Searching MEROPS peptidase...
[2022-03-02 15:52:32] MEROPS peptidase searching is done
[2022-03-02 15:52:33] METABOLIC table has been generated
[2022-03-02 15:52:33] Drawing element cycling diagrams...
sh: -c: line 0: syntax error near unexpected token `&'
sh: -c: line 0: `minimap2 -ax map-ont /data/metabolic/sample106/METABOLIC_C_out_new/All_gene_collections.gene """"""""""""""""""""""""""""""""
""""""""""""""""""#$$$$$'(%%%##$$$%%%%%(/2.-)'&&&$$%$$%%$%%%%%%%%&&%%%$$$$$%%%&%%&&''(&%$$$%%$$$$$$'$$$%%&&%%%&&&%$%$$$#$$$###$%%$$$$#$$$$$$
$$%%%%%#####$$$%&*)&%%$#$#$%%&%%$##$&%$%$$#########$%%$$$$####$%&'&&&&''&%$$$$%&&'%$$###$$$')((('''&$$$%%&''('$$$$$$$$%$%%&'((((%$$$$#######
#######""#&'''('%$####%%&'&&'%%#$$$$%%%%$%%%'))*(00/-)()*+++)&%%%%$$$%$$$$$$$#$%'*+++('%%%%%$$##$$$$#$$$$#$%%$$%$$####$##%%%%$$%&&&$##%%%&&&
%$#$$$%%%$%%%&%&%#$&'('&%$$$###&%%$$$$%%'&&&%$$%&&'&'&$$$$$%&'%%$%%%%$$$$%$$%)')'&%%%%%%$$&')+++)(((((&&''%#$%%%$$$####%&&(((('%%$$$$$%&%%&&
%$$$%&'''((((&%%$&&'')+++*)(''(''%$$$####$$$$$&%%%&%$$$%$$&''&%%%%%%%%&%####$$$$$$$#$%$&&'&'%%$####$$%$$#$$)()&&&$$$$#$$$%$$$$#$$&&&&&&&''))
&&$$$$$%$$$#####%%&%%$$$%'&&&&###$$$$$%%%&$$$####$$$$%%%%%%%%%$$$$%%$$$%%%%%'(&%&'''''('%$$$#####%$$$$$$$$$$###"#$$$%%''''((((*)(%%%%$$$#$''
%%$$$$&(('&&&&&&&&$$%%%%&$$%%%%$#$##""$$$##$$$%&&%%%&'''&%%$$$#$$%&'((''&&&%$$$#%$$$%%$$%%&&%$$%&&&&&&&&&&)))((&&%%%&'%&%%$###$%%%$$%$$$$$$$
$$$%&&&&&&%$$%%%%%&'&&&&%$$$#$$%$$##$#$$(&&&%%&&&%&'''%%&&'(''%$&'(()%%&%%&'(('%%$$$####$#%&&$$$$%%$$$%#####$%%###$$$$#$####$%%%$$$$########
###$%$())%%%&''%%%%$####%%%$%%%%$$$%&$$$$$$&%&&%$$$%%$$$#$&'''%$&''**)))()(%$%&&$##$$$%%&&&(-.'&&&&&%$$%%%&&''$%$$%%%$###$$$$$$$$######$$$$$
$$%%%$###%&%&%%$%%%&$$$$$$$$%%(&%$#$$$$$$$$########$$%%%%%%&&%&%%$$$$$#####$)**&%%%%%%%$%%&*)''''''&&%%%$$$$$##$$##$$$%%&%%%&&'(&''%%%&&$%%%
&%%%$%%$$%$$$$$%%%%$$$$$$%%%%%%$%)((((()('(&%%%%$%%$%$$%%')((()-/00/+'&%%%&'&&$$##$$%%$%%&''%%%%&))*'&&####$$$$%%%$$$$#$#$%%&)(&&$$$&&&''''&
''&&%%$$#$$%$%%%$$$$$$'('&&%%%%%$$$$$$$$$$$##$$$$%$$$$%%&%$$######$&&&%%%''&&&%%%%$%$$$%%&&%$$####$$%&%%%&&%##$$$########$#$$$%&%%%%&%$$$###
#$$%)++*%$%$$&'%%&&%$####$####$'&''%%%$#$$$$$$$$###&&&))++**'%%$%$$$$$%%$$$$$$$%&&&%$$%%%%%$#$##""######$$$$####$&%%&&'&&&&%%$%%%&**'('&&&%%
$$#$$$$&'%$######$%&%%&&%$$$$%%&%$%%%%%%')('&$#$$$%$#$$$%&%$$$$$$###$$$$%$%$$$%%%%%$$$$%&&%%$$$$##%%%%&%$$$$#$%%&%$%''%%%$$$$%&%%',-.{{{{{4)
(*{7{{/+++('''&&%%'{{{{{{6,-2/,++))('&&%$$$%$$%&%$$$&%###$$%%%%&%$$$$#$$$$$$###$$$$$$####$$$$$%%%%(())('(())+*'%%$$$%%$$$$$$$%&%))*(((((((&%
$$#$$%&''&%%)*&&%%&&&&&'('$$$$##$$%&&$$$&&&'$$%%%'''()(''%%$$$%%%%%&$$$$$$%(+*)))*)('&'&'''&&&&''')'$#########$$$$####$&%$$$%$$$$&''('&''&&$
%%%$%&(&%&&%$%%%%$$%&'((*))(('%$$$###$$$%%&'&''((+'&((&$$####%%'''%$$$$%%%%%&%$$##&&%$%%$$$$%%%#####$$$#$####&&''('&'%%%&&())+*)''%%%%%%%%%%
%$$#####%$%$$$&&''&&'%&%%$$%&%$###%&))&&&&&&&'&&&(+'%$$$###$$$$%&&%%$$$#%%%&%&&%$#$$&%%%%%%&&%$%&&'''''&&&'&&%%$$('&'&&%$$$$$$%&$$##$$$%%%%%
&&&&%%%%%%$%&&'''&%##$$$$$$###$$$$#'&%%%%&&&%%&'&&%$##%%%&&&%%%$%%$$%&&&$$$$&%%'''&%%%$&&'('%&%$$$%&$%$$$%$%$$$$$$#####%%+,(.+*+,(&$$#$$$$$$
$%%&&%%(((''%%%%&&&%&&%$&&%%$##$$%&%%%%&''&$#$$$%''(&&&&'*-+*'&&&%&&%%%$$$##########%%$$$#$$$$$$$$$%$%$$$#####$$$&&''(*,,(&&%$$$%%%%%%%%&&%$
$$##$$$%%$$$$$&%$###$%%%$$%%$%&&&%%&&%$$%%%%$$$$$%%%'(&$##$$##$$$$%%&&%%$$$$$%&$$%%&'&'''&&&%%%$%%%%%%$$$##$$$$%%%%(+*(&&&&%%%&&&&%&$$$$&%&&
&&&'''''((())((),1{{{{{-*)(&$$$%&&(%%%&&'())''$$$$$&'((++,++*(&%%%$$%%&%$$$%&&$##$$%&%%&%%%&%%%$$$######$%&&&&&()'$$$$$$$%%$#$%'***('&&&%%&%
%$$$$####$$$$$##%&$$$###$'''&'%%%%%&%%%%%&&&&)*'''&$$$$%$%$%%$$$$&&%%%%&)))'&&%%$$$##$$%&'''(''&'((()))(%%&$$%''()**)'&%&%$%&((''&%%%%%$$%$#
##$$#####$####$&')*)***+)))('&%$$$$%%&*0/----,(($$$$&&'())*4)(&%%%$$%*,++*+-,+)))&&&')2--*(%%&&&&%%%'%%%%%%&&&&&&&(*++)(((('(((''&&%%$$$%&'(
**+,-/)&%%$$%$$$$$$$$$$$$$$$%&&&&%'''()*,'%%$$%'''&&&&&%%%%%%%%''%%&&&()'(&&&%%%$###$%&&'**''&&$$########$$$####$$(**)'&&%(&%$$$$$$$%##$%%$$
$$$%$$#######$%%%%%%%&&&'&%%$$$%$$%%)*+,)(('(&&&%%&&&'''&&''&&&&*)((())''('%%%%%%%&%%''(*+)''''-.+%%%%%*+,-(''&$$$%%$$%%%&&'(%%&&%%%&&&%%%%&
%%$$$$('&&%%&&&%%%$$$###$%$$##).32--,'')))+002)'&&'('(&&%$%%%%%%&&&%%&''&&'''''')(&('&&%%%#$$$%&'(%%$$%%&%$$$$%%%%$$%&&%$$%'%&&%$$&'&%$$##$$
&%%%&$$$###$&&&&'&&%$$$$&&$$$$%'%%%%&&%$###$%$%$$%%$%%$%%%%%%&%%((('%%%%%),&$$$$$##$%%$$$%########$$$$$#$$%(''%%%%%%$#$$#$%%%&%$$$$%$$%$%$##
$%%$%&)**)'%%%%%%&&&'(&%%%$##$$%%%&&%%%$$$$%%%%$#$$$$$#$#$$$%$%%%%$%%$%%&&%%%%%%&''&&%%$$%$$$%###$%%$)(''*)(&&%$$$$%%%%''))))''&&&&('&&%%%%%
%%%$&'&%$$$$#####$###$$$%%%$####$%%&%%$$$$$$###$%$%&&&'&%$$$%%&$%&%%%$$$%'''%%%'&&&&$$$&&&''&$$$%$$$##$&%%%$$$$$%$#$$$$$$$$##$%%%%%%$#$$$&&$
$$%$%&('(%$$######$$%##########$$$$$$$$$$%%%%$#$$$%$####$%$$#$##$$%$%$$$#$$$$&&&&%$$$##$&&&&$$%$$&&%$$$$$$%%%%&(+/21-)(&$$$$$$##$&%&%$$$$#$$
#$$$$$$$%$$%&'&%&%$$%%%''%$$$$##$##$%%%%&&$%%%$%&''''%%$$%$$$%%%$$#%'&%%%&%%&%%%%%##$%%%$&%$$$#$$$&&&&&'$$$$$$%'+*()&%$%%$%%&(&%%$$$#$#$$$$$
%%$#$%%%%%$##$%$%%%%'%%$$&%$%$&'&'(((''&&)%$$%&&(('&&&'('%$$$$$$$$$$$$$%#$$$$$$%$$$$$$%&&$$$%%$$$$$%%''&('%$%$$%$$$$#$###$$$%$$$%$##$$%$##$$
%&%%$$&%%$$%%$$$$#####$$###&%$$$$$%''%$%&%$$####$$$%%&'&&%%$$$%&(**''&&&&'(('%$%%$$%'''()*+'&%$####%&%$$$$$##$$$$%%%%%%%%%%%$$$$%%$$####$#$$
##$##$$$$$'%$%$$%$$%'''%$$##$$'&&&%$#$%&&%%$#####%%$$#$#$%+&$$$$$$%%$$$&&%$$$$$$$$######$$###$$$$$$%$$%%$$$$$$$##$$&$$$$%%%%%&''%%$$$%&&&&()
*(&%%&&'''()*)('&$##%(()*(%$$$%&'&%%&&&(*+)(('$$$$$$$##$$%&&'$$$&&'%%$###$&&&'$$$%%&&%%$$$$$$##$$$$$$$$$%$###$%'&%%%%$$$%&&%&&$$%$$%%$#$#$$$
$$####$%''('%%$$$$$$#$$##$$$%$$$############$%$%&%&&''')+--,,+,--(&%$$#####$%&$#$$$$$#####$%%%%$#$$$&%%%%$%%(%%%%%$##$$#$%%%%%%%%%%$$$&'&%$$
%$##$$$%%%&'%$$%$$&*)('%%$$%%%%%%&&''&''+&%$$%$###$&'($##$&&%%%%$$$$%$$$%%%%%%%$$$##%%%&&'%$$####$%$$$##$$$#####$$%&&&''&&%&%%%%%$%%)('&$$##
#'&())*)('$$###$##$$$$%$%%&'('&$$$%''%$&%%$$######$%%%')'$$$###$%%%$%&'&%&&%$%%&'%&$%&&''%%%%%$$$##$$$$##$%%%$$$%%%&&&&%%$$$##$$$$$$$#######
#$$%%&%&&%%$$#$###$$$$$%%&&%%%%$$$#$$$%%$$%%%$%%$$%&&&&%$%%%%#%%$$$$%$$###%%%%%$########$$$&(&&&&'##$$$$$##$$%&&',('&$###$$%(%%$$$$$$##$#$%$
#####$%&%%%%$$$$##$$&('&%%###$$%&%&%$$%$%###$$$%%$$$$$$%&%%$##$%$$$$$##########$##$####%&(''%$$%########$$$%$$$$$$&'%$$$##$$##$%$&&&&%%%%(**
'&&%%$$#$$$%$%$$%%&%%$$%$%$$$$%$$##$%$##$#####%$$%$%&%$$$$$&$$$$%%%$$##$$%'$$$$%&%%&%%$$&$$$$$&%%$####$########$%%%$%%$#$$$$&%%$$%&&&%''%%%%
$%%$$$$#$$%&&&%$$#$$$%$$%%&&%%$##$$%$$%%$$$$$###$$$%%%$$$$%&&('&&$##$%)'(&%$###$%%''%%$$$$$&)+)''('''%$%%%&%$$%%&$$%%$%$$$$%$$$$##$$&%$%$$$$
%&%$%$$$$%%%%$%$%%%&((('&&$%$$##$$$$$#$$%%%$$$#$$####$%%$#$$####$%%&%$$$$&&''('%%&'&%%%&&&&%%$$$$$%%%$$$$$###%$$$%$%&&'***

I do not know what this output represents. Just wondering if anyone has encountered this before. Thanks!

Regards

Marcus

ChaoLab commented 2 years ago

Hi Marcus, How did you provide your reads to METABOLIC? Is it in a txt file? (See this link: https://github.com/AnantharamanLab/METABOLIC/wiki/METABOLIC-Usage#all-required-and-optional-flags) Could you also try running minimap2 on its own inside the METABOLIC environment to troubleshoot?

Chao

mhyleung commented 2 years ago

Oh silly me! I provided the actual .fastq file for the -r option instead of the .txt file. Sorry my bad. I shall try it again with the .txt file containing the full path to the .fastq file. Thanks
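
In case it helps others avoid the same mistake, here is a small pre-flight sketch for the file given to -r. If the FASTQ itself is passed, its lines (quality strings included) are presumably treated as if they were paths, which would explain the gibberish in the minimap2 command above. The helper names are hypothetical, and the check only looks at the first character, so it will not recognise compressed reads. The exact layout of the list file is described on the wiki page linked above.

```shell
# Hypothetical helper: succeeds if the file itself looks like FASTQ
# (its first character is "@"), i.e. it holds reads, not a list of paths.
looks_like_fastq() {
    [ "$(head -c 1 "$1")" = "@" ]
}

# Hypothetical pre-flight check for the file passed to -r.
check_reads_list() {
    if looks_like_fastq "$1"; then
        echo "this is a FASTQ file; -r expects a text file that lists read paths"
        return 1
    fi
    echo "ok"
}

# Usage (file name is a placeholder):
#   check_reads_list reads.txt
```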

ChaoLab commented 2 years ago

That's all right! I hope that is the only problem; it would be an easy fix.

mhyleung commented 2 years ago

This time only one error came up, after the mapping steps:

sambamba: error while loading shared libraries: libphobos2-ldc-shared.so.81: cannot open shared object file: No such file or directory

The same error shows up when I simply query the sambamba version within the conda environment:

sambamba --version
sambamba: error while loading shared libraries: libphobos2-ldc-shared.so.81: cannot open shared object file: No such file or directory

It seems I am slowly getting closer to fixing this, though. I just need to figure out how to make libphobos2-ldc-shared.so.81 available to sambamba.

EDIT: It seems that the way I originally installed sambamba within the METABOLIC conda environment (based on issue #27)

conda install sambamba

did not work for me because of the error above. Instead, I installed the most recent sambamba package available on conda (v0.8.1 at the time of posting), and checking the sambamba version then worked without error:


conda install -c bioconda sambamba

sambamba --version
sambamba 0.8.1
 by Artem Tarasov and Pjotr Prins (C) 2012-2021
    LDC 1.20.0 / DMD v2.090.1 / LLVM7.0.0 / bootstrap LDC - the LLVM D compiler (0.17.6)

sambamba 0.8.1
 by Artem Tarasov and Pjotr Prins (C) 2012-2021
    LDC 1.20.0 / DMD v2.090.1 / LLVM7.0.0 / bootstrap LDC - the LLVM D compiler (0.17.6)

I will try METABOLIC again now with this newer version of sambamba.

Cheers

Marcus
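
A quick way to spot this kind of problem before a long run is to ask the dynamic loader which libraries a binary cannot resolve. The sketch below assumes a Linux system where `ldd` is available; `missing_libs` is a hypothetical helper name, and it simply filters the "=> not found" lines that `ldd` prints.

```shell
# Hypothetical helper: read `ldd` output on stdin and print the names of
# shared libraries that the dynamic loader could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage inside the conda environment:
#   ldd "$(command -v sambamba)" | missing_libs
```

An empty result means no unresolved libraries were reported for that binary.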

ChaoLab commented 2 years ago

This looks like a version problem, according to this issue: https://github.com/bioconda/bioconda-recipes/issues/11422

mhyleung commented 2 years ago

Haha, thanks Chao. I found the likely cause you mentioned just after I posted my last message. I added an edit above; hopefully it helps anyone else who encounters a similar issue.

I will post an update on the METABOLIC run with this different sambamba version when it is done. Cheers! Marcus

mhyleung commented 2 years ago

Hi everyone, with the minimap2 step fixed, I now get this error:

[M::main] Real time: 3863.522 sec; CPU: 11778.136 sec; Peak RSS: 2.720 GB
[2022-03-03 10:32:43] Drawing element cycling diagrams finished
[2022-03-03 10:32:43] Drawing metabolic handoff diagrams...
mv: cannot stat ‘/data2/metabolic/sample106/METABOLIC_C_out_new/newdir/Bar_plot/bar_plot_input_1.pdf’: No such file or directory
mv: cannot stat ‘/data2/metabolic/sample106/METABOLIC_C_out_new/newdir/Bar_plot/bar_plot_input_1.pdf’: No such file or directory
mv: cannot stat ‘/data2/metabolic/sample106/METABOLIC_C_out_new/newdir/Bar_plot/bar_plot_input_2.pdf’: No such file or directory
mv: cannot stat ‘/data2/metabolic/sample106/METABOLIC_C_out_new/newdir/Bar_plot/bar_plot_input_2.pdf’: No such file or directory
[2022-03-03 10:32:44] Drawing metabolic handoff diagrams finished
[2022-03-03 10:32:44] Drawing energy flow chart...
[2022-03-03 10:32:44] INFO: GTDB-Tk v1.7.0
[2022-03-03 10:32:44] INFO: gtdbtk classify_wf --cpus 120 -x fasta --genome_dir /data2/metabolic/sample106/MAG_input --out_dir /data2/metabolic/sample106/METABOLIC_C_out_new/intermediate_files/gtdbtk_Genome_files
[2022-03-03 10:32:44] INFO: Using GTDB-Tk reference data version r202: /data2/metabolic/METABOLIC/gtdbtk_db_r202/release202
[2022-03-03 10:32:44] INFO: Identifying markers in 24 genomes with 120 threads.
[2022-03-03 10:32:44] TASK: Running Prodigal V2.6.3 to identify genes.
[2022-03-03 10:33:22] INFO: Completed 24 genomes in 38.38 seconds (1.60 seconds/genome).
[2022-03-03 10:33:22] TASK: Identifying TIGRFAM protein families.
[2022-03-03 10:33:27] INFO: Completed 24 genomes in 4.67 seconds (5.14 genomes/second).
[2022-03-03 10:33:27] TASK: Identifying Pfam protein families.
[2022-03-03 10:33:28] INFO: Completed 24 genomes in 1.04 seconds (23.18 genomes/second).
[2022-03-03 10:33:28] INFO: Annotations done using HMMER 3.1b2 (February 2015).
[2022-03-03 10:33:28] TASK: Summarising identified marker genes.
[2022-03-03 10:33:28] INFO: Completed 24 genomes in 0.52 seconds (46.41 genomes/second).
[2022-03-03 10:33:28] INFO: Done.
[2022-03-03 10:33:28] INFO: Aligning markers in 24 genomes with 120 CPUs.
[2022-03-03 10:33:29] INFO: Processing 24 genomes identified as bacterial.
[2022-03-03 10:33:31] INFO: Read concatenated alignment for 45,555 GTDB genomes.
[2022-03-03 10:33:31] TASK: Generating concatenated alignment for each marker.
[2022-03-03 10:33:39] INFO: Completed 24 genomes in 0.10 seconds (238.53 genomes/second).
[2022-03-03 10:33:40] TASK: Aligning 120 identified markers using hmmalign 3.1b2 (February 2015).
[2022-03-03 10:33:48] INFO: Completed 120 markers in 0.36 seconds (330.50 markers/second).
[2022-03-03 10:33:48] TASK: Masking columns of bacterial multiple sequence alignment using canonical mask.
[2022-03-03 10:34:46] INFO: Completed 45,579 sequences in 58.18 seconds (783.42 sequences/second).
[2022-03-03 10:34:46] INFO: Masked bacterial alignment from 41,084 to 5,037 AAs.
[2022-03-03 10:34:46] INFO: 6 bacterial user genomes have amino acids in <10.0% of columns in filtered MSA.
[2022-03-03 10:34:46] INFO: Creating concatenated alignment for 45,573 bacterial GTDB and user genomes.
[2022-03-03 10:34:47] INFO: Creating concatenated alignment for 18 bacterial user genomes.
[2022-03-03 10:34:47] INFO: Done.
[2022-03-03 10:34:47] WARNING: Setting pplacer CPUs to 64, as pplacer is known to hang if >64 are used. You can override this using: --pplacer_cpus
[2022-03-03 10:34:47] TASK: Placing 18 bacterial genomes into reference tree with pplacer using 64 CPUs (be patient).
[2022-03-03 10:34:47] INFO: pplacer version: v1.1.alpha19-0-g807f6f3
[2022-03-03 11:06:01] INFO: Calculating RED values based on reference tree.
[2022-03-03 11:06:09] TASK: Traversing tree to determine classification method.
[2022-03-03 11:06:09] INFO: Completed 18 genomes in 0.01 seconds (1,248.20 genomes/second).
[2022-03-03 11:06:09] TASK: Calculating average nucleotide identity using FastANI (v1.32).
[2022-03-03 11:06:15] INFO: Completed 506 comparisons in 5.54 seconds (91.34 comparisons/second).
[2022-03-03 11:06:15] INFO: 0 genome(s) have been classified using FastANI and pplacer.
[2022-03-03 11:06:15] INFO: Done.
mv: cannot stat ‘/data2/metabolic/sample106/METABOLIC_C_out_new/Output_energy_flow/Energy_plot/network.plot.pdf’: No such file or directory
mv: cannot stat ‘/data2/metabolic/sample106/METABOLIC_C_out_new/Output_energy_flow/Energy_plot/network.plot.pdf’: No such file or directory
[2022-03-03 11:06:17] Drawing energy flow chart finished
[2022-03-03 11:06:17] Calculating MW-score ...
[2022-03-03 11:06:17] Calculating MW-score is done
METABOLIC-C was done, the total running time: 02:29:58 (hh:mm:ss)

The network plot directory is empty. Is that because my MAGs have no species-level identification in the GTDB-Tk taxonomy?

Thanks

Marcus

ChaoLab commented 2 years ago
mhyleung commented 2 years ago

Hi Chao

I do have taxonomic classifications for all of my MAGs in the gtdbtk.bac120.summary.tsv file. They just lack species-level information (e.g. d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Mycobacteriales;f__Pseudonocardiaceae;g__;s__), and the closest reference is N/A for all of them.

As for the sequential transformation files, they look fine: input1.txt contains some 1s and mostly 0s, whereas input2.txt has three columns, two of which contain integer and decimal values.

Cheers

Marcus
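
To check how many MAGs lack a species assignment without reading the summary by eye, something like the following could be used. It is only a sketch: it assumes the summary is tab-separated with header columns named `user_genome` and `classification`, and `genomes_without_species` is a hypothetical helper name. It says nothing about whether a missing species rank affects the plots.

```shell
# Hypothetical helper: print genomes whose GTDB classification has an empty
# species rank (the string ends in "s__"). Assumes a tab-separated file with
# header columns named "user_genome" and "classification".
genomes_without_species() {
    awk -F '\t' '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        $(col["classification"]) ~ /s__$/ { print $(col["user_genome"]) }
    ' "$1"
}

# Usage:
#   genomes_without_species gtdbtk.bac120.summary.tsv
```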

mhyleung commented 2 years ago

Sorry, my bad. I went back to check the output files, and the MW-score outputs seem fine. What is missing now are the handoff diagrams, the network plot, and the Sankey plot.

I went back to my METABOLIC_Figures_Input/ directory, and both input files, Functional_network_input.txt and Metabolic_Sankey_diagram_input.txt, contain data suggesting that the run proceeded successfully.

Given that all the commands in the METABOLIC-C.pl script that generate the handoff, network, and Sankey PDFs are R-based, I decided to go through the respective R scripts in the METABOLIC directory to figure out what is going on. I noticed that a couple of the R packages they depend on are missing or outdated, which might prevent the R scripts from running properly. Let me try to sort this out in the meantime.

Marcus
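
For anyone debugging the same thing, here is a rough sketch of how the packages an R script loads could be listed and then checked. `r_packages_in` is a hypothetical helper; it only catches unquoted `library(pkg)` and `require(pkg)` calls, and the script name in the usage comment is a placeholder. The check itself uses base R's `requireNamespace` and assumes `Rscript` is on the PATH.

```shell
# Hypothetical helper: list package names loaded via library(...) or
# require(...) in an R script, one per line, without duplicates.
# Quoted forms such as library("pkg") are not handled.
r_packages_in() {
    grep -oE '(library|require)\([A-Za-z0-9._]+\)' "$1" |
        sed -E 's/^[a-z]+\(//; s/\)$//' | sort -u
}

# Usage:
#   for p in $(r_packages_in some_plot_script.R); do
#       Rscript -e "if (!requireNamespace('$p', quietly = TRUE)) cat('missing: $p\n')"
#   done
```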

mhyleung commented 2 years ago

I have fixed the R dependencies, and METABOLIC-C now works well. Thank you so much for your help through this!