AnantharamanLab / METABOLIC

A scalable high-throughput metabolic and biogeochemical functional trait profiler

Error(s) with test run #41

Open Ahmed-Shibl opened 3 years ago

Ahmed-Shibl commented 3 years ago

Hi @patriciatran and @ChaoLab,

I recently re-installed METABOLIC in a new environment, following the conda instructions [https://github.com/AnantharamanLab/METABOLIC/issues/27] and using git clone https://github.com/AnantharamanLab/METABOLIC.git. When I ran the test dataset, I got some errors that I assume are Perl-related.

This is the command I used: perl METABOLIC-G.pl -test true

And this was the output + errors/warnings:

[2021-03-22 20:46:38] The Prodigal annotation is running...
[2021-03-22 20:47:23] The Prodigal annotation is finished
[2021-03-22 20:47:23] The hmmsearch is running with 5 cpu threads...
[2021-03-22 21:34:30] The hmmsearch is finished
readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.amoA.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.amoA.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.amoA.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.amoA.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.amoB.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.amoB.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.amoB.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.amoB.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.amoC.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.amoC.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.amoC.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.amoC.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.dsrE.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.dsrE.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.dsrE.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.dsrE.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.dsrE.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.dsrE.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.dsrE.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.dsrE.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.dsrF.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.dsrF.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.dsrF.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.dsrF.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.dsrF.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.dsrF.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.dsrF.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.dsrF.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.dsrH.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.dsrH.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.dsrH.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.dsrH.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.dsrH.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.dsrH.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.dsrH.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.dsrH.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.pmoA.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.pmoA.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.pmoA.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.pmoA.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.pmoC.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.pmoC.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.pmoC.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.pmoC.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.pmoB.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.pmoB.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.pmoB.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.pmoB.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
[2021-03-22 21:34:35] The hmm hit result is calculating...
[2021-03-22 21:34:35] Generating each hmm faa collection...
[2021-03-22 21:34:35] Each hmm faa collection has been made
[2021-03-22 21:34:35] The KEGG module result is calculating...
[2021-03-22 21:38:26] The KEGG identifier (KO id) result is calculating...
[2021-03-22 21:38:26] The KEGG identifier (KO id) seaching result is finished
[2021-03-22 21:38:26] Searching CAZymes by dbCAN2...
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
[2021-03-22 21:41:10] dbCAN2 searching is done
[2021-03-22 21:41:10] Searching MEROPS peptidase...
[2021-03-22 21:42:37] MEROPS peptidase searching is done
Warning message:
package ‘openxlsx’ was built under R version 4.0.3 
[2021-03-22 21:42:39] METABOLIC table has been generated
[2021-03-22 21:42:39] Drawing element cycling diagrams...
Loading required package: shape
[2021-03-22 21:42:41] Drawing element cycling diagrams finished

Please let me know if you need any additional information - thanks in advance! Looking forward to re-running this smoothly and applying it to my datasets.

Ahmed-Shibl commented 3 years ago

Quick update:

I updated R with conda install -c conda-forge R=4.0 and ran unset GREP_OPTIONS in the environment. Then I ran perl METABOLIC-G.pl -test true again outside of tmux and it was perfectly fine. No warnings or errors.

However, running perl METABOLIC-C.pl -test true returned the following:

[2021-03-23 10:04:25] The Prodigal annotation is running...
[2021-03-23 10:05:11] The Prodigal annotation is finished
[2021-03-23 10:05:11] The hmmsearch is running with 5 cpu threads...
[2021-03-23 10:45:39] The hmmsearch is finished
[2021-03-23 10:45:42] Generating each hmm faa collection...
[2021-03-23 10:45:43] Each hmm faa collection has been made
[2021-03-23 10:45:43] The KEGG module result is calculating...
[2021-03-23 10:49:18] The KEGG identifier (KO id) result is calculating...
[2021-03-23 10:49:18] The KEGG identifier (KO id) seaching result is finished
[2021-03-23 10:49:18] Searching CAZymes by dbCAN2...
[2021-03-23 10:52:02] dbCAN2 searching is done
[2021-03-23 10:52:02] Searching MEROPS peptidase...
[2021-03-23 10:53:26] MEROPS peptidase searching is done
[2021-03-23 10:53:27] METABOLIC table has been generated
[2021-03-23 10:53:27] Drawing element cycling diagrams...
Loading required package: shape
[2021-03-23 10:56:49] Drawing element cycling diagrams finished
[2021-03-23 10:56:49] Drawing metabolic handoff diagrams...
[2021-03-23 10:56:53] Drawing metabolic handoff diagrams finished
[2021-03-23 10:56:53] Drawing energy flow chart...
Use of uninitialized value $cat in concatenation (.) or string at METABOLIC-C.pl line 1369.
Use of uninitialized value $cat in concatenation (.) or string at METABOLIC-C.pl line 1369.
Use of uninitialized value $cat in concatenation (.) or string at METABOLIC-C.pl line 1369.
Use of uninitialized value within %Bin2Cat in concatenation (.) or string at METABOLIC-C.pl line 1392.
Use of uninitialized value within %Bin2Cat in concatenation (.) or string at METABOLIC-C.pl line 1392.
Use of uninitialized value within %Bin2Cat in concatenation (.) or string at METABOLIC-C.pl line 1392.
Loading required package: ggplot2
Error: Must request at least one colour from a hue palette.
In addition: Warning message:
The parameter `infer.label` is deprecated.
Use `aes(label = after_stat(stratum))`. 
Execution halted
Loading required package: ggplot2

Attaching package: ‘igraph’

The following objects are masked from ‘package:stats’:

    decompose, spectrum

The following object is masked from ‘package:base’:

    union

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
✔ tibble  3.1.0     ✔ dplyr   1.0.5
✔ tidyr   1.1.3     ✔ stringr 1.4.0
✔ readr   1.4.0     ✔ forcats 0.5.1
✔ purrr   0.3.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::as_data_frame() masks tibble::as_data_frame(), igraph::as_data_frame()
✖ purrr::compose()       masks igraph::compose()
✖ tidyr::crossing()      masks igraph::crossing()
✖ dplyr::filter()        masks stats::filter()
✖ dplyr::groups()        masks igraph::groups()
✖ dplyr::lag()           masks stats::lag()
✖ purrr::simplify()      masks igraph::simplify()

Attaching package: ‘tidygraph’

The following object is masked from ‘package:igraph’:

    groups

The following object is masked from ‘package:stats’:

    filter

Error: Must request at least one colour from a hue palette.
Execution halted
[2021-03-23 10:56:56] Drawing energy flow chart finished
[2021-03-23 10:56:56] Calculating MN-score ...
Use of uninitialized value $cat in concatenation (.) or string at METABOLIC-C.pl line 1508.
Use of uninitialized value $cat in concatenation (.) or string at METABOLIC-C.pl line 1508.
Use of uninitialized value $cat in concatenation (.) or string at METABOLIC-C.pl line 1508.
Use of uninitialized value within %Bin2Cat in concatenation (.) or string at METABOLIC-C.pl line 1532.
Use of uninitialized value within %Bin2Cat in concatenation (.) or string at METABOLIC-C.pl line 1532.
Use of uninitialized value within %Bin2Cat in concatenation (.) or string at METABOLIC-C.pl line 1532.
[2021-03-23 10:56:56] Calculating MN-score is done

strejcem commented 3 years ago

Hi Ahmed, I had the same issues.

The first one: Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373. is a Perl issue. After following #27, all the errors disappeared for me. Check carefully that all required packages were installed correctly; unfortunately, conda has issues with Perl. For example, my installation of Array::Split previously exited with a compilation error.

IIRC, the second one: Error: Must request at least one colour from a hue palette. is an R problem. METABOLIC uses R 3.x while you have 4.x. In 4.x the default behavior of read.table() changed: the function loads strings as factors in 3.x and as characters in 4.x, and ggalluvial in the METABOLIC script expects factors. Either run the older version of R or change the R script to call read.table() with stringsAsFactors=TRUE.
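To tell quickly which read.table() default an installed R uses, the version check can be sketched in shell. This is only an illustrative helper and not part of METABOLIC: the function name strings_default_to_factors is made up here, and it relies only on the fact that R 4.0.0 changed the stringsAsFactors default from TRUE to FALSE.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of METABOLIC): given an R version string such
# as "3.6.3" or "4.0.3", report whether read.table() converts strings to
# factors by default. The default changed from TRUE to FALSE in R 4.0.0.
strings_default_to_factors() {
    local major="${1%%.*}"    # text before the first dot, e.g. "4"
    case "$major" in
        ''|*[!0-9]*) echo "unrecognised version: $1" >&2; return 2 ;;
    esac
    if [ "$major" -lt 4 ]; then
        echo "yes"
    else
        echo "no"
    fi
}
```

It could be called as strings_default_to_factors "$(Rscript -e 'cat(as.character(getRversion()))')". If it prints "no", the plotting script would likely need either an older R or an explicit stringsAsFactors=TRUE.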

Hope it helps. Michal

Ahmed-Shibl commented 3 years ago

Hi @strejcem, thanks for your input! I followed the conda installation word for word and I'm still getting the same Perl-related error:

[2021-04-26 14:06:01] Drawing metabolic handoff diagrams finished
[2021-04-26 14:06:01] Drawing energy flow chart...
Use of uninitialized value $cat in concatenation (.) or string at METABOLIC-C.pl line 1374.
Use of uninitialized value within %Bin2Cat in concatenation (.) or string at METABOLIC-C.pl line 1397.
Use of uninitialized value within %Bin2Cat in concatenation (.) or string at METABOLIC-C.pl line 1397.

These are lines 1362-1400 of METABOLIC-C.pl:

my %Hash_gn_n_pth = (); 
my %Total_R_community_coverage = (); # genome\tpathway => category \t pathway \t genome coverage percentage
if ($omic_reads_parameters){
    my %Genome_cov = %Genome_cov_constant;
    #%Total_R_input pathway => gn => 1 or 0
    foreach my $pth (sort keys %Total_R_input){
        my $gn_cov_percentage = 0;
        foreach my $gn (sort keys %Hmmscan_result){
            if ($Genome_cov{$gn} and $Total_R_input{$pth}{$gn}){
                $gn_cov_percentage = $Genome_cov{$gn};
                my $cat = $Bin2Cat{$gn};
                my $gn_n_pth = "$gn\t$pth"; $Hash_gn_n_pth{$gn_n_pth} = 1;
                $Total_R_community_coverage{$gn_n_pth} = "$cat\t$pth\t$gn_cov_percentage";
            }
        }
    }       
}

my %Total_R_community_coverage2 = (); #$genome\tpath pair => cat \t  coverage percentage average
foreach my $gn (sort keys %Hmmscan_result){
    my %Path = (); # path => 1
    foreach my $gn_n_pth (sort keys %Total_R_community_coverage){
        if ($gn_n_pth =~ /$gn\t/){
            my @tmp = split (/\t/,$gn_n_pth);
            $Path{$tmp[1]} = 1;
        }
    }
    my @Path_keys = sort keys %Path;
    for(my $i=0; $i<=$#Path_keys; $i++){
        for(my $j = $i+1; $j<=$#Path_keys; $j++){
            my $pair = "$Path_keys[$i]\t$Path_keys[$j]";
            my $coverage = 0;
            my @tmp1 = split (/\t/, $Total_R_community_coverage{"$gn\t$Path_keys[$i]"});
            my @tmp2 = split (/\t/, $Total_R_community_coverage{"$gn\t$Path_keys[$j]"});
            $coverage = ($tmp1[2] + $tmp2[2]) / 2;
            $Total_R_community_coverage2{"$gn\t$pair"} = $Bin2Cat{$gn}."\t".$coverage;
        }
    }
}

I wonder if it's a bug in the script. I'm going to try and tackle the R issue next.

ChaoLab commented 3 years ago

Hi @Ahmed-Shibl, it seems that something is wrong with "$cat" (which stands for category). I am wondering whether GTDB-Tk worked properly.

Ahmed-Shibl commented 3 years ago

> Hi @Ahmed-Shibl, it seems that something is wrong with "$cat" (means category). I am wondering whether the GTDB-tk worked properly

Hi @ChaoLab, the output of gtdbtk check_install is:

[2021-04-27 21:57:01] INFO: GTDB-Tk v1.4.1
[2021-04-27 21:57:01] INFO: gtdbtk check_install
[2021-04-27 21:57:01] INFO: Using GTDB-Tk reference data version r95: ~/miniconda3/envs/2metabolic/release95/
[2021-04-27 21:57:01] INFO: Running install verification
[2021-04-27 21:57:01] INFO: Checking that all third-party software are on the system path:
[2021-04-27 21:57:01] INFO:          |-- FastTree         OK
[2021-04-27 21:57:01] INFO:          |-- FastTreeMP       OK
[2021-04-27 21:57:01] INFO:          |-- fastANI          OK
[2021-04-27 21:57:01] INFO:          |-- guppy            OK
[2021-04-27 21:57:01] INFO:          |-- hmmalign         OK
[2021-04-27 21:57:01] INFO:          |-- hmmsearch        OK
[2021-04-27 21:57:01] INFO:          |-- mash             OK
[2021-04-27 21:57:01] INFO:          |-- pplacer          OK
[2021-04-27 21:57:01] INFO:          |-- prodigal         OK
[2021-04-27 21:57:01] INFO: Checking integrity of reference package: ~/miniconda3/envs/2metabolic/release95/
[2021-04-27 21:57:03] INFO:          |-- pplacer          OK                                        
[2021-04-27 21:57:03] INFO:          |-- masks            OK                                        
[2021-04-27 21:57:04] INFO:          |-- markers          OK                                        
[2021-04-27 21:57:04] INFO:          |-- radii            OK                                        
[2021-04-27 21:57:11] INFO:          |-- msa              OK                                        
[2021-04-27 21:57:11] INFO:          |-- metadata         OK                                        
[2021-04-27 21:57:11] INFO:          |-- taxonomy         OK   

and gtdbtk test also runs fine.

strejcem commented 3 years ago

@Ahmed-Shibl

You should also check whether all the MAGs were classified by GTDB-Tk. I think METABOLIC-C looks for a classification at the phylum level; if a MAG has none, the $cat variable might end up empty and throw these errors. Just a thought.
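A minimal sketch of that check, assuming the GTDB-Tk summary table is tab-separated with a header row containing user_genome and classification columns (as in files such as gtdbtk.bac120.summary.tsv). The function name list_mags_without_phylum is hypothetical, and where METABOLIC-C keeps the summary inside its output directory is not stated in this thread, so the path is left as an argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of METABOLIC or GTDB-Tk): print every genome
# in a GTDB-Tk style summary table that has no phylum-level assignment.
# Assumes a tab-separated file whose header row names the columns
# "user_genome" and "classification".
list_mags_without_phylum() {
    awk -F'\t' '
        NR == 1 {
            # Locate the two columns by name rather than by fixed position.
            for (i = 1; i <= NF; i++) {
                if ($i == "user_genome") g = i
                if ($i == "classification") c = i
            }
            if (!g || !c) {
                print "missing expected columns" > "/dev/stderr"
                exit 2
            }
            next
        }
        # A phylum counts as present only if "p__" is followed by at least
        # one character other than the rank separator ";".
        $c !~ /p__[^;]+/ { print $g }
    ' "$1"
}
```

Running list_mags_without_phylum path/to/summary.tsv prints one genome name per line. Any genome it prints would be a candidate for the empty $cat value, though I have not verified how METABOLIC-C derives the category.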