AnantharamanLab / METABOLIC

A scalable high-throughput metabolic and biogeochemical functional trait profiler

METABOLIC-C error with -r #49

Open PaulaCat opened 3 years ago

PaulaCat commented 3 years ago

To run METABOLIC-C, I am using the -in-gn and -r flags. According to the instructions, I need to provide the path to the paired-end reads. I tested two approaches: 1) providing a path to the read files directly, and 2) providing a text file that lists the paths to the read files. For the text-file approach, I tested three different syntaxes for the file's content. Here are the commands:

1) Providing a path to the files:

bsub -n 1 -R "rusage[mem=8000]" METABOLIC-G.pl -in-gn /cluster/work/magna/databases_metabolic/METABOLIC_test_files/Guaymas_Basin_genome_files/Gamma/ -r /cluster/work/magna/databases_metabolic/METABOLIC_test_files/METABOLIC_test_reads/ -o Guaymas_3

2) Providing a text file with the paths to the files (the tested contents of the text file are listed below):

bsub -n 1 -R "rusage[mem=8000]" METABOLIC-C.pl -t 1 -in-gn /cluster/work/magna/databases_metabolic/METABOLIC_test_files/Guaymas_Basin_genome_files/Gamma/ -r omic_reads_parameters.txt -o Guaymas_9

2.1) /cluster/work/magna/databases_metabolic/METABOLIC_test_files/METABOLIC_test_reads/*.fastq
2.2) /cluster/work/magna/databases_metabolic/METABOLIC_test_files/METABOLIC_test_reads/
2.3) /cluster/work/magna/databases_metabolic/METABOLIC_test_files/METABOLIC_test_reads/SRR3577362_sub_1.fastq /cluster/work/magna/databases_metabolic/METABOLIC_test_files/METABOLIC_test_reads/SRR3577362_sub_2.fastq

When trying all of the options cited above, I get the following error:

Use of uninitialized value in concatenation (.) or string at /cluster/apps/nss/metabolic/16082021/x86_64/METABOLIC-C.pl line 1788, <__IN> line 1.
Use of uninitialized value in concatenation (.) or string at /cluster/apps/nss/metabolic/16082021/x86_64/METABOLIC-C.pl line 1804.
stat: Bad file descriptor
Warning: Could not open read file "-S" for reading; skipping...
stat: Bad file descriptor
Warning: Could not open read file "Guaymas_11/All_gene_collections_mapped.1.sam" for reading; skipping...
Error: No input read files were valid
(ERR): bowtie2-align exited with value 1
[E::hts_open_format] Failed to open file "Guaymas_11/All_gene_collections_mapped.1.sorted.bam" : No such file or directory
samtools index: failed to open "Guaymas_11/All_gene_collections_mapped.1.sorted.bam": No such file or directory
rm: cannot remove 'Guaymas_11/All_gene_collections_mapped.1.sam': No such file or directory
rm: cannot remove 'Guaymas_11/.bam': No such file or directory
rm: cannot remove 'Guaymas_11/.bai': No such file or directory

Could you tell me how to fix this issue and what the correct syntax for the command is?

Thank you very much!

Paula.

ChaoLab commented 3 years ago

Hi Paula,

Many thanks for your interest in our software. Based on your report, I am afraid you did not create a correct "omic_reads_parameters.txt" file in your second test. Please have a look at these instructions: https://github.com/AnantharamanLab/METABOLIC/wiki/METABOLIC-Usage#-metabolic-usage

The correct "omic_reads_parameters.txt" file should be:

/cluster/work/magna/databases_metabolic/METABOLIC_test_files/METABOLIC_test_reads/SRR3577362_sub_1.fastq,/cluster/work/magna/databases_metabolic/METABOLIC_test_files/METABOLIC_test_reads/SRR3577362_sub_2.fastq
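A small Python sketch can catch formatting mistakes like the ones in the first test before a long job is submitted. It assumes, based only on the example line above, that each non-empty line of the file holds one read pair written as "R1 path,R2 path"; treating lines that start with "#" as comments is an additional assumption, and check_reads_lines / check_reads_file are hypothetical helpers, not part of METABOLIC.

```python
from pathlib import Path


def check_reads_lines(lines, check_exists=True):
    """Return a list of problems found in the lines of a reads file (empty list = looks OK).

    Assumed format: one read pair per line as "R1,R2"; '#' lines are comments.
    """
    problems = []
    pairs = 0
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and (assumed) comment lines are ignored
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 2 or not all(fields):
            # catches a bare directory, a glob, or two paths separated by a space
            problems.append(f"line {lineno}: expected 'R1,R2' but got {line!r}")
            continue
        for f in fields:
            if "*" in f:
                problems.append(f"line {lineno}: wildcard {f!r} will not be expanded")
            elif check_exists and not Path(f).is_file():
                problems.append(f"line {lineno}: {f!r} is not an existing file")
        pairs += 1
    if pairs == 0 and not problems:
        problems.append("no read pairs found")
    return problems


def check_reads_file(path, check_exists=True):
    """Read the parameters file from disk and check every line."""
    return check_reads_lines(Path(path).read_text().splitlines(), check_exists)
```

For example, check_reads_file("omic_reads_parameters.txt") should return an empty list for the corrected file when both FASTQ files exist, while each of the variants 2.1 to 2.3 from the first post is reported as not matching the R1,R2 pattern.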

Hope this solves your problem. Meanwhile, we have made some updates to the scripts and databases in the past week. I suggest re-installing METABOLIC if you can; we now provide a conda environment setup recipe for all users, so installing METABOLIC on your server is no longer painful.

Best! Chao

PaulaCat commented 2 years ago

Thank you for your help! I hope you are having a good day :)

I ran METABOLIC-C with the regular installation, using the Guaymas test dataset. First, I provided the paired-end reads and the path to a single MAG. Here is the command:

METABOLIC-C.pl -t 1 -in-gn /cluster/work/magna/databases_metabolic/METABOLIC_test_files/Guaymas_Basin_genome_files/Gamma/ -r omic_reads_parameters.txt -o Guaymas_real

The job ran successfully, but when I checked the "Metabolic_energy_flow.pdf" and "CommunityPlot.PDF" files, they were empty. I figured the reason was that I had only provided one MAG.

I then tried providing the path to the folder containing ALL the MAGs from the Guaymas test dataset. Here is the command I ran:

METABOLIC-C.pl -t 4 -in-gn /cluster/work/magna/databases_metabolic/METABOLIC_test_files/Guaymas_Basin_genome_files -r omic_reads_parameters.txt -o Guaymas_real_real

As a result, I obtained the following errors:

Use of uninitialized value $cat in concatenation (.) or string at /cluster/apps/nss/metabolic/16082021/x86_64/METABOLIC-C.pl line 1513.
Use of uninitialized value within %Bin2Cat in concatenation (.) or string at /cluster/apps/nss/metabolic/16082021/x86_64/METABOLIC-C.pl line 1537.

Do you know a strategy to solve the error?

Thanks again!

Paula.

ChaoLab commented 2 years ago

Hi Paula, can you paste your "omic_reads_parameters.txt" here on GitHub? Did you install METABOLIC with conda or with the regular method? Is GTDB-Tk working properly?

PaulaCat commented 2 years ago

Hi Chao!

Here is the "omic_reads_parameters.txt" file:

/cluster/work/magna/databases_metabolic/METABOLIC_test_files/METABOLIC_test_reads/SRR3577362_sub_1.fastq,/cluster/work/magna/databases_metabolic/METABOLIC_test_files/METABOLIC_test_reads/SRR3577362_sub_2.fastq

I am trying the regular installation (I am also trying the conda installation in parallel, but I am still working on that).

Thanks again!

Paula.

ChaoLab commented 2 years ago

Hi Paula, your "omic_reads_parameters.txt" looks fine. I realized that you might be using an old version of METABOLIC-C.pl from before I made several changes to the GitHub repository. I suggest following the conda installation instructions to re-install METABOLIC and trying again.

PaulaCat commented 2 years ago

Okay, thanks, Chao! I will try the conda installation once I have it working and let you know.

PaulaCat commented 2 years ago

Hi again, Chao! I tried running METABOLIC-C with the conda installation using the following command (the "omic_reads_parameters.txt" file is the same one):

/cluster/project/magna/software/METABOLIC/METABOLIC/METABOLIC-C.pl -t 8 -in-gn /cluster/work/magna/databases_metabolic/METABOLIC_test_files/Guaymas_Basin_genome_files -r omic_reads_parameters.txt -o Guaymas_c

And I obtained the following error:

Error: Failed to open sequence file /cluster/work/magna/databases_metabolic/METABOLIC_test_files/Guaymas_Basin_genome_files/total.faa for reading

Thanks for your help!

Paula.

ChaoLab commented 2 years ago

Can you paste the complete console output? Did Prodigal run properly?

PaulaCat commented 2 years ago

Sure, here it is:

The output (if any) follows:

[2021-09-14 14:36:19] The Prodigal annotation is running...
[2021-09-14 14:37:26] The Prodigal annotation is finished
[2021-09-14 14:37:27] The hmmsearch is running with 8 cpu threads...

Error: Failed to open sequence file /cluster/work/magna/databases_metabolic/METABOLIC_test_files/Guaymas_Basin_genome_files/total.faa for reading

ChaoLab commented 2 years ago

I have run into this before. In my case, I had not completely killed all the previous METABOLIC-C runs. I suggest fully terminating all METABOLIC runs and any related software runs, and then starting a new run.

PaulaCat commented 2 years ago

Hi Chao, Thanks! I tested your suggestion and here is the outcome:

[2021-09-14 19:21:45] The Prodigal annotation is running...
[2021-09-14 19:22:27] The Prodigal annotation is finished
[2021-09-14 19:22:28] The hmmsearch is running with 8 cpu threads...
[2021-09-14 19:48:02] The hmmsearch is finished
[2021-09-14 19:49:39] Generating each hmm faa collection...
[2021-09-14 19:49:55] Each hmm faa collection has been made
[2021-09-14 19:49:55] The KEGG module result is calculating...
[2021-09-14 19:53:27] The KEGG identifier (KO id) result is calculating...
[2021-09-14 19:53:28] The KEGG identifier (KO id) seaching result is finished
[2021-09-14 19:53:28] Searching CAZymes by dbCAN2...
[2021-09-14 19:56:13] dbCAN2 searching is done
[2021-09-14 19:56:13] Searching MEROPS peptidase...
[2021-09-14 19:57:34] MEROPS peptidase searching is done
[2021-09-14 19:57:35] METABOLIC table has been generated
[2021-09-14 19:57:35] Drawing element cycling diagrams...
Loading required package: shape
[2021-09-14 20:01:04] Drawing element cycling diagrams finished
[2021-09-14 20:01:04] Drawing metabolic handoff diagrams...
[2021-09-14 20:01:09] Drawing metabolic handoff diagrams finished
[2021-09-14 20:01:09] Drawing energy flow chart...
Use of uninitialized value $cat in concatenation (.) or string at /cluster/project/magna/software/METABOLIC/METABOLIC/METABOLIC-C.pl line 1464
Use of uninitialized value in concatenation (.) or string at /cluster/project/magna/software/METABOLIC/METABOLIC/METABOLIC-C.pl line 1487.

ChaoLab commented 2 years ago

Did you change the shebang line of METABOLIC-C.pl? Did you run METABOLIC within the conda environment?

PaulaCat commented 2 years ago

Hi! Yes, here is the shebang line:

#!/cluster/project/magna/miniconda/envs/METABOLIC_v4.0/bin/perl

###########################

# METABOLIC-C.pl
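Since the question here is whether the shebang line is usable, the following Python sketch checks that a script's first line starts with "#!" and names an interpreter that exists and is executable. check_shebang is a hypothetical helper, not part of METABOLIC. For "#!/usr/bin/env perl" style lines it only checks env itself, not which perl is found on PATH, and the shebang is only consulted when the script is run directly (for example ./METABOLIC-C.pl), not when it is started as perl METABOLIC-C.pl.

```python
import os
from pathlib import Path


def check_shebang(script_path):
    """Return (interpreter, problem); problem is None if the shebang looks usable."""
    with open(script_path, "r", errors="replace") as fh:
        first = fh.readline().rstrip("\n")
    if not first.startswith("#!"):
        return None, "first line does not start with '#!'"
    parts = first[2:].strip().split()
    if not parts:
        return None, "shebang has no interpreter path"
    interpreter = parts[0]  # for '#!/usr/bin/env perl' this is only /usr/bin/env
    if not Path(interpreter).is_file():
        return interpreter, "interpreter does not exist"
    if not os.access(interpreter, os.X_OK):
        return interpreter, "interpreter is not executable"
    return interpreter, None
```

For example, check_shebang("/cluster/project/magna/software/METABOLIC/METABOLIC/METABOLIC-C.pl") would report whether the conda perl named in the first line can actually be found.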

ChaoLab commented 2 years ago

Another guess of mine is that GTDB-Tk has a problem. Did you check whether you can properly call the GTDB-Tk software and whether the GTDB-Tk result (located in the "intermediate_results" folder within the output directory) looks good?

patriciatran commented 2 years ago

If some MAGs don't have a GTDB classification, it might result in the $cat error you are seeing; please see https://github.com/AnantharamanLab/METABOLIC/issues/41
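To find which MAGs lack a GTDB classification, the GTDB-Tk summary tables can be compared with the list of input genomes. This Python sketch assumes the standard GTDB-Tk summary TSV columns user_genome and classification, that user_genome matches the genome file name without its .fasta extension, and that a genome counts as unclassified if it is missing from the summaries or its classification starts with "Unclassified". The exact location of these files under "intermediate_results" is not stated in this thread, so the paths are passed in explicitly; all function names are hypothetical.

```python
import csv
from pathlib import Path


def read_gtdbtk_summaries(summary_paths):
    """Map genome name -> classification string from one or more GTDB-Tk summary TSVs."""
    taxonomy = {}
    for path in summary_paths:
        with open(path, newline="") as fh:
            for row in csv.DictReader(fh, delimiter="\t"):
                taxonomy[row["user_genome"]] = (row.get("classification") or "").strip()
    return taxonomy


def genome_names_in_folder(folder, suffix=".fasta"):
    """List genome names (file names minus the assumed .fasta suffix) in an input folder."""
    return sorted(p.name[: -len(suffix)] for p in Path(folder).iterdir()
                  if p.name.endswith(suffix))


def genomes_without_taxonomy(genome_names, taxonomy):
    """Return genomes missing from the summaries or reported as unclassified."""
    flagged = []
    for name in genome_names:
        cls = taxonomy.get(name, "")
        if not cls or cls.startswith("Unclassified") or cls == "N/A":
            flagged.append(name)
    return flagged
```

Genomes flagged this way are the ones that, according to the linked issue, could be behind the $cat warning.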

PaulaCat commented 2 years ago

Hi Chao!

I finally fixed the error I reported, but now I am getting different errors.

Here is the first one:

[2021-10-22 18:02:11] Drawing energy flow chart...
==> Processed 37/40 genomes (92%) |█████████████▉ | [ 3.59genome/s, ETA 00:00]
FATAL: Sequence identifiers must be unique. Your fasta file contains two sequences with the same id (NODE_550_length_25751_cov_6.775685_1)
Use of uninitialized value $cat in concatenation (.) or string at /cluster/project/magna/software/METABOLIC/METABOLIC/METABOLIC-C.pl line 1463.

Here is the second one:

Use of uninitialized value in concatenation (.) or string at /cluster/project/magna/software/METABOLIC/METABOLIC/METABOLIC-C.pl line 1486.
Loading required package: ggplot2
Error: Must request at least one colour from a hue palette.
In addition: Warning message:
The parameter infer.label is deprecated. Use aes(label = after_stat(stratum)).
Execution halted
Loading required package: ggplot2

Do you know how I can fix them?

Let me know if you need me to provide you with the submitted scripts again!

Paula.

ChaoLab commented 2 years ago


  1. For the first problem, the error probably means exactly what it says: your FASTA file has duplicated sequences (or at least sequences with duplicated names).
  2. For the second problem, I guess this is a downstream error caused by the "$cat" issue in the first problem (I suspect "$Bin2Cat{$gn}[0]" is uninitialized because $cat is uninitialized in the first problem).
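The duplicated ID can be located with a short Python sketch that collects the ID (header text up to the first whitespace) of every sequence in a set of FASTA files and reports IDs seen more than once, whether within one file or across files. The cross-file check reflects an assumption, not something confirmed in this thread: NODE_550_length_25751_cov_6.775685_1 looks like a SPAdes-style contig name followed by a Prodigal-style gene number (_1), and if the MAGs come from separate assemblies, such names can be unique within each MAG yet collide once sequences from all genomes are pooled. fasta_ids and find_duplicate_ids are hypothetical helpers.

```python
from collections import defaultdict


def fasta_ids(lines):
    """Yield sequence IDs (header text up to the first whitespace) from FASTA lines."""
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            yield parts[0] if parts else ""


def find_duplicate_ids(files):
    """Map each sequence ID that occurs more than once to the files it occurs in."""
    seen = defaultdict(list)
    for path in files:
        with open(path) as fh:
            for seq_id in fasta_ids(fh):
                seen[seq_id].append(str(path))
    return {seq_id: where for seq_id, where in seen.items() if len(where) > 1}
```

For example, passing sorted(Path("/cluster/work/magna/databases_metabolic/METABOLIC_test_files/Guaymas_Basin_genome_files").glob("*.fasta")) lists every repeated ID and shows whether the repeats sit in one genome file or in several.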

So you may need to solve the duplicated sequence ID issue in the input FASTA files first.
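If the duplicates occur across genome files rather than inside one file, a common workaround (an assumption here, not a documented METABOLIC step) is to make the IDs unique by prefixing each header with its genome name. This Python sketch writes renamed copies into a new folder and leaves the originals untouched; it assumes genome files end in .fasta, it does not fix duplicates inside a single file, and whether any METABOLIC step relies on the original header format has not been checked, so a test run on the renamed copies is worthwhile. prefix_fasta_headers and prefix_folder are hypothetical helpers.

```python
from pathlib import Path


def prefix_fasta_headers(src, dst, prefix, sep="_"):
    """Copy a FASTA file, prepending prefix + sep to every header line."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.startswith(">"):
                line = ">" + prefix + sep + line[1:]
            fout.write(line)


def prefix_folder(in_dir, out_dir, suffix=".fasta"):
    """Write a header-prefixed copy of every genome file in in_dir to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(in_dir).glob("*" + suffix)):
        genome = src.name[: -len(suffix)]  # genome name taken from the file name
        prefix_fasta_headers(src, out / src.name, genome)
```

The renamed folder can then be passed to -in-gn in place of the original one.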