AnantharamanLab / METABOLIC

A scalable high-throughput metabolic and biogeochemical functional trait profiler

Error: Sequence file /path_to_folder_with_genome_files/total.faa is empty or misformatted #74

Open B-1991-ing opened 2 years ago

B-1991-ing commented 2 years ago

Dear METABOLIC support team,

I am trying to run METABOLIC-G.pl. For the -in-gn input, my genome nucleotide FASTA files (my MAG DNA files) end with ".fna" and ".fa" instead of ".fasta". Is that why I got the error "Error: Sequence file /home/projects/env_10000/people/binson/Current_pipeline/binning/bins_dereplicate/drep/dereplicated_genomes/total.faa is empty or misformatted"? Can I ignore it or not?

The METABOLIC-G.pl command line:

perl /services/tools/metabolic/4.0/METABOLIC/METABOLIC-G.pl -in-gn ${drep_outout_dir} -o ${metabolic_g_outdir} -t 32

The METABOLIC-G.pl command log: METABOLIC_run.log.txt

The submitted shell script: metabolic_g.sh.txt

The submitted shell script, error file: metabolic1.err.txt

The submitted shell script, log file: metabolic1.log.txt

If it can't be ignored, could you give me some suggestions on dealing with this issue?

Best,

Bing

B-1991-ing commented 2 years ago

Dear METABOLIC support team,

Update

I hadn't noticed the error message at the beginning of METABOLIC_run.log.txt. I have now submitted my job again and hope it works.

The error message at the beginning of METABOLIC_run.log.txt:

Loading metabolic/4.0
ERROR: metabolic/4.0 cannot be loaded due to missing prereq.
HINT: the following module must be loaded first: perl

Changed module loading order:

module load other_essential_modules
module load perl/5.30.2
module load metabolic/4.0

Best,

Bing

ChaoLab commented 2 years ago

Hi Bing, you will need to rename all your input files, because METABOLIC strictly requires the ".fasta" extension.

B-1991-ing commented 2 years ago

Hi Zhichao,

Thank you very much for your reply.

I renamed my ".fa" and ".fna" files to use the ".fasta" suffix with the commands below. I hope it works this time.

Renaming command lines:

rename fa fasta
rename fna fasta

Best,

Bing
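
A minimal sketch of the renaming step, assuming bash and assuming that every ".fa" and ".fna" file in the folder is a genome that should receive the ".fasta" suffix. The function name rename_to_fasta is hypothetical and not part of METABOLIC. Because the syntax of the rename command differs between Linux distributions, the sketch uses a plain mv loop instead.

```shell
#!/usr/bin/env bash
# rename_to_fasta: hypothetical helper, not part of METABOLIC.
# Renames every *.fa and *.fna file directly inside the given folder to *.fasta.
rename_to_fasta() {
    local dir="$1" f
    for f in "$dir"/*.fa "$dir"/*.fna; do
        [ -e "$f" ] || continue          # skip glob patterns that matched nothing
        mv -n -- "$f" "${f%.*}.fasta"    # replace only the last extension; -n never overwrites
    done
}
```

Calling it as rename_to_fasta /path/to/copied_genome_folder (a placeholder path) leaves files that already end in ".fasta" untouched, so it is safe to run more than once.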

B-1991-ing commented 2 years ago

Hi Zhichao,

I actually copied my folder "dereplicated_genomes" to a new folder, "fasta_dereplicated_genomes", in case I did something wrong, and then renamed the files in the new folder.

But I still get the error.

The METABOLIC-G.pl command log: METABOLIC_run.log.txt

The submitted shell script: metabolic_g.sh.txt

The submitted shell script, error file: metabolic5.err.txt

For the parameter "-in-gn", I specified the path of the FOLDER containing the ".fasta" genome files.

#-in-gn ${drep_outout_dir}
drep_outout_dir=/home/projects/env_10000/people/binson/Current_pipeline/metabolic/metabolic_g

Best,

Bing

B-1991-ing commented 2 years ago

Hi Zhichao,

Update

For the parameter "-in-gn", I now specify the full path of the FOLDER containing the ".fasta" genome files, so the FOLDER NAME itself is included in the path.

FOLDER name: "fasta_dereplicated_genomes"

The path of the FOLDER: "/home/projects/env_10000/people/binson/Current_pipeline/metabolic/metabolic_g/fasta_dereplicated_genomes"

Finally, it seems to have worked, BUT there is still a message in the error file: metabolic6.err.txt

The METABOLIC-G.pl command log: METABOLIC_run.log.txt

So, do I need to load a package called "shape"? On the HPC I found the environment module "shapeit4/20181214" but nothing named "shape".

Best,

Bing
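
The path fix described above can be checked before submitting a job. This is a rough sketch, assuming bash and assuming the genome files sit directly inside the folder passed to -in-gn. The function name count_nonempty_fasta is hypothetical, and the link between a zero count and the "total.faa is empty or misformatted" error is an inference from this thread, not something confirmed by the METABOLIC documentation.

```shell
#!/usr/bin/env bash
# count_nonempty_fasta: hypothetical helper, not part of METABOLIC.
# Prints how many non-empty *.fasta files sit directly inside the given folder.
count_nonempty_fasta() {
    local dir="$1" f n=0
    for f in "$dir"/*.fasta; do
        if [ -s "$f" ]; then    # -s: the file exists and its size is greater than zero
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Example use in a job script (drep_outout_dir is the variable from this thread):
#   if [ "$(count_nonempty_fasta "$drep_outout_dir")" -eq 0 ]; then
#       echo "No non-empty .fasta files in $drep_outout_dir" >&2
#   fi
```

A count of 0 for the parent folder and a positive count for "fasta_dereplicated_genomes" would point to the same mistake found here: the path stopped one level too high.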

ChaoLab commented 2 years ago

Hi Bing, can you paste the whole log file?

B-1991-ing commented 2 years ago

Hi Zhichao,

The METABOLIC-G.pl command line:

perl /services/tools/metabolic/4.0/METABOLIC/METABOLIC-G.pl -in-gn ${drep_outout_dir} -o ${metabolic_g_outdir} -t 32

The submitted shell script: metabolic_g.sh.txt

The submitted shell script, error file: metabolic6.err.txt

The submitted shell script, log file: metabolic6.log.txt

METABOLIC_run.log: METABOLIC_run.log.txt

"Loading required package: shape" refers to an R package, not an environment module. I am running on an HPC; how can I load the R package "shape" the way I load other modules in my shell script, metabolic_g.sh.txt? I have already loaded R with "module load R/3.6.1 # (METABOLIC suggest to use R 3.x)".

Best,

Bing

ChaoLab commented 2 years ago

The message about "shape" is probably just a routine notice that this R package is being loaded rather than a real error. If the output of the METABOLIC-G run is correct, then everything is fine and you can ignore it.
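
One way to see whether an error file holds anything besides such notices is to filter them out. This is a small sketch, assuming bash and grep, and assuming the notices look exactly like the "Loading required package: shape" line quoted in this thread. The function name filter_r_notices is hypothetical.

```shell
#!/usr/bin/env bash
# filter_r_notices: hypothetical helper, not part of METABOLIC.
# Prints the lines of a stderr file that are NOT "Loading required package:" notices.
filter_r_notices() {
    # grep -v exits with status 1 when nothing is left to print; "|| true" hides that.
    grep -v '^Loading required package: ' "$1" || true
}
```

If filter_r_notices metabolic6.err.txt prints nothing, the error file contained only package-loading notices. Any remaining lines still need to be read individually.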

B-1991-ing commented 2 years ago

Thank you very much for the confirmation; you are right. I checked the PDF figures in the "METABOLIC_Figures" folder, and all of them look fine. For example, see the PDF below.

bin.4.draw_carbon_cycle_single.pdf