functional-dark-side / agnostos-wf

mmseqs2 core dump #4

Open jmartinsjrbr opened 3 years ago

jmartinsjrbr commented 3 years ago

Hi,

We have installed Agnostos-wf, and in our first attempt to analyze our metagenomic data we got the error below after running the 'db_creation' workflow.

Command line used: snakemake --use-conda -j 100 --cluster-config config/cluster.yaml --cluster "sbatch --export=ALL -t {cluster.time} -c {threads} --ntasks-per-node {cluster.ntasks_per_node} --nodes {cluster.nodes} --cpus-per-task {cluster.cpus_per_task} --job-name {rulename}.{jobid} --partition {cluster.partition}" -R --until workflowreport

#################################BEGIN slurm log file##############################################
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 5
Rules claiming more threads will be scaled down.
Job counts:
    count  jobs
    1      mmseqs_clustering
    1
Select jobs to execute...

[Fri Jun 18 15:05:26 2021]
rule mmseqs_clustering:
    output: /home/joaquim.junior/work/projects/bagasse/analysis/agnostos/db_creation/mmseqs_clustering/cluDB.tsv
    log: logs/mmseqs_clustering_stdout.log, logs/mmseqs_clustering_stderr.err
    jobid: 0
    benchmark: benchmarks/mmseqs_clustering/clu.tsv
    threads: 5

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
#################################END slurm log file##############################################

I was not able to figure out what might be happening.

Best regards, Joaquim

genomewalker commented 3 years ago

Hi Joaquim, it looks like it is not picking up the ORF files in the folder: /home/joaquim.junior/work/projects/bagasse/analysis/agnostos/db_creation/gene_prediction

Can you check that you have fastA files in there?
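One quick way to run that check is a small script that walks the folder and lists anything that looks like a FASTA file. This is only a sketch and not part of agnostos-wf: the function name `find_fasta_files` and the set of extensions are assumptions, so adjust them to whatever the gene prediction step actually writes.

```python
from pathlib import Path

# Assumed FASTA-like extensions; the workflow may use a different naming scheme.
FASTA_SUFFIXES = {".fasta", ".fa", ".faa", ".fna"}


def find_fasta_files(folder):
    """Return FASTA-like files found anywhere below `folder`, sorted by path.

    A missing folder yields an empty list instead of raising.
    """
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in FASTA_SUFFIXES
    )
```

Calling `find_fasta_files(".../db_creation/gene_prediction")` with the real path and getting an empty list back would match the symptom described here: only subfolders exist, and no sequences were written.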

jmartinsjrbr commented 3 years ago

Hi genomewalker, In the path you mentioned there is only this folder: combine_samples/

I used only the metagenome contigs as input files, as described under: 1. DB-creation module: Start from a set of genomic/metagenomic contigs in fasta format and retrieve a database of categorised gene clusters and cluster communities.

Do I have to include extra input files?

genomewalker commented 3 years ago

I had a look at the code, and it seems to expect the contigs file name to have the following format:

{smp}_contigs.fasta

where smp would be the name of your sample or any other string. If you rename your contigs file like this it might work.
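To see which sample names would be picked up under that naming convention, something like the following can be run against the data folder. It is a sketch based only on the `{smp}_contigs.fasta` pattern quoted above, not the workflow's own code; the regular expression and the function name `sample_names` are assumptions.

```python
import re
from pathlib import Path

# Assumed reading of the "{smp}_contigs.fasta" convention: everything before
# the "_contigs.fasta" suffix is taken as the sample name.
CONTIGS_PATTERN = re.compile(r"^(?P<smp>.+)_contigs\.fasta$")


def sample_names(data_dir):
    """Return the sample names of all files named <smp>_contigs.fasta in data_dir."""
    names = []
    for path in sorted(Path(data_dir).iterdir()):
        match = CONTIGS_PATTERN.match(path.name)
        if path.is_file() and match:
            names.append(match.group("smp"))
    return names
```

Files that do not end in `_contigs.fasta` are skipped silently, so an empty result suggests the contigs file needs renaming.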

@ChiaraVanni can you generalise this so it doesn't depend on the contigs file name?

jmartinsjrbr commented 3 years ago

My input file is named: "inFile_contigs.fasta"

In addition, find below my config.yaml from the db_creation folder:

Maybe you can spot a mistake I might have made.

###########################################db_creation/config/config.yaml#############################

# This file should contain everything to configure the workflow on a global scale.
# In case of sample based data, it should be complemented by a samples.tsv file that contains
# one row per sample. It can be parsed easily via pandas.
wdir: "/home/bioinf/progs/agnostos/agnostos-wf/db_creation"
rdir: "/home/joaquim.junior/work/projects/bagasse/analysis/agnostos/db_creation"
data: "/home/joaquim.junior/work/projects/bagasse/analysis/agnostos/L5prokka_contigs.fasta" # rename your data to match the format "{sample_name}_contigs.fasta"
# choose a name for your dataset
data_name: "L5prokka"

# If you want to classify the singletons into the four categories, set the following entry to "true"
singl: "true"

conda_env: "/home/bioinf/progs/agnostos/agnostos-wf/envs/workflow.yml"
# Threads configuration
threads_default: 16
threads_collect: 16
threads_cat_ref: 4
# Databases
pfam_db: "/home/bioinf/progs/agnostos/agnostos-wf/databases/Pfam-A.hmm"
pfam_clan: "/home/bioinf/progs/agnostos/agnostos-wf/databases/Pfam-A.clans.tsv.gz"
antifam_db: "/home/bioinf/progs/agnostos/agnostos-wf/databases/AntiFam.hmm"
uniref90_db: "/home/bioinf/progs/agnostos/agnostos-wf/databases/uniref90.db"
nr_db: "/home/bioinf/progs/agnostos/agnostos-wf/databases/nr.db"
uniclust_db: "/home/bioinf/progs/agnostos/agnostos-wf/databases/uniclust30_2018_08/uniclust30_2018_08"
#uniprot_db: "/home/bioinf/progs/agnostos/agnostos-wf/databases/uniprotKB.fasta.gz"
pfam_hh_db: "/home/bioinf/progs/agnostos/agnostos-wf/databases/pfam"
DPD: "/home/bioinf/progs/agnostos/agnostos-wf/databases/dpd_uniprot_sprot.fasta.gz"
db_dir: "/home/bioinf/progs/agnostos/agnostos-wf/databases/"
taxdb: "/home/bioinf/progs/agnostos/agnostos-wf/databases/uniprotKB"
gtdb_tax: "/home/bioinf/progs/agnostos/agnostos-wf/databases/gtdb-r89_54k/gtdb-r89_54k.fmi"
# Files retrieved from the databases
# List of shared reduced Pfam domain names (downloadable from Figshare..)
pfam_shared_terms: "/home/bioinf/progs/agnostos/agnostos-wf/databases/Pfam-31_names_mod_01122019.tsv"
# Created using the protein accessions and the descriptions found on the fasta headers
uniref90_prot: "/home/bioinf/progs/agnostos/agnostos-wf/databases/uniref90.proteins.tsv.gz"
nr_prot: "/home/bioinf/progs/agnostos/agnostos-wf/databases/nr.proteins.tsv.gz"
# Information downloaded from Dataset-S1 from the DPD paper:
dpd_info: "/home/bioinf/progs/agnostos/agnostos-wf/databases/dpd_ids_all_info.tsv.gz"

# Local template folder
local_tmp: "/home/bioinf/tmp"

# MPI runner (de.NBI cloud, SLURM)
mpi_runner: "srun --mpi=pmi2"

#vmtouch for the DBs
vmtouch: "vmtouch"

# Gene prediction
prodigal_mode: "meta" #"meta" for metagenomes or "normal" for genomes
prodigal_bin: "prodigal"

# Annotation
hmmer_bin: "/home/bioinf/progs/agnostos/agnostos-wf/bin/hmmsearch"

# Clustering config
ffindex_apply: "/home/bioinf/progs/agnostos/agnostos-wf/bin/ffindex_apply_mpi"
mmseqs_bin: "/home/bioinf/progs/agnostos/agnostos-wf/bin/mmseqs"
mmseqs_tmp: "/home/bioinf/progs/agnostos/agnostos-wf/tmp"
mmseqs_local_tmp: "/home/bioinf/tmp"
mmseqs_split_mem: "100G"
mmseqs_split: 10

# Clustering results config
seqtk_bin: "seqtk"

# Spurious and shadows config
hmmpress_bin: "/home/bioinf/progs/agnostos/agnostos-wf/bin/hmmpress"

# Compositional validation config
datamash_bin: "datamash"
famsa_bin: "/home/bioinf/progs/agnostos/agnostos-wf/bin/famsa"
odseq_bin: "/home/bioinf/progs/agnostos/agnostos-wf/bin/OD-seq"
leonbis_bin: "/home/bioinf/progs/agnostos/agnostos-wf/bin/leon-bis.tcsh"
parasail_bin: "/home/bioinf/progs/agnostos/agnostos-wf/bin/parasail_aligner"
parallel_bin: "parallel"
get_stats: "/home/bioinf/progs/agnostos/agnostos-wf/db_creation/scripts/get_stats.r"
isconn: "/home/bioinf/progs/agnostos/agnostos-wf/db_creation/scripts/is_connected"
filterg: "/home/bioinf/progs/agnostos/agnostos-wf/db_creation/scripts/filter_graph"
igraph_lib: "export LD_LIBRARY_PATH=/home/bioinf/progs/agnostos/agnostos-wf/bin/igraph/lib:${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
parasail_lib: "export LD_LIBRARY_PATH=/home/bioinf/progs/agnostos/agnostos-wf/lib:${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Cluster classification config
seqkit_bin: "seqkit"
filterbyname: "filterbyname.sh"
hhcons_bin: "/home/bioinf/progs/agnostos/agnostos-wf/bin/hh-suite/bin/hhconsensus"

# Cluster category refinement
hhsuite: "/home/bioinf/progs/agnostos/agnostos-wf/bin/hh-suite"
hhblits_bin_mpi: "/home/bioinf/progs/agnostos/agnostos-wf/bin/hh-suite/bin/hhblits_mpi"
hhmake: "/home/bioinf/progs/agnostos/agnostos-wf/bin/hh-suite/bin/hhmake"
hhblits_prob: 90
hypo_filt: 1.0

# Taxonomy
kaiju_bin: "/home/bioinf/progs/agnostos/agnostos-wf/bin/kaiju"

# Cluster communities
hhblits_bin: "/home/bioinf/progs/agnostos/agnostos-wf/bin/hh-suite/bin/hhblits"
hhsearch_bin: "/home/bioinf/progs/agnostos/agnostos-wf/bin/hh-suite/bin/hhsearch"

########################################db_creation/config/config.yaml################################

Thanks, Joaquim

genomewalker commented 3 years ago

Hi Joaquim

here:

data: "/home/joaquim.junior/work/projects/bagasse/analysis/agnostos/L5prokka_contigs.fasta" # rename your data to match the format "{sample_name}_contigs.fasta"

should be:

data: "/home/joaquim.junior/work/projects/bagasse/analysis/agnostos/" # rename your data to match the format "{sample_name}_contigs.fasta"

the data folder should contain all the contig fastA files you want to process.
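The same correction can be expressed as a small helper that turns a `data` value pointing at a single contigs file into the folder that contains it. This is an illustrative sketch, not something the workflow provides: the helper name `data_dir_from`, the extension heuristic, and the trailing slash (copied from the corrected line above) are all assumptions.

```python
from pathlib import Path


def data_dir_from(value):
    """Return the directory that the `data` config entry should point to.

    If `value` names a file (it exists as a file, or it carries a FASTA-like
    suffix), its parent directory is returned; otherwise `value` is treated
    as a directory. The result always ends with a single "/".
    """
    path = Path(value)
    # Heuristic (assumption): a FASTA-like suffix means a file was given.
    if path.is_file() or path.suffix.lower() in {".fasta", ".fa", ".fna"}:
        path = path.parent
    return str(path).rstrip("/") + "/"
```

Applied to the `data` value posted earlier in this thread, it returns the `.../analysis/agnostos/` folder suggested above; a value that is already a folder comes back unchanged apart from the normalised trailing slash.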

genomewalker commented 3 years ago

Hi @jmartinsjrbr

did this work?