Open jmartinsjrbr opened 3 years ago
Hi Joaquim
it looks like the workflow is not picking up the ORF files in the folder:
/home/joaquim.junior/work/projects/bagasse/analysis/agnostos/db_creation/gene_prediction
Can you check that you have fastA files in there?
Hi genomewalker, in the path you mentioned there is only this folder: combine_samples/
I used only the metagenome contigs as input, as described in the topic "1. DB-creation module: Start from a set of genomic/metagenomic contigs in fasta format and retrieve a database of categorised gene clusters and cluster communities."
Do I need to include extra input files?
I had a look at the code and it seems to expect the contigs file name to have the following format:
{smp}_contigs.fasta
where smp would be the name of your sample or any other string. If you rename your contigs file like this, it might work.
@ChiaraVanni can you generalise this so it doesn't depend on the contigs file name?
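The naming convention above can be illustrated with a minimal Python sketch. It is not the workflow's own code: the function name `find_samples` and the regular expression are assumptions made for illustration, and the sketch only shows how a sample name could be recovered from files named {smp}_contigs.fasta in a folder.

```python
import re
from pathlib import Path

# Hypothetical pattern mirroring the "{smp}_contigs.fasta" convention from the thread.
CONTIGS_PATTERN = re.compile(r"^(?P<smp>.+)_contigs\.fasta$")


def find_samples(data_dir):
    """Return the sorted sample names of all {smp}_contigs.fasta files in data_dir."""
    samples = []
    for path in Path(data_dir).iterdir():
        match = CONTIGS_PATTERN.match(path.name)
        if path.is_file() and match:
            samples.append(match.group("smp"))
    return sorted(samples)
```

An empty result from a check like this would suggest that no file in the folder follows the expected naming.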
My input file is named: "inFile_contigs.fasta"
In addition, find below my config.yaml from the db_creation folder:
Maybe you can spot a mistake I might have made.
###########################################db_creation/config/config.yaml#############################
# This file should contain everything to configure the workflow on a global scale.
# In case of sample based data, it should be complemented by a samples.tsv file that contains
# one row per sample. It can be parsed easily via pandas.
wdir: "/home/bioinf/progs/agnostos/agnostos-wf/db_creation"
rdir: "/home/joaquim.junior/work/projects/bagasse/analysis/agnostos/db_creation"
data: "/home/joaquim.junior/work/projects/bagasse/analysis/agnostos/L5prokka_contigs.fasta" # rename your data to match the format "{sample_name}_contigs.fasta"
# choose a name for your dataset
data_name: "L5prokka"
# If you want to classify the singletons into the four categories, set the following entry to "true"
singl: "true"
conda_env: "/home/bioinf/progs/agnostos/agnostos-wf/envs/workflow.yml"
# Threads configuration
threads_default: 16
threads_collect: 16
threads_cat_ref: 4
# Databases
pfam_db: "/home/bioinf/progs/agnostos/agnostos-wf/databases/Pfam-A.hmm"
pfam_clan: "/home/bioinf/progs/agnostos/agnostos-wf/databases/Pfam-A.clans.tsv.gz"
antifam_db: "/home/bioinf/progs/agnostos/agnostos-wf/databases/AntiFam.hmm"
uniref90_db: "/home/bioinf/progs/agnostos/agnostos-wf/databases/uniref90.db"
nr_db: "/home/bioinf/progs/agnostos/agnostos-wf/databases/nr.db"
uniclust_db: "/home/bioinf/progs/agnostos/agnostos-wf/databases/uniclust30_2018_08/uniclust30_2018_08"
#uniprot_db: "/home/bioinf/progs/agnostos/agnostos-wf/databases/uniprotKB.fasta.gz"
pfam_hh_db: "/home/bioinf/progs/agnostos/agnostos-wf/databases/pfam"
DPD: "/home/bioinf/progs/agnostos/agnostos-wf/databases/dpd_uniprot_sprot.fasta.gz"
db_dir: "/home/bioinf/progs/agnostos/agnostos-wf/databases/"
taxdb: "/home/bioinf/progs/agnostos/agnostos-wf/databases/uniprotKB"
gtdb_tax: "/home/bioinf/progs/agnostos/agnostos-wf/databases/gtdb-r89_54k/gtdb-r89_54k.fmi"
# Files retrieved from the databases
# List of shared reduced Pfam domain names (downloadable from Figshare..)
pfam_shared_terms: "/home/bioinf/progs/agnostos/agnostos-wf/databases/Pfam-31_names_mod_01122019.tsv"
# Created using the protein accessions and the descriptions found on the fasta headers
uniref90_prot: "/home/bioinf/progs/agnostos/agnostos-wf/databases/uniref90.proteins.tsv.gz"
nr_prot: "/home/bioinf/progs/agnostos/agnostos-wf/databases/nr.proteins.tsv.gz"
# Information downloaded from Dataset-S1 from the DPD paper:
dpd_info: "/home/bioinf/progs/agnostos/agnostos-wf/databases/dpd_ids_all_info.tsv.gz"
# Local template folder
local_tmp: "/home/bioinf/tmp"
# MPI runner (de.NBI cloud, SLURM)
mpi_runner: "srun --mpi=pmi2"
#vmtouch for the DBs
vmtouch: "vmtouch"
# Gene prediction
prodigal_mode: "meta" #"meta" for metagenomes or "normal" for genomes
prodigal_bin: "prodigal"
# Annotation
hmmer_bin: "/home/bioinf/progs/agnostos/agnostos-wf/bin/hmmsearch"
# Clustering config
ffindex_apply: "/home/bioinf/progs/agnostos/agnostos-wf/bin/ffindex_apply_mpi"
mmseqs_bin: "/home/bioinf/progs/agnostos/agnostos-wf/bin/mmseqs"
mmseqs_tmp: "/home/bioinf/progs/agnostos/agnostos-wf/tmp"
mmseqs_local_tmp: "/home/bioinf/tmp"
mmseqs_split_mem: "100G"
mmseqs_split: 10
# Clustering results config
seqtk_bin: "seqtk"
# Spurious and shadows config
hmmpress_bin: "/home/bioinf/progs/agnostos/agnostos-wf/bin/hmmpress"
# Compositional validation config
datamash_bin: "datamash"
famsa_bin: "/home/bioinf/progs/agnostos/agnostos-wf/bin/famsa"
odseq_bin: "/home/bioinf/progs/agnostos/agnostos-wf/bin/OD-seq"
leonbis_bin: "/home/bioinf/progs/agnostos/agnostos-wf/bin/leon-bis.tcsh"
parasail_bin: "/home/bioinf/progs/agnostos/agnostos-wf/bin/parasail_aligner"
parallel_bin: "parallel"
get_stats: "/home/bioinf/progs/agnostos/agnostos-wf/db_creation/scripts/get_stats.r"
isconn: "/home/bioinf/progs/agnostos/agnostos-wf/db_creation/scripts/is_connected"
filterg: "/home/bioinf/progs/agnostos/agnostos-wf/db_creation/scripts/filter_graph"
igraph_lib: "export LD_LIBRARY_PATH=/home/bioinf/progs/agnostos/agnostos-wf/bin/igraph/lib:${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
parasail_lib: "export LD_LIBRARY_PATH=/home/bioinf/progs/agnostos/agnostos-wf/lib:${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
# Cluster classification config
seqkit_bin: "seqkit"
filterbyname: "filterbyname.sh"
hhcons_bin: "/home/bioinf/progs/agnostos/agnostos-wf/bin/hh-suite/bin/hhconsensus"
# Cluster category refinement
hhsuite: "/home/bioinf/progs/agnostos/agnostos-wf/bin/hh-suite"
hhblits_bin_mpi: "/home/bioinf/progs/agnostos/agnostos-wf/bin/hh-suite/bin/hhblits_mpi"
hhmake: "/home/bioinf/progs/agnostos/agnostos-wf/binhh-suite/bin/hhmake"
hhblits_prob: 90
hypo_filt: 1.0
# Taxonomy
kaiju_bin: "/home/bioinf/progs/agnostos/agnostos-wf/bin/kaiju"
# Cluster communities
hhblits_bin: "/home/bioinf/progs/agnostos/agnostos-wf/bin/hh-suite/bin/hhblits"
hhsearch_bin: "/home/bioinf/progs/agnostos/agnostos-wf/bin/hh-suite/bin/hhsearch"
########################################db_creation/config/config.yaml################################
Thanks, Joaquim
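A config of this size is easy to get subtly wrong, so a quick existence check on its path-like entries can help before launching the workflow. The following is a minimal Python sketch, not part of agnostos-wf: the function name `missing_paths` is hypothetical, and the line-based regular expression is a simplification that only handles flat `key: "value"` entries like the ones above (a real YAML parser would be more robust). Entries that are file prefixes rather than files, such as a database basename, are accepted if anything on disk starts with that prefix.

```python
import glob
import os
import re

# Matches uncommented lines of the form  key: "value"
ENTRY = re.compile(r'^\s*(?P<key>\w+):\s*"(?P<value>[^"]*)"')


def missing_paths(config_text):
    """Return {key: value} for absolute-path entries that match nothing on disk."""
    missing = {}
    for line in config_text.splitlines():
        match = ENTRY.match(line)
        if not match:
            continue  # comments, numbers, and unquoted values are skipped
        key, value = match.group("key"), match.group("value")
        if not value.startswith("/"):
            continue  # bare program names such as "seqtk" are resolved via PATH
        # Accept exact paths and prefixes (e.g. a database basename).
        if os.path.exists(value) or glob.glob(glob.escape(value) + "*"):
            continue
        missing[key] = value
    return missing
```

Calling `missing_paths(open("config/config.yaml").read())` on the machine that runs the workflow would list the entries worth double-checking.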
Hi Joaquim
here:
data: "/home/joaquim.junior/work/projects/bagasse/analysis/agnostos/L5prokka_contigs.fasta" # rename your data to match the format "{sample_name}_contigs.fasta"
should be:
data: "/home/joaquim.junior/work/projects/bagasse/analysis/agnostos/" # rename your data to match the format "{sample_name}_contigs.fasta"
The data folder should contain all the contig fastA files you want to process.
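The correction above amounts to pointing `data` at the folder rather than at a single file. As a hedged illustration (the helper name `resolve_data_dir` is hypothetical and not part of the workflow), a small Python check could normalise either form to the folder:

```python
from pathlib import Path


def resolve_data_dir(data_value):
    """Return the folder to use for the `data` entry.

    If data_value is a folder it is returned unchanged; if it is a file
    (for example a single *_contigs.fasta), its parent folder is returned.
    """
    path = Path(data_value)
    if path.is_dir():
        return path
    if path.is_file():
        return path.parent
    raise FileNotFoundError(f"data entry does not exist: {data_value}")
```

For the config in this thread, passing the path to L5prokka_contigs.fasta would return the agnostos/ folder that contains it, provided the file exists.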
Hi @jmartinsjrbr
did this work?
Hi,
We have installed Agnostos-wf, and in our first attempt to analyze our metagenomic data we got the error below (in bold) after running the 'db_creation' workflow.
Command line used:
snakemake --use-conda -j 100 --cluster-config config/cluster.yaml --cluster "sbatch --export=ALL -t {cluster.time} -c {threads} --ntasks-per-node {cluster.ntasks_per_node} --nodes {cluster.nodes} --cpus-per-task {cluster.cpus_per_task} --job-name {rulename}.{jobid} --partition {cluster.partition}" -R --until workflowreport
#################################BEGIN slurm log file##############################################
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 5
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1       mmseqs_clustering
    1
Select jobs to execute...

[Fri Jun 18 15:05:26 2021]
rule mmseqs_clustering:
    output: /home/joaquim.junior/work/projects/bagasse/analysis/agnostos/db_creation/mmseqs_clustering/cluDB.tsv
    log: logs/mmseqs_clustering_stdout.log, logs/mmseqs_clustering_stderr.err
    jobid: 0
    benchmark: benchmarks/mmseqs_clustering/clu.tsv
    threads: 5

/home/bioinf/progs/agnostos/agnostos-wf/bin/mmseqs createdb /home/joaquim.junior/work/projects/bagasse/analysis/agnostos/db_creation/mmseqs_clustering/seqDB
/usr/bin/bash: line 8: 1032725 Illegal instruction (core dumped) /home/bioinf/progs/agnostos/agnostos-wf/bin/mmseqs createdb /home/joaquim.junior/work/projects/bagasse/analysis/agnostos/db_creation/mmseqs_clustering/seqDB 2> logs/mm>
[Fri Jun 18 15:05:27 2021]
Error in rule mmseqs_clustering:
    jobid: 0
    output: /home/joaquim.junior/work/projects/bagasse/analysis/agnostos/db_creation/mmseqs_clustering/cluDB.tsv
    log: logs/mmseqs_clustering_stdout.log, logs/mmseqs_clustering_stderr.err (check log file(s) for error message)
    shell:

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
#################################END slurm log file##############################################
I was not able to figure out what might be happening.
Best regards, Joaquim
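The thread itself does not diagnose the "Illegal instruction (core dumped)" message. In general, one common cause of that signal is running a binary compiled for SIMD instructions (for example AVX2) on a CPU that lacks them; whether that applies to the mmseqs binary here is an assumption. Under that assumption, a minimal Python sketch can report which of these instruction sets a Linux compute node advertises. The function names are hypothetical, and the parsing relies on the "flags" line of /proc/cpuinfo on x86 Linux.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supported_simd(flags):
    """Return the SIMD levels of interest that appear in flags, widest first."""
    return [name for name in ("avx2", "sse4_1", "sse2") if name in flags]


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file; only meaningful on Linux."""
    with open(path) as handle:
        return handle.read()
```

Running `supported_simd(cpu_flags(read_cpuinfo()))` inside a job on the failing partition, rather than on the login node, would show what that node's CPU supports.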