Closed by jgnunes 3 years ago
I haven't seen this error before, but it may be happening because the pipeline now requires a full BUSCO directory setup in order to run in offline mode (previously just the lineages were needed). If you haven't already set it up, use something like this:
BUSCO=/volumes/databases/busco_2021_06
cd $BUSCO
wget -r https://busco-data.ezlab.org/v5/data
find busco-data.ezlab.org -name "*.tar.gz" | parallel "cd {//}; tar -xzf {/}"
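Once the archives are extracted, a quick check that the lineages named in the config are actually present can rule out an incomplete download. The following is a minimal sketch, not part of the pipeline: the helper name `missing_lineages` is hypothetical, and it assumes the extracted datasets sit in a `lineages/` subdirectory of the directory you pass in (with the `wget -r` command above, that directory may be nested under `busco-data.ezlab.org/v5/data`).

```bash
# Report which of the given lineages are missing from a BUSCO download directory.
# Assumption: extracted lineages live under "$dir/lineages/<name>".
# Usage: missing_lineages DIR LINEAGE...
missing_lineages() {
  local dir=$1
  shift
  local status=0
  local lineage
  for lineage in "$@"; do
    if [ ! -d "$dir/lineages/$lineage" ]; then
      echo "missing: $lineage"
      status=1
    fi
  done
  # Non-zero exit status if at least one lineage directory was not found.
  return "$status"
}
```

For example, `missing_lineages "$BUSCO" mollusca_odb10 eukaryota_odb10` prints one line per absent lineage and returns non-zero if any are missing.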
The actual error is a little confusing, as Snakemake should be catching KeyErrors in the cleanup function. I've been developing this with Snakemake v6.0.5, so if you have an older version, that could also be contributing to this issue.
Indeed, I was using an incomplete BUSCO directory. I have since downloaded the complete version and tried to restart the pipeline in the same working directory (so that I didn't need to rerun the previous steps), but I'm still getting the same error. I'm using Snakemake v6.4.1.
Could you post your config.yaml in case there is anything about the busco section that could be causing this? I'm still a little confused as it doesn't look like the usual errors caused by config problems and the rule isn't even starting, so that seems to rule out problems with running BUSCO. Could you try running this with Snakemake 6.0.5 to test if that makes a difference?
This is my config.yaml:
assembly:
  file: /lustre/scratch116/vr/projects/vgp/user/jf18/blobtoolkit/data/limnoperna_fortunei_LF6/assembly/lf6.discovar.fasta.gz
  prefix: limnoperna_fortunei_LF6
busco:
  download_dir: /lustre/scratch116/vr/projects/vgp/user/jf18/blobtoolkit/databases/busco_2021_06
  lineages:
    - mollusca_odb10
    - eukaryota_odb10
  basal_lineages:
    - eukaryota_odb10
reads:
  paired:
    - prefix: LF6-A_GTGAAA_L001
      platform: ILLUMINA
      file: /lustre/scratch116/vr/projects/vgp/user/jf18/blobtoolkit/data/limnoperna_fortunei_LF6/reads/LF6-A_GTGAAA_L001_R1_001.fastq.gz;/lustre/scratch116/vr/projects/vgp/user/jf18/blobtoolkit/data/limnoperna_fortunei_LF6/reads/LF6-A_GTGAAA_L001_R2_001.fastq.gz
revision: 0
settings:
  blast_chunk: 100000
  blast_max_chunks: 10
  blast_overlap: 0
  blast_min_length: 1000
  taxdump: /software/grit/projects/btk/blobplot_db/taxonomy
  tmp: /tmp
similarity:
  defaults:
    evalue: 1.0e-10
    import_evalue: 1.0e-25
    max_target_seqs: 10
    taxrule: bestdistorder
  diamond_blastx:
    name: reference_proteomes
    path: /software/grit/projects/btk/blobplot_db/uniprot_2019_02
  diamond_blastp:
    name: reference_proteomes
    path: /software/grit/projects/btk/blobplot_db/uniprot_2019_02
    import_max_target_seqs: 100000
  blastn:
    name: nt
    path: /software/grit/projects/btk/blobplot_db/ncbi_2019_08
taxon:
  name: Limnoperna fortunei
  taxid: '356393'
version: 1
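To double-check what the `busco` section of a config like the one above points at, the relevant values can be pulled out with `awk`. This is only a rough sketch: the function names are hypothetical, and it assumes the simple one-key-per-line layout shown here rather than handling YAML in general (a real YAML parser would be more robust).

```bash
# Print the value of the first "download_dir:" key in a config file.
# Usage: busco_download_dir CONFIG
busco_download_dir() {
  awk '$1 == "download_dir:" { print $2; exit }' "$1"
}

# Print the entries listed under "lineages:" (not "basal_lineages:").
# Stops at the first line after the list that is not a "- item" entry.
# Usage: busco_lineages CONFIG
busco_lineages() {
  awk '
    $1 == "lineages:"     { in_list = 1; next }
    in_list && $1 == "-"  { print $2; next }
    in_list               { exit }
  ' "$1"
}
```

Running `ls "$(busco_download_dir config.yaml)"` then shows what the pipeline will actually find in the offline BUSCO directory.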
Sure, I will try to re-run the pipeline using Snakemake 6.0.5 and let you know once I do it.
I've created a new conda environment with Snakemake v6.0.5 and the error is gone (the BUSCO step is now running properly). However, I don't think this is an incompatibility with v6.4.1, because I just realized I had already run blobtools (on my local machine) with v6.4.1 and didn't have any problems with BUSCO. So my guess is that this issue was caused by some problem while setting up the conda environment, which has been solved by installing a new environment.
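For anyone hitting the same problem, rebuilding the environment with a pinned Snakemake version could look roughly like the sketch below. The environment name and both function names are hypothetical, and the channel list is an assumption about a typical bioconda setup rather than what was used in this thread.

```bash
# Hypothetical environment name, used only for illustration.
ENV_NAME=btk_snakemake_605

# Create a fresh environment with Snakemake pinned to the version that worked.
# Assumption: snakemake is installed from the conda-forge and bioconda channels.
create_pinned_env() {
  conda create -y -n "$ENV_NAME" -c conda-forge -c bioconda snakemake=6.0.5
}

# Succeed only if the snakemake found on PATH reports the expected version.
# Usage: snakemake_is_version 6.0.5
snakemake_is_version() {
  [ "$(snakemake --version 2>/dev/null)" = "$1" ]
}
```

After running `create_pinned_env` and activating the new environment, `snakemake_is_version 6.0.5` confirms that the pipeline will pick up the intended version before any steps are rerun.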
Anyway, thanks for the help! I'm closing this issue.
While running the pipeline (v2.6.1) on a cluster, I had the following error at the BUSCO step:
I started the pipeline with the following command:
And this is my current tree of files after the failed run:
Any idea on what may be happening here? Let me know if you need any further information.