bcgsc / ntedit_sealer_protocol

Efficient targeted error resolution and automated finishing of long-read genome assemblies

make: *** [sr_solid_k80.bf] Error 1 #7

Closed bbrunet closed 3 months ago

bbrunet commented 3 months ago

I installed the ntedit-sealer protocol via conda and cloned the Makefile pipeline from GitHub as directed. The dependency versions in the conda environment are: ntHits v. 1.0.3, ntEdit v. 2.0.2, abyss v. 2.3.7, python v. 3.12.3. I ran ntedit-sealer finish with k="80 65 50" and b=30G (as determined from ntCard), passing a quoted, space-delimited list of two sets of paired-end reads as input. The pipeline starts and successfully loads the reads into a Bloom filter, but then fails at the ntHits step with the following error:

Writing bloom filter to `-'...
Unknown argument: -b36
Usage: ntHits --frequencies VAR --out-file VAR [--min-count VAR] [--max-count VAR] [--kmer-length VAR] [--seeds VAR] [-h] [--error-rate VAR] [--threads VAR] [--solid] [--long-mode] out_type files

Filters k-mers based on counts (cmin <= count <= cmax) in input files

Positional arguments:
  out_type             Output format: Bloom filter 'bf', counting Bloom filter ('cbf'), or table ('table') [required]
  files                Input files [nargs: 0 or more] [required]

Optional arguments:
  -f, --frequencies    Frequency histogram file (e.g. from ntCard) [required]
  -o, --out-file       Output file's name [required]
  -cmin, --min-count   Minimum k-mer count (>=1), ignored if using --solid [default: 1]
  -cmax, --max-count   Maximum k-mer count (<=254) [default: 254]
  -k, --kmer-length    k-mer length, ignored if using spaced seeds (-s) [default: 64]
  -s, --seeds          If specified, use spaced seeds (separate with commas, e.g. 10101,11011)
  -h, --num-hashes     Number of hashes to generate per k-mer/spaced seed [default: 3]
  -p, --error-rate     Target Bloom filter error rate [default: 0.0001]
  -t, --threads        Number of parallel threads [default: 4]
  --solid              Automatically tune 'cmin' to filter out erroneous k-mers
  --long-mode          Optimize data reader for long sequences (>5kbp)
  -v                   Level of details printed to stdout (-v: normal, -vv detailed)

Copyright 2023 Canada's Michael Smith Genome Science Centre

make: *** [sr_solid_k80.bf] Error 1 <<<

I noticed that the protocol published in Current Protocols lists specific dependency versions (i.e., "nthits>=0.0.1", "ntedit>=1.3.5", "abyss>=2.3.2", and python=3.7), so I created a new environment, installed these, and reran ntedit-sealer there. It now seems to be running fine, using the Bloom filter from the previous failed run.

lcoombe commented 3 months ago

Hi @bbrunet,

Thanks for the report!

Indeed, ntHits must be version 0.0.1 to be compatible with the current pipeline. We have that version listed in the "Dependencies" section of the README, but I can see that we don't specify the ntHits version in the example conda command - I will add that information to make that clearer.

Conda should take care of this constraint, but with ntHits v0.0.1, you also need ntEdit version < 2.0.0.
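To make the constraint above concrete, here is a minimal sketch of a check that an environment matches what the pipeline expected at the time of this comment (ntHits exactly 0.0.1 and ntEdit below 2.0.0). The function names are hypothetical and are not part of the protocol; the version strings are assumed to be plain dotted numbers such as those shown by conda.

```python
def parse_version(text):
    """Turn '2.0.2' or 'v2.0.2' into (2, 0, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().lstrip("vV").split("."))


def is_compatible_with_pinned_pipeline(nthits, ntedit):
    """Hypothetical helper: True when ntHits is exactly 0.0.1 and ntEdit is < 2.0.0,
    the combination described in this thread for the pipeline before its update."""
    return parse_version(nthits) == (0, 0, 1) and parse_version(ntedit) < (2, 0, 0)


# Environment from the original report (ntHits 1.0.3, ntEdit 2.0.2)
print(is_compatible_with_pinned_pipeline("1.0.3", "2.0.2"))  # False
# Environment with the lower, pinned versions
print(is_compatible_with_pinned_pipeline("0.0.1", "1.3.5"))  # True
```

Tuples of integers are compared rather than raw strings because string comparison would order "10.0.0" before "2.0.0".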

Just in case, I would suggest starting your pipeline from the beginning so that new Bloom filters are made with the correct ntHits (i.e., any output files from the ntHits step of your failed run shouldn't be trusted).

We are looking into possibly updating the pipeline to be compatible with newer versions of ntHits and ntEdit, but for now, pinning those lower versions should do the trick for you.

Thank you for your interest in the ntEdit+Sealer protocol!

Lauren

lcoombe commented 3 months ago

Hi @bbrunet,

Just an update that I have made tweaks to the protocol pipeline so that it is now compatible with ntHits v1.0.0+. The makefile will auto-detect the ntHits version, and adjust the commands executed accordingly. The updated pipeline is released as v1.1.0.
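As an illustration of version-dependent dispatch of this kind, the sketch below picks a command style from an ntHits version string. It is not the makefile's actual logic, and the function names and the file names in the example are hypothetical. The v1.0.0+ arguments are taken only from the usage text quoted in the error report above; the older command line is not shown in this thread, so the sketch returns only a label for it.

```python
def parse_version(text):
    """Turn '1.0.3' or 'v1.0.3' into (1, 0, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().lstrip("vV").split("."))


def nthits_style(version):
    """Hypothetical helper: label which ntHits command-line style applies."""
    return "v1" if parse_version(version) >= (1, 0, 0) else "legacy"


def build_nthits_v1_command(histogram, out_file, k, threads, reads):
    """Assemble an ntHits v1.0.0+ argument list that writes a Bloom filter of
    solid k-mers, using only options listed in the quoted usage text."""
    return [
        "ntHits",
        "--frequencies", histogram,   # ntCard frequency histogram
        "--out-file", out_file,
        "--kmer-length", str(k),
        "--threads", str(threads),
        "--solid",                    # let ntHits tune cmin automatically
        "bf",                         # out_type: Bloom filter
        *reads,
    ]


if nthits_style("1.0.3") == "v1":
    # File names here are placeholders for illustration only.
    cmd = build_nthits_v1_command("freq_k80.hist", "sr_solid_k80.bf", 80, 4,
                                  ["reads_1.fq.gz", "reads_2.fq.gz"])
    print(" ".join(cmd))
```

Returning the command as a list rather than a single string keeps file names with spaces intact if the list is later passed to something like `subprocess.run`.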

Thanks for pointing this issue out!

Lauren