Hi,
TL;DR: Change reads_type in snakemake.yml to something other than PACBIO_SMRT, e.g. PACBIO_SRR (see #1 for an explanation).
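In snakemake.yml that is a one-line change, roughly like this (only the relevant entry shown):

reads_type: PACBIO_SRR  # anything other than PACBIO_SMRT, per the note above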
The header format of PacBio reads is >{smrt_cell}/{well}/{hq_begin}_{hq_end} RQ={qual} where
- {smrt_cell} is an alpha-numeric ID of the SMRT cell that produced the read,
- {well} is a numeric ID of the well in the SMRT cell where the read happened,
- {hq_begin} is the position of the first base with "high quality",
- {hq_end} is the position of the last high-quality base, and
- {qual} is a fraction between 0 and 1, the read quality estimate.
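For example, such a header could look like this (values made up for illustration):

>m54119_180829_134331/4194375/0_12000 RQ=0.85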
That being said, your headers look like you do not have all of this info, and it is not even required for DENTIST. So the easiest option is to ignore all this by changing reads_type in snakemake.yml (see above).
Cheers!
Thanks! I changed the read type and reran it. Got a new error: Fasta line is too long
INFO: Converting SIF file to temporary sandbox...
File SRR6282347.fasta, Line 6: Fasta line is too long (> 9998 chars)
INFO: Cleaning up image...
[Wed Sep 29 08:38:30 2021]
Error in rule reads2db:
jobid: 6
output: /home/jon/Working_Files/dentist/SRR6282347.dam, /home/jon/Working_Files/dentist/.SRR6282347.bps, /home/jon/Working_Files/dentist/.SRR6282347.hdr, /home/jon/Working_Files/dentist/.SRR6282347.idx
shell:
fasta2DAM /home/jon/Working_Files/dentist/SRR6282347.dam SRR6282347.fasta && DBsplit -x1000 -a -s200 /home/jon/Working_Files/dentist/SRR6282347.dam
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
You can fix this by running the FASTA through fold:
mv SRR6282347.fasta SRR6282347.fasta~
fold -w1000 SRR6282347.fasta~ > SRR6282347.fasta
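If you want to double-check the result, a quick line-length check (just an example using awk, any equivalent will do) should report a maximum well below the 9,998-character limit:

awk '{ if (length > max) max = length } END { print max }' SRR6282347.fasta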
Thanks for reporting these issues. I will improve the workflow to take care of these things by itself.
That worked perfectly. Have one more error message for you.
Error in rule tandem_alignment_block:
jobid: 16
output: /home/jon/Working_Files/dentist/TAN.Ajap_genome.1.las
log: /home/jon/Working_Files/dentist/logs/tandem-alignment.Ajap_genome.1.log (check log file(s) for error message)
shell:
{
cd /home/jon/Working_Files/dentist
datander '-T70' -s126 -l500 -e0.7 Ajap_genome.1
} &> /home/jon/Working_Files/dentist/logs/tandem-alignment.Ajap_genome.1.log
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
The log file has this: /bin/bash: line 5: 241121 Segmentation fault (core dumped) datander '-T70' -s126 -l500 -e0.7 Ajap_genome.1
It looks like you have configured max_threads: 70 in snakemake.yml. Just reduce it a bit, say to <=32. This basically controls how many threads a single process may get. There is not much benefit in having many threads per process because the speedup does not scale linearly with the number of threads.
Instead, if you are running Snakemake on a single, big machine then you can tell it with --cores
how many threads it is allowed to utilize at any time. It will take care of launching as many jobs as possible so the cores get utilized.
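For example, with max_threads: 16 in snakemake.yml you could launch the whole workflow with something like the following (the numbers are only an illustration, pick what fits your machine):

snakemake --cores 64

Snakemake will then keep up to 64 threads busy by running several such jobs side by side.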
I hope this answers your question. Maybe I should rename max_threads to threads_per_process or something. What do you think?
Hmm, sounds like a good idea. Maybe add a sentence in the readme file about using <=32 cores? Have another error :D
Error in rule mask_tandem:
jobid: 14
output: /home/jon/Working_Files/dentist/.Ajap_genome.tan.anno, /home/jon/Working_Files/dentist/.Ajap_genome.tan.data
log: /home/jon/Working_Files/dentist/logs/mask-tandem.Ajap_genome.log (check log file(s) for error message)
shell:
Catrack -v / tan &> /home/jon/Working_Files/dentist/logs/mask-tandem.Ajap_genome.log
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
The log file has: Catrack: Cannot open /.db for 'r'
I have no idea what happened there. Looks a bit like a bug in Snakemake. Have you tried simply starting the workflow once more?
You called it correctly. I cleaned the directory out and restarted Snakemake and it worked. At least for a while.
Error in rule process:
jobid: 1238
output: /home/jon/Working_Files/dentist/insertions/batch.79.db
log: /home/jon/Working_Files/dentist/logs/process.79.log (check log file(s) for error message)
shell:
dentist process --config=dentist.json --threads=4 --auxiliary-threads=6 --mask=dentist-self-H,tan-H,dentist-reads-H --batch=3950..4000 /home/jon/Working_Files/dentist/Ajap_genome.dam /home/jon/Working_Files/dentist/SRR6282347.dam /home/jon/Working_Files/dentist/pile-ups.db /home/jon/Working_Files/dentist/insertions/batch.79.db 2> /home/jon/Working_Files/dentist/logs/process.79.log
Log file contents
Error: darg.ArgParseError@darg/source/darg.d(1281): Expected a value for positional argument '<out:insertions>'
----------------
??:? pure dentist.commandline.OptionsFor!(11).OptionsFor darg.parseArgs!(dentist.commandline.OptionsFor!(11).OptionsFor).parseArgs(const(immutable(char)[][]), darg.Config) [0x55b9154c4f66]
??:? dentist.commandline.ReturnCode dentist.commandline.runCommand!(11).runCommand(in immutable(char)[][]) [0x55b9154ba276]
??:? dentist.commandline.ReturnCode dentist.commandline.run(in immutable(char)[][]) [0x55b915449b41]
??:? _Dmain [0x55b9152d8704]
Again, I would suggest just rerunning snakemake without cleaning up the directory. Snakemake keeps track of what is left to be done.
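If you want to see what is left first, a dry run lists the remaining jobs without executing anything:

snakemake -n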
Gotcha, I reran snakemake without cleaning up the directory and it gives the same error message, and the log contains the same info, just a different jobid/batch.
I am a bit puzzled because the "missing argument" is actually present as far as I can tell.
Can you tell me which version of DENTIST you are using? Please verify with one of the commands below:
# if you are using pre-compiled binaries:
./bin/dentist --version
# if you are using singularity:
singularity run docker://aludi/dentist:stable dentist --version
Expected output for Singularity:
INFO: Using cached SIF image
dentist v1.0.2-1-gd85a86f (commit d85a86fda8da241b0de3d3b8d3b02cf9e3405302)
Copyright © 2018, Arne Ludwig <arne.ludwig@posteo.de>
Subject to the terms of the MIT license, as written in the included LICENSE file
(singularity) jon@jon-PowerEdge-R910:~/Working_Files/dentist$ singularity run docker://aludi/dentist:stable dentist --version
INFO: Using cached SIF image
INFO: Converting SIF file to temporary sandbox...
dentist v1.0.2-1-gd85a86f (commit d85a86fda8da241b0de3d3b8d3b02cf9e3405302)
Copyright © 2018, Arne Ludwig <arne.ludwig@posteo.de>
Subject to the terms of the MIT license, as written in the included LICENSE file
INFO: Cleaning up image...
If it's useful to know, below are the versions of singularity and snakemake that are installed in the conda environment. Here is a list of everything installed into the conda environment.
Thanks, but I still do not understand the problem. It should be working just fine. :laughing:
Could you try running the command manually with
singularity run docker://aludi/dentist:stable dentist process --config=dentist.json --threads=4 --auxiliary-threads=6 --mask=dentist-self-H,tan-H,dentist-reads-H --batch=3950..4000 /home/jon/Working_Files/dentist/Ajap_genome.dam /home/jon/Working_Files/dentist/SRR6282347.dam /home/jon/Working_Files/dentist/pile-ups.db /home/jon/Working_Files/dentist/insertions/batch.79.db 2> /home/jon/Working_Files/dentist/logs/process.79.log
Same results. This may be a silly question, but I was looking at the command and noticed that the process-pile-ups command wants 5 positional arguments while the command provided by the pipeline only has four. It's missing the "ignored"? Would that cause this error?
singularity run docker://aludi/dentist:stable \
dentist process \
--config=dentist.json \
--threads=4 \
--auxiliary-threads=6 \
--mask=dentist-self-H,tan-H,dentist-reads-H \
--batch=3950..4000 \
/home/jon/Working_Files/dentist/Ajap_genome.dam \
/home/jon/Working_Files/dentist/SRR6282347.dam \
/home/jon/Working_Files/dentist/pile-ups.db \
/home/jon/Working_Files/dentist/insertions/batch.25.db 2> /home/jon/Working_Files/dentist/logs/process.25.log
vs
INFO: Using cached SIF image
INFO: Converting SIF file to temporary sandbox...
Error: darg.ArgParseError@darg/source/darg.d(1281): Expected a value for positional argument '<out:insertions>'
----------------
??:? pure dentist.commandline.OptionsFor!(11).OptionsFor darg.parseArgs!(dentist.commandline.OptionsFor!(11).OptionsFor).parseArgs(const(immutable(char)[][]), darg.Config) [0x55a81770df66]
??:? dentist.commandline.ReturnCode dentist.commandline.runCommand!(11).runCommand(in immutable(char)[][]) [0x55a817703276]
??:? dentist.commandline.ReturnCode dentist.commandline.run(in immutable(char)[][]) [0x55a817692b41]
??:? _Dmain [0x55a817521704]
Usage: dentist process-pile-ups [--allow-single-reads]
[--auxiliary-threads=num-threads]
[--bad-fraction=<frac>]
[--batch=<idx-spec>[,<idx-spec>...]]
[--config=<config-json>]
[--daccord=<daccord-option>[,<daccord-option>...]]
[--daligner-consensus=<daligner-option>[,<daligner-option>...]]
[--daligner-reads-vs-reads=<daligner-option>[,<daligner-option>...]]
[--daligner-self=<daligner-option>...]
[--datander-ref=<datander-option>[,<datander-option>...]]
[--dust-reads=<dust-option>[,<dust-option>...]]
[--help] [--keep-temp]
[--mask=<name>[,<name>...]]
[--max-chain-gap=<bps>] [--max-indel=<bps>]
[--max-relative-overlap=<fraction>]
[--min-anchor-length=<uint>]
[--min-reads-per-pile-up=<ulong>]
[--min-relative-score=<fraction>]
[--min-score=<int>] [--only=<OnlyFlag>]
[--proper-alignment-allowance=num] [--quiet]
[--revert=<option>[,<option>...]]
[--threads=<uint>] [--tmpdir=<string>] [--usage]
[--verbose] <in:reference> <in:reads> <ignored>
<in:pile-ups> <out:insertions>
INFO: Cleaning up image...
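Spelled out against that usage line, the call would apparently need a fifth positional between the reads and the pile-ups; IGNORED_PLACEHOLDER below is made up since I do not know what is supposed to go there:

dentist process [options as above] \
    /home/jon/Working_Files/dentist/Ajap_genome.dam \
    /home/jon/Working_Files/dentist/SRR6282347.dam \
    IGNORED_PLACEHOLDER \
    /home/jon/Working_Files/dentist/pile-ups.db \
    /home/jon/Working_Files/dentist/insertions/batch.25.db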
Well spotted! This means that the workflow (Snakefile) and DENTIST are somewhat out of sync. I would suggest updating everything to the latest version:
- Update the Snakefile to the latest version.
- Rename max_threads to threads_per_process in snakemake.yml (see 546bdbff50c0db7c78f027e64934b224baae9794).
- Set dentist_container: "docker://aludi/dentist:v2.0.0" in snakemake.yml.
Then retry by launching snakemake once more.
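The updated entries in snakemake.yml would then read roughly like this (the thread count is just an example, pick a sensible value):

# snakemake.yml -- relevant entries after the update
threads_per_process: 16
dentist_container: "docker://aludi/dentist:v2.0.0"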
I am guessing the error below has to do with rate limits on pulling from Docker Hub and not with dentist itself?
InputFunctionException in line 1776 of /home/jon/Working_Files/dentist/Snakefile:
Error:
Exception: failed to get alignment commands: FATAL: Unable to handle docker://aludi/dentist:v2.0.0 uri: failed to get checksum for docker://aludi/dentist:v2.0.0: Error reading manifest v2.0.0 in docker.io/aludi/dentist: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
Wildcards:
block_reads=26
Traceback:
File "/home/jon/Working_Files/dentist/Snakefile", line 243, in secondary_expand
File "/home/jon/Working_Files/dentist/Snakefile", line 1793, in <lambda>
File "/home/jon/Working_Files/dentist/Snakefile", line 421, in generate_options_for
Hi, I am getting a PacBio FASTA header format error and I was wondering what format it is looking for? Here is a link to the terminal output.
The PacBio FASTA headers look like this: >pacbio_SRR6282347.1.1 1 length=6524
There is a second error message I am not sure about either. The log file shows a segmentation fault (core dump).