a-ludi / dentist

Close assembly gaps using long-reads at high accuracy.
https://a-ludi.github.io/dentist/
MIT License
47 stars 6 forks

Pacbio header line format error #27

Closed JonEilers closed 3 years ago

JonEilers commented 3 years ago

Hi, I am getting a PacBio FASTA header format error and I was wondering what format it is looking for? Here is a link to the terminal output.

The pacbio fasta headers look like this: >pacbio_SRR6282347.1.1 1 length=6524

There is a second error message I am not sure about either. The log file shows a segmentation fault (core dumped):

/bin/bash: line 5: 208846 Segmentation fault      (core dumped) datander '-T70' -s126 -l500 -e0.7 Ajap_genome.2
a-ludi commented 3 years ago

Hi,

TL;DR: Change reads_type in snakemake.yml to something other than PACBIO_SMRT, e.g. PACBIO_SRR. (see #1 for an explanation)
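
For example, the corresponding entry in snakemake.yml would then read something like this (excerpt; any label other than PACBIO_SMRT works):

    reads_type: PACBIO_SRR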

The header format of PacBio reads is >{smrt_cell}/{well}/{hq_begin}_{hq_end} RQ={qual}, where {smrt_cell} is the SMRT cell/movie name, {well} is the ZMW well number, {hq_begin}_{hq_end} gives the coordinates of the high-quality region and {qual} is the read quality.
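
For illustration, a made-up header following this template could look like this:

>m64011_190228_190319/42/0_6524 RQ=0.85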

That being said, your headers look like you do not have all of this info, and it is not even required for DENTIST. So the easiest option is to ignore all of this by changing reads_type in snakemake.yml (see above).

Cheers!

JonEilers commented 3 years ago

Thanks! I changed the read type and reran it. Got a new error: Fasta line is too long

INFO:    Converting SIF file to temporary sandbox...
File SRR6282347.fasta, Line 6: Fasta line is too long (> 9998 chars)
INFO:    Cleaning up image...
[Wed Sep 29 08:38:30 2021]
Error in rule reads2db:
    jobid: 6
    output: /home/jon/Working_Files/dentist/SRR6282347.dam, /home/jon/Working_Files/dentist/.SRR6282347.bps, /home/jon/Working_Files/dentist/.SRR6282347.hdr, /home/jon/Working_Files/dentist/.SRR6282347.idx
    shell:
        fasta2DAM /home/jon/Working_Files/dentist/SRR6282347.dam SRR6282347.fasta && DBsplit -x1000 -a -s200 /home/jon/Working_Files/dentist/SRR6282347.dam
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
a-ludi commented 3 years ago

You can fix this by running the FASTA through fold:

mv SRR6282347.fasta SRR6282347.fasta~
fold -w1000 SRR6282347.fasta~ > SRR6282347.fasta

Thanks for reporting these issues. I will improve the workflow to take care of these things by itself.

JonEilers commented 3 years ago

That worked perfectly. Have one more error message for you.

Error in rule tandem_alignment_block:
    jobid: 16
    output: /home/jon/Working_Files/dentist/TAN.Ajap_genome.1.las
    log: /home/jon/Working_Files/dentist/logs/tandem-alignment.Ajap_genome.1.log (check log file(s) for error message)
    shell:

            {
                cd /home/jon/Working_Files/dentist
                datander '-T70' -s126 -l500 -e0.7 Ajap_genome.1
            } &> /home/jon/Working_Files/dentist/logs/tandem-alignment.Ajap_genome.1.log

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

log file has this: /bin/bash: line 5: 241121 Segmentation fault (core dumped) datander '-T70' -s126 -l500 -e0.7 Ajap_genome.1

a-ludi commented 3 years ago

It looks like you have configured max_threads: 70 in snakemake.yml. Just reduce it a bit, say to <=32. This basically controls how many threads a single process may get. There is not much benefit in having many threads per process because the speedup does not scale linearly with the number of threads.

Instead, if you are running Snakemake on a single, big machine, you can tell it with --cores how many threads it is allowed to utilize at any time. It will take care of launching as many jobs as possible so that the cores are fully utilized.
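
For instance (numbers purely illustrative), the config entry and invocation could look like this:

    # snakemake.yml – threads a single process may use
    max_threads: 32

    # on a 70-core machine, let Snakemake schedule jobs across all cores
    snakemake --cores=70

plus whatever other command line options you normally pass to Snakemake.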

I hope this answers your question. Maybe I should rename max_threads to threads_per_process or something. What do you think?

JonEilers commented 3 years ago

Hmm, sounds like a good idea. Maybe add a sentence in the readme file about using <=32 cores? Have another error :D

Error in rule mask_tandem:
    jobid: 14
    output: /home/jon/Working_Files/dentist/.Ajap_genome.tan.anno, /home/jon/Working_Files/dentist/.Ajap_genome.tan.data
    log: /home/jon/Working_Files/dentist/logs/mask-tandem.Ajap_genome.log (check log file(s) for error message)
    shell:
        Catrack -v / tan &> /home/jon/Working_Files/dentist/logs/mask-tandem.Ajap_genome.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

log file has Catrack: Cannot open /.db for 'r'

a-ludi commented 3 years ago

I have no idea what happened there. Looks a bit like a bug in Snakemake. Have you tried simply starting the workflow once more?

JonEilers commented 3 years ago

You called it correctly. I cleaned the directory out and restarted Snakemake and it worked. At least for a while.

Error in rule process:
    jobid: 1238
    output: /home/jon/Working_Files/dentist/insertions/batch.79.db
    log: /home/jon/Working_Files/dentist/logs/process.79.log (check log file(s) for error message)
    shell:
        dentist process --config=dentist.json  --threads=4 --auxiliary-threads=6 --mask=dentist-self-H,tan-H,dentist-reads-H --batch=3950..4000 /home/jon/Working_Files/dentist/Ajap_genome.dam /home/jon/Working_Files/dentist/SRR6282347.dam /home/jon/Working_Files/dentist/pile-ups.db /home/jon/Working_Files/dentist/insertions/batch.79.db 2> /home/jon/Working_Files/dentist/logs/process.79.log

Log file contents

Error: darg.ArgParseError@darg/source/darg.d(1281): Expected a value for positional argument '<out:insertions>'
----------------
??:? pure dentist.commandline.OptionsFor!(11).OptionsFor darg.parseArgs!(dentist.commandline.OptionsFor!(11).OptionsFor).parseArgs(const(immutable(char)[][]), darg.Config) [0x55b9154c4f66]
??:? dentist.commandline.ReturnCode dentist.commandline.runCommand!(11).runCommand(in immutable(char)[][]) [0x55b9154ba276]
??:? dentist.commandline.ReturnCode dentist.commandline.run(in immutable(char)[][]) [0x55b915449b41]
??:? _Dmain [0x55b9152d8704]
a-ludi commented 3 years ago

Again, I would suggest just rerunning snakemake – do not clean up the directory. Snakemake keeps track of what is left to be done.

JonEilers commented 3 years ago

Gotcha. I reran snakemake without cleaning up the directory and it gives the same error message, and the log contains the same info, just with a different jobid/batch.

a-ludi commented 3 years ago

I am a bit puzzled because the "missing argument" is actually present as far as I can tell.

Can you tell me which version of DENTIST you are using? Please verify with one of the commands below:

# if you are using pre-compiled binaries:
./bin/dentist --version

# if you are using singularity:
singularity run docker://aludi/dentist:stable dentist --version

Expected output for Singularity:

INFO:    Using cached SIF image
dentist v1.0.2-1-gd85a86f (commit d85a86fda8da241b0de3d3b8d3b02cf9e3405302)

Copyright © 2018, Arne Ludwig <arne.ludwig@posteo.de>

Subject to the terms of the MIT license, as written in the included LICENSE file
JonEilers commented 3 years ago

(singularity) jon@jon-PowerEdge-R910:~/Working_Files/dentist$ singularity run docker://aludi/dentist:stable dentist --version
INFO:    Using cached SIF image
INFO:    Converting SIF file to temporary sandbox...
dentist v1.0.2-1-gd85a86f (commit d85a86fda8da241b0de3d3b8d3b02cf9e3405302)

Copyright © 2018, Arne Ludwig <arne.ludwig@posteo.de>

Subject to the terms of the MIT license, as written in the included LICENSE file
INFO:    Cleaning up image...

In case it's useful to know, below are the versions of singularity and snakemake that are installed in the conda environment. Here is a list of everything installed into the conda environment.

a-ludi commented 3 years ago

Thanks, but I still do not understand the problem. It should be working just fine. :laughing:

Could you try running the command manually with

singularity run docker://aludi/dentist:stable dentist process --config=dentist.json  --threads=4 --auxiliary-threads=6 --mask=dentist-self-H,tan-H,dentist-reads-H --batch=3950..4000 /home/jon/Working_Files/dentist/Ajap_genome.dam /home/jon/Working_Files/dentist/SRR6282347.dam /home/jon/Working_Files/dentist/pile-ups.db /home/jon/Working_Files/dentist/insertions/batch.79.db 2> /home/jon/Working_Files/dentist/logs/process.79.log
JonEilers commented 3 years ago

Same results. This may be a silly question, but I was looking at the command and noticed that the process-pile-ups command wants 5 positional arguments while the command provided by the pipeline only has four. It's missing the "<ignored>" one? Would that cause this error?

singularity run docker://aludi/dentist:stable \
    dentist process \
        --config=dentist.json  \
        --threads=4 \
        --auxiliary-threads=6 \
        --mask=dentist-self-H,tan-H,dentist-reads-H \
        --batch=3950..4000 \
        /home/jon/Working_Files/dentist/Ajap_genome.dam \
        /home/jon/Working_Files/dentist/SRR6282347.dam \
        /home/jon/Working_Files/dentist/pile-ups.db \
        /home/jon/Working_Files/dentist/insertions/batch.25.db 2> /home/jon/Working_Files/dentist/logs/process.25.log

vs

INFO:    Using cached SIF image
INFO:    Converting SIF file to temporary sandbox...
Error: darg.ArgParseError@darg/source/darg.d(1281): Expected a value for positional argument '<out:insertions>'
----------------
??:? pure dentist.commandline.OptionsFor!(11).OptionsFor darg.parseArgs!(dentist.commandline.OptionsFor!(11).OptionsFor).parseArgs(const(immutable(char)[][]), darg.Config) [0x55a81770df66]
??:? dentist.commandline.ReturnCode dentist.commandline.runCommand!(11).runCommand(in immutable(char)[][]) [0x55a817703276]
??:? dentist.commandline.ReturnCode dentist.commandline.run(in immutable(char)[][]) [0x55a817692b41]
??:? _Dmain [0x55a817521704]

Usage: dentist process-pile-ups [--allow-single-reads]
                                [--auxiliary-threads=num-threads]
                                [--bad-fraction=<frac>]
                                [--batch=<idx-spec>[,<idx-spec>...]]
                                [--config=<config-json>]
                                [--daccord=<daccord-option>[,<daccord-option>...]]
                                [--daligner-consensus=<daligner-option>[,<daligner-option>...]]
                                [--daligner-reads-vs-reads=<daligner-option>[,<daligner-option>...]]
                                [--daligner-self=<daligner-option>...]
                                [--datander-ref=<datander-option>[,<datander-option>...]]
                                [--dust-reads=<dust-option>[,<dust-option>...]]
                                [--help] [--keep-temp]
                                [--mask=<name>[,<name>...]]
                                [--max-chain-gap=<bps>] [--max-indel=<bps>]
                                [--max-relative-overlap=<fraction>]
                                [--min-anchor-length=<uint>]
                                [--min-reads-per-pile-up=<ulong>]
                                [--min-relative-score=<fraction>]
                                [--min-score=<int>] [--only=<OnlyFlag>]
                                [--proper-alignment-allowance=num] [--quiet]
                                [--revert=<option>[,<option>...]]
                                [--threads=<uint>] [--tmpdir=<string>] [--usage]
                                [--verbose] <in:reference> <in:reads> <ignored>
                                <in:pile-ups> <out:insertions>
INFO:    Cleaning up image...
a-ludi commented 3 years ago

Well spotted! This means that the workflow (Snakefile) and DENTIST are somewhat out of sync. I would suggest updating everything to the latest version:

  1. Update your Snakefile to the latest version.
  2. Rename max_threads to threads_per_process in snakemake.yml (see 546bdbff50c0db7c78f027e64934b224baae9794).
  3. Make sure that DENTIST v2.0.0 is being used by specifying dentist_container: "docker://aludi/dentist:v2.0.0" in snakemake.yml.

Then retry by launching snakemake once more.
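
For reference, after steps 2 and 3 the relevant snakemake.yml entries would look something like this (the thread count is just the value suggested above):

    threads_per_process: 32
    dentist_container: "docker://aludi/dentist:v2.0.0"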

JonEilers commented 3 years ago

I am guessing the error below has to do with rate limits on pulling from Docker Hub and not with dentist itself?

InputFunctionException in line 1776 of /home/jon/Working_Files/dentist/Snakefile:
Error:
  Exception: failed to get alignment commands: FATAL:   Unable to handle docker://aludi/dentist:v2.0.0 uri: failed to get checksum for docker://aludi/dentist:v2.0.0: Error reading manifest v2.0.0 in docker.io/aludi/dentist: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit

Wildcards:
  block_reads=26
Traceback:
  File "/home/jon/Working_Files/dentist/Snakefile", line 243, in secondary_expand
  File "/home/jon/Working_Files/dentist/Snakefile", line 1793, in <lambda>
  File "/home/jon/Working_Files/dentist/Snakefile", line 421, in generate_options_for