a-ludi / dentist

Close assembly gaps using long-reads at high accuracy.
https://a-ludi.github.io/dentist/
MIT License

FATAL: Unable to handle docker://aludi/dentist:v1.0.1 uri #15

Closed marcelauliano closed 3 years ago

marcelauliano commented 3 years ago

Hey Ludi,

I hope you are well. I work at the Sanger on the Darwin Tree of Life project, and Gene suggested I try your tool to close a few assembly gaps. When I run the test on the command line, it finishes fine. But when I switch to submitting it to LSF, I get an error that seems to concern the version of LAsort. Could you please have a look:

[Tue Apr 13 11:06:22 2021]
Error in rule mask_dust:
    jobid: 15
    output: workdir/.assembly-test.dust.anno, workdir/.assembly-test.dust.data
    shell:
        DBdust workdir/assembly-test
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: 179164 logs/cluster/mask_dust/dam=assembly-test/jobid15_fba32544-ef79-47b9-a8e5-0b95ca02bd59.out
Error executing rule mask_dust on cluster (jobid: 15, external: 179164 logs/cluster/mask_dust/dam=assembly-test/jobid15_fba32544-ef79-47b9-a8e5-0b95ca02bd59.out, jobscript: /lustre/scratch116/vr/projects/vgp/user/mu2/dentist/dentist-example2/dentist-example/.snakemake/tmp.4lrb2f3l/snakejob.mask_dust.15.sh). For error details see the cluster log and the log files of the involved rule(s).
[Tue Apr 13 11:06:22 2021]
Error in rule tandem_alignment_block:
    jobid: 18
    output: workdir/TAN.assembly-test.1.las
    log: logs/tandem-alignment.assembly-test.1.log (check log file(s) for error message)
    shell:
            {
                cd workdir
                datander '-T8' -s126 -l500 -e0.7 assembly-test.1
                LAcheck -v assembly-test TAN.assembly-test.1.las || { echo 'Check failed. Possible solutions:
Duplicate LAs: can be fixed by LAsort from 2020-03-22 or later.
In order to ignore checks entirely you may use the environment variable SKIP_LACHECK=1. Use only if you are positive the files are in fact OK!'; (( ${SKIP_LACHECK:-0} != 0 )); }
            } &> logs/tandem-alignment.assembly-test.1.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: 179165 logs/cluster/tandem_alignment_block/dam=assembly-test.block=1/jobid18_8a60e75f-99ad-4021-bb14-0559b3bd4dc0.out
Error executing rule tandem_alignment_block on cluster (jobid: 18, external: 179165 logs/cluster/tandem_alignment_block/dam=assembly-test.block=1/jobid18_8a60e75f-99ad-4021-bb14-0559b3bd4dc0.out, jobscript: /lustre/scratch116/vr/projects/vgp/user/mu2/dentist/dentist-example2/dentist-example/.snakemake/tmp.4lrb2f3l/snakejob.tandem_alignment_block.18.sh). For error details see the cluster log and the log files of the involved rule(s).

I tried exporting the variable but got the same error. Could you help me? Thank you. Marcela.
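
For readers wondering what exporting the variable is supposed to do: each `LAcheck` call in the rule's shell block is followed by a guard that turns a failed check into success only when `SKIP_LACHECK` is non-zero. A minimal sketch of that pattern, assuming bash; `guarded_check` is a hypothetical name used only for illustration, and the command passed to it stands in for the real `LAcheck` call:

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the guard from the rule's shell block:
#   CHECK || { echo '...'; (( ${SKIP_LACHECK:-0} != 0 )); }
# "$@" stands in for the real LAcheck invocation.
guarded_check() {
    "$@" || {
        echo 'Check failed.' >&2
        # Arithmetic test: succeeds only if SKIP_LACHECK is set to a non-zero number.
        (( ${SKIP_LACHECK:-0} != 0 ))
    }
}

# Example (not run here):
#   SKIP_LACHECK=1 guarded_check false   # the failed check is ignored
```

With `SKIP_LACHECK` unset or `0`, a failing check still fails the job. The variable has to be set in the environment where the rule's shell block actually runs; whether an `export` in the submitting shell reaches the cluster jobs depends on how they are submitted.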

a-ludi commented 3 years ago

Hi Marcela,

The log you posted does not contain the actual error message, only a potential one (sorry for that, I really need to find a way of hiding this). I can also see that both mask_dust and tandem_alignment_block failed. Could you please do these checks:

  1. Look into or just post here the contents of logs/tandem-alignment.assembly-test.1.log.
  2. Scan through the snakemake log above the portion you posted to find an error related to DBdust.
  3. I suspect it might be related to Singularity not being able to fetch/access the container image. This would be easy to fix. Do the snakemake jobs have internet access?
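
The checks above can be sketched in bash roughly as follows. Both function names are hypothetical, the `logs/cluster` default is taken from the paths in the snakemake output, and the search patterns are only a guess at what a container-fetch failure looks like in the logs:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list log lines that hint at a container-fetch failure.
# The directory defaults to logs/cluster, as seen in the snakemake output.
find_container_errors() {
    local log_dir="${1:-logs/cluster}"
    # -r: search recursively, -n: print line numbers, -E: extended regex.
    grep -rn -E 'FATAL:|docker registry|i/o timeout' "$log_dir"
}

# Hypothetical helper: check whether the Docker registry answers within 10 seconds.
# Any HTTP response (even an error status) counts as reachable; only a
# connection failure or timeout makes curl return non-zero here.
registry_reachable() {
    curl -sS -m 10 -o /dev/null https://registry-1.docker.io/v2/
}
```

`find_container_errors` can be run from the directory where snakemake was started. `registry_reachable` is only informative when it is executed inside a cluster job, since the login node may have internet access while the compute nodes do not.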

Cheers, Arne

marcelauliano commented 3 years ago

Hmm...

Hey Arne, the specific log you asked for is empty, but I have other ones, and the error message in them is:

FATAL:   Unable to handle docker://aludi/dentist:v1.0.1 uri: failed to get checksum for docker://aludi/dentist:v1.0.1: error pinging docker registry registry-1.docker.io: Get "https://registry-1.docker.io/v2/": dial tcp 52.55.168.20:443: i/o timeout

I guess you are correct?

How do I fix it, something along these lines?

--gui | Serve an HTML based user interface to the given network and port e.g. 168.129.10.15:8000. By default Snakemake is only available in the local network (default port: 8000). To make Snakemake listen to all ip addresses add the special host address 0.0.0.0 to the url (0.0.0.0:8000). This is important if Snakemake is used in a virtualised environment like Docker. If possible, a browser window is opened.

Does that make sense? Thanks a lot for your help here!

a-ludi commented 3 years ago

Yes, my guess was correct. Please proceed as follows:

  1. Download container image:
    # change /path/to/dir to something that is accessible for all jobs
    singularity pull --dir /path/to/dir docker://aludi/dentist:v1.0.1
  2. Make use of the downloaded image by adjusting snakemake.yml: change/add
    dentist_container: "/path/to/dir/dentist_v1.0.1.sif"
  3. Now, rerun as usual. The network error should be gone.
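
The steps above could be wrapped in a small bash script, sketched here under some assumptions: both function names are hypothetical, the image URI is assumed to carry an explicit tag, and the `<name>_<tag>.sif` file name simply follows the example path shown in step 2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the .sif file name used in the steps above
# (docker://aludi/dentist:v1.0.1 -> dentist_v1.0.1.sif).
# Assumes the URI contains an explicit tag.
sif_name_for() {
    local uri="$1"
    local ref="${uri#docker://}"   # aludi/dentist:v1.0.1
    local name_tag="${ref##*/}"    # dentist:v1.0.1
    local name="${name_tag%%:*}"   # dentist
    local tag="${name_tag#*:}"     # v1.0.1
    printf '%s_%s.sif\n' "$name" "$tag"
}

# Hypothetical helper: pull the image into a shared directory unless it is
# already there, then print the line to put into snakemake.yml.
prepare_container() {
    local uri="$1" dir="$2"
    local sif="$dir/$(sif_name_for "$uri")"
    [ -f "$sif" ] || singularity pull --dir "$dir" "$uri"
    printf 'dentist_container: "%s"\n' "$sif"
}

# Example (not run here):
#   prepare_container docker://aludi/dentist:v1.0.1 /path/to/dir
```

The pull has to happen on a machine with internet access, and `/path/to/dir` must be readable from every compute node.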

marcelauliano commented 3 years ago

Thanks, Arne! The problem is solved!

muffato commented 3 years ago

I'd like this ticket to be reopened, please. The error is still there with Dentist 1.0.1:

Error in rule ref_vs_reads_alignment_block:
    jobid: 977
    output: workdir/scaffolds_FINAL.non-hifi.1kb.128.las, workdir/non-hifi.1kb.128.scaffolds_FINAL.las
    log: logs/ref-vs-reads-alignment.128.log (check log file(s) for error message)
    shell:

            {
                cd workdir
                damapper -C '-T8' -e0.7 -mdust -mdentist-self -mtan scaffolds_FINAL non-hifi.1kb.128
                LAcheck -v scaffolds_FINAL non-hifi.1kb scaffolds_FINAL.non-hifi.1kb.128.las || { echo 'Check failed. Possible solutions:
Duplicate LAs: can be fixed by LAsort from 2020-03-22 or later.
In order to ignore checks entirely you may use the environment variable SKIP_LACHECK=1. Use only if you are positive the files are in fact OK!'; (( ${SKIP_LACHECK:-0} != 0 )); }
                LAcheck -v non-hifi.1kb scaffolds_FINAL non-hifi.1kb.128.scaffolds_FINAL.las || { echo 'Check failed. Possible solutions:
Duplicate LAs: can be fixed by LAsort from 2020-03-22 or later.
In order to ignore checks entirely you may use the environment variable SKIP_LACHECK=1. Use only if you are positive the files are in fact OK!'; (( ${SKIP_LACHECK:-0} != 0 )); }
            } &> logs/ref-vs-reads-alignment.128.log

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: 208494 logs/cluster/ref_vs_reads_alignment_block/block_reads=128/jobid977_e17a6776-b2a4-4570-aba0-d97bd422ba29.out

Error executing rule ref_vs_reads_alignment_block on cluster (jobid: 977, external: 208494 logs/cluster/ref_vs_reads_alignment_block/block_reads=128/jobid977_e17a6776-b2a4-4570-aba0-d97bd422ba29.out, jobscript: /lustre/scratch116/tol/teams/team308/users/mm49/tmp/non-hifi-reads2/.snakemake/tmp.4pilq3ef/snakejob.ref_vs_reads_alignment_block.977.sh). For error details see the cluster log and the log files of the involved rule(s).

Snakemake retries the jobs a few times, but they keep on failing for the same reason, and at some point snakemake gives up and quits.

The image is v1.0.1:

$ singularity inspect dentist_v1.0.1.sif 
org.label-schema.build-arch: amd64
org.label-schema.build-date: Thursday_22_April_2021_11:19:9_UTC
org.label-schema.schema-version: 1.0
org.label-schema.usage.singularity.deffile.bootstrap: docker
org.label-schema.usage.singularity.deffile.from: aludi/dentist:v1.0.1
org.label-schema.usage.singularity.version: 3.7.2

a-ludi commented 3 years ago

I think your issue is likely unrelated to fetching the Singularity image, so I moved it to a new issue, #16, and will keep this issue closed.