ablab / spades

SPAdes Genome Assembler
http://ablab.github.io/spades/
Other
763 stars 138 forks source link

3-15.1: .bin_reads/single_0.off: Unknown error 18446744073709551615 #675

Open mmokrejs opened 3 years ago

mmokrejs commented 3 years ago

Hi, I have problems with rnaspades-3.15.1 (your 64-bit binaries) running on NFSv4 filesystem:

  0:00:00.094     1M / 11M   INFO    General                 (memory_limit.cpp          :  48)   Memory limit set to 2000 Gb
  0:00:00.106     1M / 11M   INFO    General                 (main.cpp                  : 107)   Starting SPAdes, built from refs/heads/spades_3.15.1, git revision b4c79ba4390bf4cff81c01c99c1d438541bff031
  0:00:00.122     1M / 11M   INFO    General                 (main.cpp                  : 108)   Maximum k-mer length: 128
  0:00:00.136     1M / 11M   INFO    General                 (main.cpp                  : 109)   Assembling dataset (/my_path/my_transcriptome_rnaspades/dataset.info) with K=49
  0:00:00.144     1M / 11M   INFO    General                 (main.cpp                  : 110)   Maximum # of threads to use (adjusted due to OMP capabilities): 200
  0:00:00.156     1M / 11M   INFO    General                 (pipeline.cpp              : 212)   SPAdes started
  0:00:00.160     1M / 11M   INFO    General                 (pipeline.cpp              : 225)   Starting from stage: last
  0:00:00.172     1M / 11M   INFO    General                 (pipeline.cpp              : 231)   Two-step repeat resolution disabled
  0:00:00.177     1M / 11M   INFO   GraphCore                (graph_core.hpp            : 672)   Graph created, vertex min_id: 3, edge min_id: 3
  0:00:00.189     1M / 11M   INFO   GraphCore                (graph_core.hpp            : 673)   Vertex size: 48, edge size: 40
  0:00:00.194     1M / 11M   INFO    General                 (edge_index.hpp            : 115)   Size of edge index entries: 12/8
  0:00:00.227     1M / 11M   INFO    General                 (read_conversion.cpp       :  21)   Loading current state from /my_path/my_transcriptome_rnaspades/K49/saves/read_conversion
  0:00:00.238     1M / 11M   INFO   StageManager             (stage.cpp                 : 185)   STAGE == de Bruijn graph construction (id: construction)
Exception caught Cannot tell the size of file /my_path/my_transcriptome_rnaspades/.bin_reads/single_0.off: Unknown error 18446744073709551615

Hmm, the file does not exist, actually.

I also observed issues that a file was evidently written too late to the filesystem and spades checked for its presence too early:

  0:16:32.909     1M / 3806M INFO    General                 (read_conversion.cpp       :  30)   Saving current state to /my_path/my_transcriptome_rnaspades/K49/saves/read_conversion
  0:16:33.404     1M / 3806M INFO   StageManager             (stage.cpp                 : 185)   STAGE == de Bruijn graph construction (id: construction)
Exception caught Cannot open file '/my_path/my_transcriptome_rnaspades/.bin_reads/paired_1.off'

Is there anything I could do about this? I tried --tmp-dir but these files are to be placed in the output project's directory, on the tmpfs. Thank you.

asl commented 3 years ago

Will you please attach the complete spades.log file?

mmokrejs commented 3 years ago

Hi Anton, here they are:

run_rnaspades.txt run_rnaspades-cont.txt

asl commented 3 years ago

Well, this looks like an NFS issue as at the time of error all files are supposed to be already written. Its not like "SPAdes checks the presence early" – SPAdes opens the file for reading and expects it to exist, since it already written to it and closed.

Maybe it's worth to check the NFS mounting option - maybe the share is mounted async?

mmokrejs commented 3 years ago

You are right I did not formulate the sentence properly. Some kind of delay/sync would be helpful. ;-)

rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=krb5i,clientaddr=$ipnum,local_lock=none

mmokrejs commented 3 years ago

Could the overflown number show the file actually existed but it's presence is mis-interpreted?

asl commented 3 years ago

Could the overflown number show the file actually existed but it's presence is mis-interpreted?

It's just -1 which is used to signal an error

mmokrejs commented 3 years ago

Could the file be placed into --tpmdir, actually the whole .bin-reads/ directory? Showing size of the directories created would help me to plan better size of the fast local storade for next execution too.

asl commented 3 years ago

Well, the temp dir is transient and the output dir is used to store important intermediate files. Per NFS specification, the sync is done at the file close (https://linux.die.net/man/5/nfs), so no additional manual sync is required. So, likely you need either to move out of NFS or try e.g. sync mount option as a last-resort.

mmokrejs commented 3 years ago

I doubt the admins would remount the drives for me or upgrade the 3.10.0-1160.15.2.el7.x86_64 kernel for me. And the local SSD storage is too small. I will try a different host in the cluster. But thank you for your tips.

I spent quite some time restarting the jobs, moving them to SSD scratch space, no way. I suspect it is really something about the 3.15.1 binaries, I recompiled from sources and now I could at least start loading-in the FASTQ data again (was getting OS error -11 as others already reported). Let's see, if the analysis moves anywhere behing the past exit point reported initially in this thread. Fingers crossed.

mmokrejs commented 3 years ago

Will SPAdes support more than 9 input libraries in some future? I have 16. It is weird they are merged into a single-one as "library 5", at least if they were all squashed under the last one, "library 9". It would be easier to spot that.

asl commented 3 years ago

SPAdes supports arbitrary number of input libraries. See https://cab.spbu.ru/files/release3.15.0/manual.html#sec3.1 (section about YAML dataset specification) for more results.

asl commented 3 years ago

I recompiled from sources and now I could at least start loading-in the FASTQ data again (was getting OS error -11 as others already reported). Let's see, if the analysis moves anywhere behing the past exit point reported initially in this thread. Fingers crossed.

Well, given that this is a race condition, it might happen that some changes in the code layout might introduce arbitrary delays here and there, so the NFS server will start to show the file. No guarantees that it will not start to fail anytime in the future :)

mmokrejs commented 3 years ago

As you can see rnaspades.py stitched multiple input datafiles into a single entry in the YAML file, that is what I had in mind:

- "interlaced reads":
  - "/foo/interleaved2/CTGAAG.stem_IAA_4h.trimmomatic.fastq"
  "number": !!int "7"
  "orientation": "fr"
  "single reads":
  - "/foo/interleaved2/CTGAAG.stem_IAA_4h.trimmomatic.singletons.fastq"
  "type": "paired-end"
- "interlaced reads":
  - "/foo/interleaved2/ATTACT.stem_MeJA_4h.trimmomatic.fastq"
  - "/foo/interleaved2/GAATTC.stem_ABA_30min.trimmomatic.fastq"
  - "/foo/interleaved2/GAGATT.latex_ABA_5min.trimmomatic.fastq"
  - "/foo/interleaved2/GAGATT.stem_SA_30min.trimmomatic.fastq"
  - "/foo/interleaved2/TAATGC.stem_SA_4h.trimmomatic.fastq"
  - "/foo/interleaved2/TAATGC.stem_SA_5min.trimmomatic.fastq"
  - "/foo/interleaved2/TCCGGA.stem_ABA_4h.trimmomatic.fastq"
  - "/foo/interleaved2/TCCGGA.stem_ABA_5min.trimmomatic.fastq"
  "number": !!int "1"
  "orientation": "fr"
  "single reads":
  - "/foo/interleaved2/ATTACT.stem_MeJA_4h.trimmomatic.singletons.fastq"
  - "/foo/interleaved2/GAATTC.stem_ABA_30min.trimmomatic.singletons.fastq"
  - "/foo/interleaved2/GAGATT.latex_ABA_5min.trimmomatic.singletons.fastq"
  - "/foo/interleaved2/GAGATT.stem_SA_30min.trimmomatic.singletons.fastq"
  - "/foo/interleaved2/TAATGC.stem_SA_4h.trimmomatic.singletons.fastq"
  - "/foo/interleaved2/TAATGC.stem_SA_5min.trimmomatic.singletons.fastq"
  - "/foo/interleaved2/TCCGGA.stem_ABA_4h.trimmomatic.singletons.fastq"
  - "/foo/interleaved2/TCCGGA.stem_ABA_5min.trimmomatic.singletons.fastq"
  "type": "paired-end"
- "interlaced reads":
  - "/foo/interleaved2/ATTACT.stem_MeJA_5min.trimmomatic.fastq"
  "number": !!int "2"
  "orientation": "fr"
  "single reads":
  - "/foo/interleaved2/ATTACT.stem_MeJA_5min.trimmomatic.singletons.fastq"
  "type": "paired-end"
asl commented 3 years ago

But this is a single library consisting many files. Not multiple libraries – quite a huge difference

mmokrejs commented 3 years ago

I know, and that was reason I raised this up. Have a look to ensure that I provided that as separate input libraries at the cmdline to python wrapper. See above the https://github.com/ablab/spades/files/6091134/run_rnaspades.txt