mmokrejs opened this issue 3 years ago
Will you please attach the complete spades.log file?
Hi Anton, here they are:
Well, this looks like an NFS issue, as at the time of the error all files are supposed to be already written. It's not that "SPAdes checks the presence too early" – SPAdes opens the file for reading and expects it to exist, since it has already written to it and closed it.
Maybe it's worth checking the NFS mount options – maybe the share is mounted async?
You are right, I did not formulate the sentence properly. Some kind of delay/sync would be helpful. ;-)
rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=krb5i,clientaddr=$ipnum,local_lock=none
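To see whether the share is mounted async, one can inspect the mount-option string (e.g. from /proc/mounts). A minimal illustrative sketch in Python, using the option string quoted above (the helper name is made up, not SPAdes code):

```python
# Illustrative only: check whether a mount-option string contains a given
# option ("async" would indicate asynchronous writeback on the server).
def has_option(opts: str, name: str) -> bool:
    return name in opts.split(",")

opts = ("rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,"
        "hard,proto=tcp,timeo=600,retrans=2,sec=krb5i,local_lock=none")
print(has_option(opts, "async"))  # False – "async" is not listed explicitly
```

Note that the absence of "async" in the client's option string does not rule out async export options on the server side.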
Could the overflowed number mean the file actually existed but its presence is misinterpreted?
It's just -1, which is used to signal an error.
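At the C level the failing open() returns -1 and sets errno; from Python the same condition surfaces as an OSError carrying errno ENOENT. A minimal illustration (the path is made up):

```python
import errno
import os

# Opening a path that does not exist: the underlying open() syscall
# returns -1 and sets errno to ENOENT ("No such file or directory").
try:
    os.open("/nonexistent/output_dir/.bin_reads/reads.bin", os.O_RDONLY)
except OSError as e:
    print(e.errno, errno.errorcode[e.errno])  # 2 ENOENT
```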
Could the file – actually the whole .bin-reads/ directory – be placed into --tmp-dir? Showing the sizes of the directories created would also help me plan the size of the fast local storage better for the next execution.
Well, the temp dir is transient, while the output dir is used to store important intermediate files. Per the NFS specification, the sync is done at file close (https://linux.die.net/man/5/nfs), so no additional manual sync is required. So you likely need either to move off NFS or to try e.g. the sync mount option as a last resort.
I doubt the admins would remount the drives or upgrade the 3.10.0-1160.15.2.el7.x86_64 kernel for me. And the local SSD storage is too small. I will try a different host in the cluster. But thank you for your tips.
I spent quite some time restarting the jobs and moving them to SSD scratch space, to no avail. I suspect it is really something about the 3.15.1 binaries; I recompiled from sources and now I can at least start loading in the FASTQ data again (I was getting OS error -11, as others already reported). Let's see if the analysis moves anywhere beyond the exit point reported initially in this thread. Fingers crossed.
Will SPAdes support more than 9 input libraries in some future version? I have 16. It is weird that they are merged into a single one as "library 5"; if they were at least all squashed under the last one, "library 9", it would be easier to spot.
SPAdes supports an arbitrary number of input libraries. See https://cab.spbu.ru/files/release3.15.0/manual.html#sec3.1 (the section about the YAML dataset specification) for more details.
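For illustration, a dataset description with one entry per library can be built programmatically. A hypothetical sketch (placeholder paths and names, structure mirroring the YAML shown further down in this thread):

```python
# Hypothetical sketch: build a SPAdes-style dataset description with one
# entry per input library, so none of the 16 libraries get merged.
libraries = [
    {
        "type": "paired-end",
        "orientation": "fr",
        "interlaced reads": [f"/foo/interleaved2/lib{i}.trimmomatic.fastq"],
        "single reads": [f"/foo/interleaved2/lib{i}.trimmomatic.singletons.fastq"],
    }
    for i in range(1, 17)
]
print(len(libraries))  # 16 separate libraries
```

This list would then be serialized to YAML and passed via the `--dataset` option described in the manual section linked above.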
Well, given that this is a race condition, it might happen that changes in the code layout introduce arbitrary delays here and there, so the NFS server starts to show the file. No guarantee that it will not start failing again at some point in the future :)
As you can see, rnaspades.py stitched multiple input data files into a single entry in the YAML file; that is what I had in mind:
- "interlaced reads":
  - "/foo/interleaved2/CTGAAG.stem_IAA_4h.trimmomatic.fastq"
  "number": !!int "7"
  "orientation": "fr"
  "single reads":
  - "/foo/interleaved2/CTGAAG.stem_IAA_4h.trimmomatic.singletons.fastq"
  "type": "paired-end"
- "interlaced reads":
  - "/foo/interleaved2/ATTACT.stem_MeJA_4h.trimmomatic.fastq"
  - "/foo/interleaved2/GAATTC.stem_ABA_30min.trimmomatic.fastq"
  - "/foo/interleaved2/GAGATT.latex_ABA_5min.trimmomatic.fastq"
  - "/foo/interleaved2/GAGATT.stem_SA_30min.trimmomatic.fastq"
  - "/foo/interleaved2/TAATGC.stem_SA_4h.trimmomatic.fastq"
  - "/foo/interleaved2/TAATGC.stem_SA_5min.trimmomatic.fastq"
  - "/foo/interleaved2/TCCGGA.stem_ABA_4h.trimmomatic.fastq"
  - "/foo/interleaved2/TCCGGA.stem_ABA_5min.trimmomatic.fastq"
  "number": !!int "1"
  "orientation": "fr"
  "single reads":
  - "/foo/interleaved2/ATTACT.stem_MeJA_4h.trimmomatic.singletons.fastq"
  - "/foo/interleaved2/GAATTC.stem_ABA_30min.trimmomatic.singletons.fastq"
  - "/foo/interleaved2/GAGATT.latex_ABA_5min.trimmomatic.singletons.fastq"
  - "/foo/interleaved2/GAGATT.stem_SA_30min.trimmomatic.singletons.fastq"
  - "/foo/interleaved2/TAATGC.stem_SA_4h.trimmomatic.singletons.fastq"
  - "/foo/interleaved2/TAATGC.stem_SA_5min.trimmomatic.singletons.fastq"
  - "/foo/interleaved2/TCCGGA.stem_ABA_4h.trimmomatic.singletons.fastq"
  - "/foo/interleaved2/TCCGGA.stem_ABA_5min.trimmomatic.singletons.fastq"
  "type": "paired-end"
- "interlaced reads":
  - "/foo/interleaved2/ATTACT.stem_MeJA_5min.trimmomatic.fastq"
  "number": !!int "2"
  "orientation": "fr"
  "single reads":
  - "/foo/interleaved2/ATTACT.stem_MeJA_5min.trimmomatic.singletons.fastq"
  "type": "paired-end"
But this is a single library consisting of many files, not multiple libraries – quite a huge difference.
I know, and that was the reason I raised this. Have a look to verify that I provided these as separate input libraries on the command line to the Python wrapper; see https://github.com/ablab/spades/files/6091134/run_rnaspades.txt above.
Hi, I have problems with rnaspades-3.15.1 (your 64-bit binaries) running on NFSv4 filesystem:
Hmm, the file does not exist, actually.
I also observed issues where a file was evidently written too late to the filesystem and SPAdes checked for its presence too early:
Is there anything I could do about this? I tried --tmp-dir, but these files are placed in the output project's directory rather than in the tmpfs. Thank you.