Illumina / strelka

Strelka2 germline and somatic small variant caller
GNU General Public License v3.0

strelka2 very slow and thrashing disk on ext4 #89

Open derijkp opened 5 years ago

derijkp commented 5 years ago

Running strelka2 on my ext4 file system leads to disk thrashing and slowness. The excessive disk access is caused by the ext4 journalling process. This type of problem has been seen before with programs that call fsync many times. I tested this by removing the fsync calls from strelka (hardcoded), and the disk thrashing indeed stopped. I suppose the fsyncs are used to keep the data written "so far" consistent in case of a crash, but having to wipe and completely restart a crashed analysis (which happens seldom) is a better option than constant disk thrashing and slowness. A relatively easy solution could thus be to make fsyncing an option, so it can be turned off on filesystems that do not handle it well.

Regards,

Peter
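The proposal amounts to guarding each fsync call behind a switch. A minimal Python sketch (pyflow is written in Python), assuming a hypothetical `append_log_line` helper with an `fsync` flag; this is not pyflow's actual code or API:

```python
import os
import tempfile


def append_log_line(path, line, fsync=True):
    """Append one line to a log file.

    Hypothetical helper: the `fsync` flag illustrates the proposed
    option and is not part of pyflow's real API.
    """
    with open(path, "a") as fh:
        fh.write(line + "\n")
        fh.flush()  # hand Python's buffered data to the OS
        if fsync:
            # Ask the OS to commit the data to stable storage. On ext4 this
            # typically forces a journal commit; doing it for every small
            # write is the pattern reported to cause thrashing.
            os.fsync(fh.fileno())


if __name__ == "__main__":
    # Demo: write two lines, the second one without fsync.
    with tempfile.TemporaryDirectory() as d:
        log = os.path.join(d, "workflow.log")
        append_log_line(log, "task 1 done")
        append_log_line(log, "task 2 done", fsync=False)
        with open(log) as fh:
            print(fh.read(), end="")
```

With `fsync=False` the data still reaches the file through the OS page cache; it is simply not guaranteed to be on disk if the machine crashes, which is the trade-off described above.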

ctsa commented 5 years ago

Thanks, Peter. The fsync is used by pyflow to keep its logs up to date in the event of an error. I was not aware this could cause such significant complications. I will add the disable option as an improvement item for the pyflow API.

amizeranschi commented 5 years ago

I think I'm facing a related problem. I'm running Strelka2 through bcbio_nextgen on an NFS file system, and it seems to run significantly slower than other tools. After canceling the run and attempting to delete the working directory, it takes a long time to remove files such as these:

deleting testingVC-merged/work/bcbiotx/tmpeYQMtc/testingVC-6_44316087_48702894-work/workspace/pyflow.data/logs/tmp/taskWrapperLogs/002/586/taskWrapperParameters.pickle
deleting testingVC-merged/work/bcbiotx/tmpeYQMtc/testingVC-6_44316087_48702894-work/workspace/pyflow.data/logs/tmp/taskWrapperLogs/002/586/pyflowTaskWrapper.signal.txt
deleting testingVC-merged/work/bcbiotx/tmpeYQMtc/testingVC-6_44316087_48702894-work/workspace/pyflow.data/logs/tmp/taskWrapperLogs/002/586/
deleting testingVC-merged/work/bcbiotx/tmpeYQMtc/testingVC-6_44316087_48702894-work/workspace/pyflow.data/logs/tmp/taskWrapperLogs/002/584/taskWrapperParameters.pickle
deleting testingVC-merged/work/bcbiotx/tmpeYQMtc/testingVC-6_44316087_48702894-work/workspace/pyflow.data/logs/tmp/taskWrapperLogs/002/584/pyflowTaskWrapper.signal.txt
deleting testingVC-merged/work/bcbiotx/tmpeYQMtc/testingVC-6_44316087_48702894-work/workspace/pyflow.data/logs/tmp/taskWrapperLogs/002/584/
deleting testingVC-merged/work/bcbiotx/tmpeYQMtc/testingVC-6_44316087_48702894-work/workspace/pyflow.data/logs/tmp/taskWrapperLogs/002/582/taskWrapperParameters.pickle
deleting testingVC-merged/work/bcbiotx/tmpeYQMtc/testingVC-6_44316087_48702894-work/workspace/pyflow.data/logs/tmp/taskWrapperLogs/002/582/pyflowTaskWrapper.signal.txt
deleting testingVC-merged/work/bcbiotx/tmpeYQMtc/testingVC-6_44316087_48702894-work/workspace/pyflow.data/logs/tmp/taskWrapperLogs/002/582/
deleting testingVC-merged/work/bcbiotx/tmpeYQMtc/testingVC-6_44316087_48702894-work/workspace/pyflow.data/logs/tmp/taskWrapperLogs/002/580/taskWrapperParameters.pickle
deleting testingVC-merged/work/bcbiotx/tmpeYQMtc/testingVC-6_44316087_48702894-work/workspace/pyflow.data/logs/tmp/taskWrapperLogs/002/580/pyflowTaskWrapper.signal.txt
deleting testingVC-merged/work/bcbiotx/tmpeYQMtc/testingVC-6_44316087_48702894-work/workspace/pyflow.data/logs/tmp/taskWrapperLogs/002/580/

@ctsa I'm guessing that writing these files to NFS is what's making Strelka2 run so slowly in my case. Is there a way to avoid creating these files?

abenjak commented 5 years ago

In case this helps: I ran strelka-2.9.10 (tumorBAM=11G, normalBAM=29G, using -j 6, on an i7-8750H CPU) on my internal 1TB ext4 disk, and it finished in 2 h. The same job on my tmpfs partition finished in 13 min.
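Whether fsync-heavy small-file I/O explains timing gaps like these can be checked directly on each disk. A rough sketch (not part of Strelka or pyflow): it writes many tiny files, loosely mimicking pyflow's many small per-task log files, with and without `os.fsync`, on whatever filesystem it is pointed at. The function name and file names are hypothetical.

```python
import os
import sys
import tempfile
import time


def time_small_writes(directory, n_files=200, use_fsync=True):
    """Write n_files tiny files into `directory`; return elapsed seconds.

    Rough illustration only: results depend heavily on the filesystem,
    mount options, caching and hardware.
    """
    start = time.perf_counter()
    for i in range(n_files):
        path = os.path.join(directory, f"task_{i}.txt")
        with open(path, "w") as fh:
            fh.write("done\n")
            fh.flush()
            if use_fsync:
                os.fsync(fh.fileno())  # force each file to stable storage
    return time.perf_counter() - start


if __name__ == "__main__":
    # Optional argument: a directory on the filesystem to test (HDD, SSD,
    # tmpfs, NFS...). Defaults to the system temporary directory.
    base = sys.argv[1] if len(sys.argv) > 1 else None
    for use_fsync in (True, False):
        with tempfile.TemporaryDirectory(dir=base) as d:
            secs = time_small_writes(d, use_fsync=use_fsync)
            print(f"fsync={use_fsync}: {secs:.3f} s")
```

Running it once with a directory on the HDD and once on an SSD or tmpfs mount should show whether the fsync variant is disproportionately slow on the spinning disk.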

amizeranschi commented 5 years ago

Thanks for the comment. It could be worth trying to run things inside a tmpfs partition, if the temporary data could fit in there. How much RAM do you have available and how large is the tmpfs partition that you used?

abenjak commented 5 years ago

Sorry, it turns out I did not run it on tmpfs but on an SSD (my bad: I forgot to mount /tmp as tmpfs on my new laptop, which I normally do).

I re-ran it on a 20GB tmpfs and it took 18 min (slower than on the SSD? I did not check the partition usage; is it possible that 20GB was not enough and the system started swapping?).

Either way, running Strelka2 on an SSD or tmpfs is much faster than on an HDD. Is there an option to define a temporary directory? That would be very practical, because I wouldn't need to configure runs in unusual locations and then move the results to my actual working directory.

Cheers, Andrej
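Until a temporary-directory option exists, one workaround for the question above is to stage the whole run directory on fast local storage (SSD or tmpfs) and copy only the final outputs back. A minimal sketch, assuming the outputs worth keeping live under a `results` subdirectory of the run directory (where Strelka2 places its final output) and that the caller supplies the actual Strelka2 invocation; `run_in_scratch` and `run_workflow` are hypothetical names:

```python
import shutil
import tempfile
from pathlib import Path


def run_in_scratch(run_workflow, scratch_base, final_dir, keep_subdir="results"):
    """Run a workflow inside a temporary directory on fast storage.

    run_workflow: callable taking the (not yet created) run directory path;
        it is expected to create that directory and run the workflow there.
    scratch_base: directory on fast storage (e.g. an SSD or tmpfs mount).
    final_dir: existing directory on slower storage that receives the outputs.
    Only `keep_subdir` is copied back; everything else (such as the many
    small pyflow workspace files) is deleted along with the scratch dir.
    """
    final_dir = Path(final_dir)
    with tempfile.TemporaryDirectory(dir=scratch_base) as tmp:
        run_dir = Path(tmp) / "run"
        run_workflow(run_dir)
        # copytree requires that the destination does not exist yet.
        shutil.copytree(run_dir / keep_subdir, final_dir / keep_subdir)
    return final_dir / keep_subdir
```

For example, `run_workflow` could call the Strelka2 configure script with `--runDir` set to the given path and then execute the generated `runWorkflow.py`. The scratch space must be large enough to hold the full workspace for the duration of the run.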

amizeranschi commented 5 years ago

I was running Strelka2 through Bcbio-nextgen, which does offer a way to set the TMP location: https://bcbio-nextgen.readthedocs.io/en/latest/contents/configuration.html#temporary-directory.

serge2016 commented 4 years ago

I have the same symptoms. Any progress here? Is there any other solution?

skchronicles commented 3 years ago

@ctsa Hey Chris, are there any updates on this issue? I am experiencing a similar problem where pyflow is generating hundreds of thousands of log files.

Here is a small snippet: [screenshot not preserved]

I am not sure of the exact number of log files it has generated, but it appears to be very large. I have another find command that has been running for over 30 minutes.
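To get a rough count of how many files a pyflow log tree holds, and which subdirectories they sit in, a short script along these lines can help. It is a sketch, not a Strelka or pyflow tool; `count_files` is a hypothetical name, and on NFS it is still bound by the same metadata latency that makes `find` slow.

```python
import os
import sys
from collections import Counter


def count_files(root):
    """Count non-directory entries under `root`, grouped by first-level subdirectory.

    Files directly inside `root` are counted under the key ".".
    """
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        top = rel.split(os.sep)[0]  # "." when dirpath == root
        counts[top] += len(filenames)
    return counts


if __name__ == "__main__":
    # Example argument (from this thread):
    #   .../workspace/pyflow.data/logs/tmp/taskWrapperLogs
    counts = count_files(sys.argv[1])
    print(f"total files: {sum(counts.values())}")
    for name, n in counts.most_common(10):
        print(f"{n:>10}  {name}")
```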

haochenz96 commented 2 years ago

@ctsa I have the same issue! Is there any update here?

skchronicles commented 2 years ago

Hey @ctsa, I am just checking in to see if you have time to look into this issue, or if you can pass it along to another @Illumina team member.

Thank you for your time.

Best Regards, @skchronicles

amizeranschi commented 2 years ago

@skchronicles I think it's quite safe to assume that this software has been abandoned for a while now. Same for Manta, abandoned in July 2019. In general, Illumina now seem to be putting all their efforts into Dragen.

Is there any reason why you'd want to use Strelka2 so badly, instead of other variant callers such as those from GATK?