cschin / Peregrine

Peregrine: Fast Genome Assembler Using SHIMMER Index

slow: containerized Peregrine on gpfs file system #36

Open ptrebert opened 4 years ago

ptrebert commented 4 years ago

Hi, this is not really a bug report, just some information that may help others who run into similar problems: running the containerized version of Peregrine (tested with v0.1.6.1) with Singularity on infrastructure with a GPFS file system results in extremely poor performance. Others have seen the same with different software, and in those cases it was traced back to the use of mmap (see, e.g., spectrumscale.org/pipermail/gpfsug-discuss/2018-April/004908.html).

The IT support for this system set up an alternative file system (Lustre) for testing, and on it the same Peregrine job finished successfully:

- Input: ~30x human PacBio Sequel-2 HiFi
- Cores: 72
- Memory: ~900 GB
- Runtime: ~1.5 hrs

Best, Peter

cschin commented 4 years ago

@ptrebert Thanks for reporting. Yes, the design does assume a file system that supports efficient mmap calls. Some file systems do not. In that case, I wonder whether a ramdisk could work around the problem, by copying all the data to the ramdisk first.
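The ramdisk idea from the reply could look roughly like the following minimal sketch. It assumes a Linux host where `/dev/shm` is a RAM-backed tmpfs mount and falls back to the default temp directory otherwise. The helper names `stage_to_ramdisk` and `read_via_mmap` are hypothetical. They are not part of Peregrine, and this is not how Peregrine does its own I/O. The sketch only shows the pattern of staging input onto RAM-backed storage so that later mmap reads are served from memory instead of from GPFS.

```python
import mmap
import os
import shutil
import tempfile


def stage_to_ramdisk(src_path, ramdisk_root="/dev/shm"):
    """Copy src_path into a fresh directory under a RAM-backed mount.

    Assumption: ramdisk_root is a tmpfs mount (typical for /dev/shm on Linux).
    If it is missing or not writable, fall back to the default temp directory.
    Returns the path of the staged copy.
    """
    usable = os.path.isdir(ramdisk_root) and os.access(ramdisk_root, os.W_OK)
    root = ramdisk_root if usable else tempfile.gettempdir()
    staging_dir = tempfile.mkdtemp(prefix="peregrine_stage_", dir=root)
    dst_path = os.path.join(staging_dir, os.path.basename(src_path))
    shutil.copyfile(src_path, dst_path)  # one sequential read from the slow FS
    return dst_path


def read_via_mmap(path, offset, length):
    """Read `length` bytes starting at `offset` through a read-only memory map."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file (the file must be non-empty).
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]
```

The copy replaces many small, page-fault-driven reads against the shared file system with one sequential read. Keep in mind that files on tmpfs consume RAM, so this only helps on nodes with memory to spare beyond what the assembly itself uses (the run above already needed ~900 GB).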