For whatever reason, when initially implementing long-read support using Minimap2, I was unable to demonstrate a significant reduction in execution time compared with recreating the index from scratch every time `hostile clean` is called. Using a prebuilt index was only marginally quicker and frankly not worth the complexity of managing indexes. However, I recently retested this and observed that running `hostile clean` on a small long-read FASTQ drops from ~45s to ~7s when using a precomputed index.
This behaviour should first be characterised and verified on Linux and macOS. Assuming the performance benefit is replicated on both, transparent (but suitably logged) index caching and reuse should be added unless a good reason not to do so becomes apparent.
This will dramatically reduce execution time when processing many long-read samples, where this redundant indexing overhead is painful.
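A minimal sketch of what transparent, logged index caching could look like, in Python. Everything here is hypothetical: `get_or_build_index`, `cache_key`, `minimap2_build`, the cache directory layout and the `map-ont` default preset are illustrative assumptions, not hostile's actual API. The only real interface used is the Minimap2 CLI, where `-d FILE` writes the built index to disk. The preset is included in the cache key because Minimap2 presets change the indexing parameters.

```python
import hashlib
import logging
import os
import subprocess
from pathlib import Path
from typing import Callable

logger = logging.getLogger(__name__)


def minimap2_build(reference: Path, index: Path, preset: str) -> None:
    """Build a Minimap2 index via the CLI (`-d` dumps the index to a file)."""
    subprocess.run(
        ["minimap2", "-x", preset, "-d", str(index), str(reference)],
        check=True,
    )


def cache_key(reference: Path, preset: str) -> str:
    """Hypothetical cache key from the reference's path, size, mtime and preset.

    Using stat() metadata rather than hashing the file contents avoids
    reading a multi-gigabyte reference on every run.
    """
    stat = reference.stat()
    raw = f"{reference.resolve()}|{stat.st_size}|{stat.st_mtime_ns}|{preset}"
    return hashlib.sha256(raw.encode()).hexdigest()[:16]


def get_or_build_index(
    reference: Path,
    cache_dir: Path,
    preset: str = "map-ont",  # assumed default preset for long reads
    build: Callable[[Path, Path, str], None] = minimap2_build,
) -> Path:
    """Return a cached index for `reference`, building it once if absent."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    index = cache_dir / f"{reference.stem}.{preset}.{cache_key(reference, preset)}.mmi"
    if index.exists():
        logger.info("Reusing cached Minimap2 index %s", index)
        return index

    logger.info("No cached index for %s; building %s", reference, index)
    # Build under a per-process temporary name, then rename into place, so an
    # interrupted build never leaves a partial index that would later be reused.
    tmp = index.with_name(f"{index.name}.{os.getpid()}.tmp")
    build(reference, tmp, preset)
    tmp.replace(index)
    return index
```

With this shape, a second `get_or_build_index(Path("reference.fa"), cache_dir)` call for the same (unchanged) reference would skip indexing entirely and only log the reuse. Editing or replacing the reference changes its size or mtime, so a fresh index is built under a new name rather than a stale one being reused.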