avilella closed 1 year ago
Hi,
In the specific case of bigseqkit stats, the task needs very little computation, so parallelizing it within a single machine doesn't make much sense. Execution time is limited mainly by the machine's disk read speed. To reduce it, consider running on a computing cluster, where multiple machines can read the file in parallel.
Best regards.
To be more specific, "stats" performs few operations per sequence and is therefore limited by memory/disk bandwidth. Once the threads in use saturate the bus, adding more threads will not improve performance. This is not the case when "stats" is parallelized across multiple nodes, because the cumulative bandwidth of several nodes is higher.
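As a rough illustration of the saturation argument, here is a toy model in Python. It is only a sketch: the function name `throughput_mb_s` and the figures (200 MB/s per thread, 1000 MB/s per-node bus limit) are made-up assumptions for illustration, not measurements of seqkit or bigseqkit.

```python
def throughput_mb_s(threads, per_thread_mb_s, bus_mb_s, nodes=1):
    """Aggregate read throughput under a simple saturation model.

    Each node delivers at most bus_mb_s, no matter how many threads it runs.
    Nodes are assumed to read independently, so their throughput adds up.
    """
    per_node = min(threads * per_thread_mb_s, bus_mb_s)
    return nodes * per_node


# Hypothetical figures, chosen only to show the shape of the curve.
PER_THREAD = 200.0  # MB/s one thread can consume (assumed)
BUS = 1000.0        # MB/s memory/disk limit of one node (assumed)

if __name__ == "__main__":
    for t in (1, 4, 8, 20, 32):
        print(f"1 node,  {t:2d} threads: {throughput_mb_s(t, PER_THREAD, BUS):7.1f} MB/s")
    for n in (2, 4):
        print(f"{n} nodes,  8 threads: {throughput_mb_s(8, PER_THREAD, BUS, nodes=n):7.1f} MB/s")
```

In this model the single-node figure stops growing once the threads fill the bus, so 20 and 32 threads give the same result, while adding nodes keeps scaling. Real clusters will scale less cleanly, since shared storage and the network add their own limits.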
Hi, would bigseqkit improve the speed of
seqkit stats
on a single machine with 20-32 threads? Thanks