OndrejSladky / kmercamel

KmerCamel🐫 provides implementations of several algorithms for efficiently representing a set of k-mers as a masked superstring.
MIT License

Potentially slow IO (correction: not an issue of kmercamel) #85

Closed: karel-brinda closed this issue 3 hours ago

karel-brinda commented 4 hours ago

I've just measured kmercamel's write speed when combined with gzip and got something like 300 KB/s (after gzip). That's extremely low, even for gzip --best.

Would it be possible to: 1) verify that you can reproduce this, and 2) if so, figure out why it is so slow?

Sometimes this kind of slow output is associated with the slow C++ STL.

Alternatively, this might be a Slurm-related issue. I used only 1 thread. It should be double-checked independently anyway.

karel-brinda commented 3 hours ago

Here's the speed of writing the output after mask opt (i.e., about 0.3 MB/s after gzip):

$ while true; do wc -c  ms.k31_ones.fa.gz; sleep 1; done
6799360 ms.k31_ones.fa.gz
7094272 ms.k31_ones.fa.gz
7323648 ms.k31_ones.fa.gz
7618560 ms.k31_ones.fa.gz
7929856 ms.k31_ones.fa.gz
8224768 ms.k31_ones.fa.gz
8454144 ms.k31_ones.fa.gz
8749056 ms.k31_ones.fa.gz
9043968 ms.k31_ones.fa.gz
9289728 ms.k31_ones.fa.gz
9584640 ms.k31_ones.fa.gz
9879552 ms.k31_ones.fa.gz
10108928 ms.k31_ones.fa.gz
10403840 ms.k31_ones.fa.gz
10649600 ms.k31_ones.fa.gz
10944512 ms.k31_ones.fa.gz
11239424 ms.k31_ones.fa.gz
11534336 ms.k31_ones.fa.gz
11763712 ms.k31_ones.fa.gz
12058624 ms.k31_ones.fa.gz
12304384 ms.k31_ones.fa.gz
12599296 ms.k31_ones.fa.gz
12894208 ms.k31_ones.fa.gz
13189120 ms.k31_ones.fa.gz
13434880 ms.k31_ones.fa.gz
13664256 ms.k31_ones.fa.gz
13959168 ms.k31_ones.fa.gz
14204928 ms.k31_ones.fa.gz
14499840 ms.k31_ones.fa.gz
14729216 ms.k31_ones.fa.gz
15024128 ms.k31_ones.fa.gz
15269888 ms.k31_ones.fa.gz
15499264 ms.k31_ones.fa.gz
15794176 ms.k31_ones.fa.gz
16089088 ms.k31_ones.fa.gz
16334848 ms.k31_ones.fa.gz
16629760 ms.k31_ones.fa.gz
16924672 ms.k31_ones.fa.gz
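The rate quoted above can be checked from the samples themselves. The following is a minimal sketch, not part of the original report: `avg_rate` is a hypothetical helper name, and it assumes the samples are `wc -c` lines taken roughly one second apart, as in the loop above.

```shell
# Average file growth per sample interval, from `wc -c` output on stdin.
# Each input line has the form "<bytes> <filename>".
# Assumes consecutive samples are about one second apart, so the
# result approximates bytes per second.
avg_rate() {
    awk 'NR == 1 { first = $1 }
         { last = $1 }
         END {
             if (NR < 2) exit 1          # need at least two samples
             printf "%.0f\n", (last - first) / (NR - 1)
         }'
}
```

Fed the 38 samples above, this gives (16924672 - 6799360) / 37 ≈ 273657 bytes per interval, which is consistent with the roughly 0.3 MB/s figure. Each loop iteration takes slightly longer than one second (`sleep 1` plus the `wc` call), so the true per-second rate is marginally lower.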

karel-brinda commented 3 hours ago

I've run a test with gzip alone, and the bottleneck is indeed gzip (it's exactly the same speed):

$ while true; do wc -c  test_speed_just_compression.tmp; sleep 1; done
2359296 test_speed_just_compression.tmp
2654208 test_speed_just_compression.tmp
2949120 test_speed_just_compression.tmp
3260416 test_speed_just_compression.tmp
3555328 test_speed_just_compression.tmp
3850240 test_speed_just_compression.tmp
4145152 test_speed_just_compression.tmp
4374528 test_speed_just_compression.tmp
4669440 test_speed_just_compression.tmp
4980736 test_speed_just_compression.tmp
5275648 test_speed_just_compression.tmp
5570560 test_speed_just_compression.tmp
5865472 test_speed_just_compression.tmp
6160384 test_speed_just_compression.tmp
6455296 test_speed_just_compression.tmp
6750208 test_speed_just_compression.tmp
6995968 test_speed_just_compression.tmp
7290880 test_speed_just_compression.tmp
7585792 test_speed_just_compression.tmp
7815168 test_speed_just_compression.tmp
8110080 test_speed_just_compression.tmp
8404992 test_speed_just_compression.tmp
8699904 test_speed_just_compression.tmp
9011200 test_speed_just_compression.tmp
9306112 test_speed_just_compression.tmp
9601024 test_speed_just_compression.tmp
9895936 test_speed_just_compression.tmp
10125312 test_speed_just_compression.tmp
10420224 test_speed_just_compression.tmp
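The thread does not say how the gzip-only test was run. One possible way to isolate gzip, shown here purely as a sketch under assumptions, is to compress an already existing uncompressed file and divide the compressed size by the elapsed time. The helper name `gzip_rate` and its arguments are hypothetical, and `date +%s` is assumed to be available (it is on GNU and BSD systems).

```shell
# Report gzip's output rate (compressed bytes per second) for one file.
# Usage: gzip_rate FILE [LEVEL]   (LEVEL defaults to 6)
gzip_rate() {
    file=$1
    level=${2:-6}
    start=$(date +%s)
    # Count the compressed bytes without writing them to disk.
    out_bytes=$(gzip -c "-$level" "$file" | wc -c)
    end=$(date +%s)
    elapsed=$((end - start))
    # Whole-second resolution: treat runs under one second as one second.
    [ "$elapsed" -gt 0 ] || elapsed=1
    echo $((out_bytes / elapsed))
}
```

For example, `gzip_rate ms.k31_ones.fa 9` would report the rate at the `--best` level, assuming an uncompressed copy of the output exists under that name. If this number matches the rate seen for the full pipeline, the compressor rather than the producing program is the limiting step, which is the conclusion reached above.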

karel-brinda commented 3 hours ago

Going to close this as an invalid ticket.