Closed rderelle closed 1 day ago
This error should cause a crash, but might fail to do so on some sketchlib versions. Which version of sketchlib do you have?
Anyway, the issue is that one or both of SAMEA5875845 and SAMN03253058 need to be removed, as they share no k-mers.
Also, for Mtb you may want to increase the sketch size to 10^5.
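For reference, dropping the two flagged samples from the input list before re-sketching could look like the minimal sketch below. It assumes that each line of list_files_poppunk.txt starts with the sample name, followed by whitespace and the file path(s). The helper name `drop_samples` is hypothetical and is not part of sketchlib or PopPUNK.

```python
def drop_samples(list_lines, names_to_drop):
    """Return the list-file lines whose sample name is not in names_to_drop.

    Assumption: the first whitespace-separated field of each line is the
    sample name. Blank lines are discarded.
    """
    drop = set(names_to_drop)
    kept = []
    for line in list_lines:
        fields = line.split()
        if fields and fields[0] not in drop:
            kept.append(line)
    return kept


# Small in-memory demonstration with made-up file paths.
example = [
    "SAMEA5875845\t/data/SAMEA5875845.fa",
    "SAMN00000001\t/data/SAMN00000001.fa",
    "SAMN03253058\t/data/SAMN03253058.fa",
]
filtered = drop_samples(example, ["SAMEA5875845", "SAMN03253058"])
print(filtered)
```

In practice the lines would be read from list_files_poppunk.txt and the filtered lines written to a new list file, which is then passed to `sketchlib sketch -l`.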
Thanks! I will then check these 2 samples. For information, I'm using pp-sketchlib v2.1.4. Also I'll increase the sketch size.
pp-sketchlib v2.1.4 should be fine, but stopping the parallel code hasn't always been particularly reliable, sorry!
I would also suggest running an initial test set of ~10k genomes to get it working, and to estimate how long the full analysis will take.
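One way to build such a test set is to draw a reproducible random subset of the input list. The sketch below is only an illustration: the helper name `subsample_lines` and the fixed seed are assumptions, and it treats every non-blank line of the list file as one genome.

```python
import random


def subsample_lines(list_lines, n, seed=1):
    """Pick n random non-blank lines, keeping their original order.

    A fixed seed makes the subset reproducible between runs.
    """
    entries = [line for line in list_lines if line.strip()]
    if n >= len(entries):
        return entries
    rng = random.Random(seed)
    # Sample positions rather than lines so the original order is preserved.
    chosen = sorted(rng.sample(range(len(entries)), n))
    return [entries[i] for i in chosen]


# In-memory demonstration with made-up sample names and paths.
all_lines = [f"sample{i}\t/data/sample{i}.fa" for i in range(1000)]
test_set = subsample_lines(all_lines, 100)
print(len(test_set))
```

For the real data, the lines would come from list_files_poppunk.txt with `n=10000`, and the result would be written to a separate list file used for the test run.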
I increased the sketch size to 100000 and removed 6 genomes not classified as Mtb by any other method (including SAMEA5875845).
Sketching took 50 min and produced an 81G file. The distance calculation is taking about 10 min per 1% of progress, which gives an estimated computation time of roughly 16-17 h, which is fine.
Thanks a lot.
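The run-time estimate in this comment is a linear extrapolation from the progress counter. A minimal sketch of that arithmetic follows, together with the number of pairwise comparisons for roughly 125k genomes. It assumes that progress advances at a constant rate and that the distance step compares every pair of samples once. Both function names are hypothetical.

```python
def estimate_total_hours(minutes_per_step, percent_per_step=1.0):
    """Extrapolate total run time from the time taken per progress step."""
    steps = 100.0 / percent_per_step
    return steps * minutes_per_step / 60.0


def n_pairs(n_samples):
    """Number of unordered sample pairs in an all-vs-all comparison."""
    return n_samples * (n_samples - 1) // 2


total_hours = estimate_total_hours(10)  # 10 min per 1% of progress
pairs = n_pairs(125_000)
print(f"{total_hours:.1f} h for about {pairs:,} pairs")
```

Because the pair count grows quadratically with the number of genomes, a 10k test run covers well under 1% of the comparisons needed for the full set.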
The distance calculations have successfully finished. Thanks.
Hi John,
I'm trying to use PopPUNK to classify 125k Mtb genomes.
Versions
poppunk 2.6.5 installed with Conda
Command used and output returned
sketchlib sketch -l list_files_poppunk.txt -o poppunk2 -s 10000 -k 17,29,4 --cpus 12
sketchlib query dist poppunk2 -o dist2 --cpus 16
Describe the bug
The first command worked well and created the file "poppunk2.h5". However, the second command seems to hang forever without creating any output file (I tried twice with different numbers of CPUs). Here is the shell output:
Calculating distances using 16 thread(s)
Progress (CPU): 3.3% Progress (CPU): 6.7% Progress (CPU): 10.1% Progress (CPU): 13.3% Progress (CPU): 16.6% Progress (CPU): 19.8% Progress (CPU): 23.1% Progress (CPU): 26.3% Progress (CPU): 29.6% Progress (CPU): 32.8% Progress (CPU): 36.1% Progress (CPU): 39.3% Progress (CPU): 42.6% Progress (CPU): 45.8% Progress (CPU): 49.1% Progress (CPU): 52.3% Progress (CPU): 55.6% Progress (CPU): 58.8% Progress (CPU): 60.4%
No non-zero Jaccard distances
Fitting k-mer gradient failed, for: SAMEA5875845 vs. SAMN03253058
0.00400641 0.000300481 0.000200321 0.000200321
Check for low quality genomes
Progress (CPU): 63.6% Progress (CPU): 66.9% Progress (CPU): 70.1% Progress (CPU): 73.4% Progress (CPU): 76.6% Progress (CPU): 79.9% Progress (CPU): 83.1% Progress (CPU): 86.4% Progress (CPU): 89.6% Progress (CPU): 92.9% Progress (CPU): 96.1% Progress (CPU): 99.4% Progress (CPU): 100.0%
After that the job hangs for hours without output. Any help would be much appreciated as I'm currently stuck with this issue.
Many thanks, Romain