dnbaker / dashing2

Dashing 2 is a fast toolkit for k-mer and minimizer encoding, sketching, comparison, and indexing.
MIT License

How to get an asymmetric matrix for the containment index? #47

Open dejsha opened 2 years ago

dejsha commented 2 years ago

Dear Daniel, (first, thank you very much for the new binaries that emit distances with --set/--countdict; they work perfectly.)

Could you please help me with another issue, the containment index? I cannot figure out the correct settings to emit the asymmetric matrix for the containment index. Running this:

./dashing2_s128 sketch -k31 -p8 --cmpout containment.txt -o card -F paths.txt --set --containment --asymmetric-all-pairs

I get a full square matrix, but it is symmetric: the values are the same in the upper and lower triangles. They are indeed containment indexes, i.e. they fit (A intersection B)/A, but the values for (A intersection B)/B are not reported. The matrix is identical to the one produced by:

./dashing2_s128 sketch -k31 -p8 --cmpout containment.txt -o card -F paths.txt -Q paths.txt --set --containment

I also tried:

./dashing2_s128 sketch -k31 -p8 --cmpout containment.txt -o card -F paths.txt -Q paths.txt --set --containment --asymmetric-all-pairs

This results in two full symmetric matrices (each the same as those mentioned above) stuck together, i.e. each F-Q sample pair comparison is done twice, but still in only "one direction".
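To make the expected output concrete, here is a minimal Python sketch of an asymmetric containment matrix computed on exact k-mer sets. It is only an illustration of the (A intersection B)/A definition used above, not how Dashing 2 computes it (Dashing 2 works on sketches). The helper names `kmers`, `containment`, and `containment_matrix`, the toy sequences, and k=4 are all made up for this example.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(a, b):
    """Fraction of a's k-mers that also occur in b: |A & B| / |A|."""
    return len(a & b) / len(a) if a else 0.0


def containment_matrix(sets):
    """Row i, column j holds containment(sets[i], sets[j])."""
    return [[containment(a, b) for b in sets] for a in sets]


# Toy input: the first sequence is a prefix of the second.
seqs = ["ACGTACGT", "ACGTACGTTTGACCA"]
sets = [kmers(s, 4) for s in seqs]
matrix = containment_matrix(sets)

for row in matrix:
    print("\t".join(f"{value:.4f}" for value in row))
```

Because every 4-mer of the first sequence also occurs in the second, `matrix[0][1]` is 1.0, while `matrix[1][0]` is 4/11: the two off-diagonal entries differ, which is what an asymmetric containment matrix should show.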

Could you please help me with that? Thank you, dasa

dnbaker commented 2 years ago

Hi Dasa,

Thanks for the report - it's very helpful.

I've confirmed that --asymmetric-all-pairs with -F paths.txt is computing only one of the two containment scores, so I'm looking into why that's happening.

Best,

Daniel

dnbaker commented 2 years ago

Hi again,

I found the problem - Dashing2 was selecting symmetric containment instead of standard containment. This was responsible for the distance matrices being symmetric when they shouldn't have been.
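A rough Python sketch of why that mix-up yields a symmetric matrix. It assumes symmetric containment means normalizing the intersection by the smaller of the two sets; treat that definition as an assumption here rather than as Dashing2's exact formula. The function names and toy sets are hypothetical, and the sets are exact rather than sketched.

```python
def containment(a, b):
    """Standard containment of a in b: |A & B| / |A|."""
    return len(a & b) / len(a) if a else 0.0


def symmetric_containment(a, b):
    """Assumed definition: |A & B| / min(|A|, |B|)."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def pairwise(measure, sets):
    """Full square matrix of measure(sets[i], sets[j])."""
    return [[measure(a, b) for b in sets] for a in sets]


def is_symmetric(matrix, tol=1e-9):
    """True if matrix[i][j] equals matrix[j][i] for all i < j."""
    n = len(matrix)
    return all(
        abs(matrix[i][j] - matrix[j][i]) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )


# Toy sets of different sizes, so the two directions should differ.
sets = [{1, 2, 3, 4}, {3, 4, 5, 6, 7, 8}, {1, 8}]

standard = pairwise(containment, sets)
symmetric = pairwise(symmetric_containment, sets)

print(is_symmetric(standard))   # direction matters
print(is_symmetric(symmetric))  # direction is lost
```

Under this assumed definition the denominator does not depend on the order of the arguments, so swapping A and B cannot change the value. A check like `is_symmetric` on the emitted matrix is one quick way to tell which measure was actually used.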

I've fixed them in this pull request, and I'll include updated binaries soon.

Edit: Binaries have been updated. Use Dashing v2.1.11 to have the fix included.

Thanks,

Daniel