dnbaker / dashing

Fast and accurate genomic distances using HyperLogLog
GNU General Public License v3.0
161 stars, 11 forks

Sketch from STDIN #65

Open: mihkelvaher opened this issue 3 years ago

mihkelvaher commented 3 years ago

Is it possible to create a sketch from FASTA files streamed to dashing through a pipe?

I'm manipulating both assembled genomes and k-mers, and I would like to compare them multiple times at the end. I could write them to disk as an additional step, but given the high volume, that is really cumbersome.

Thanks.

dnbaker commented 3 years ago

Hi!

Sure, that can be done. I'm adding an -o option to sketch that accepts '/dev/stdout' or '-', and I'll probably finish it later today. (The option already exists, but it is currently broken.)

Thanks,

Daniel

mihkelvaher commented 3 years ago

That's great! Also, can smaller sketches be compared with larger ones? For example, if I have some bacterial genomes and a human genome in the same database, do I need to scale up the bacterial sketches to avoid losing too much of the human information?

dnbaker commented 3 years ago

Hi Mihkel --

They can't be compared directly, but you can compress a larger sketch by repeatedly folding it in half, if you will. I've added a new subcommand, dashing fold, which should compress a larger HLL into a smaller sketch.
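To make the folding idea concrete, here is a minimal Python sketch of how a HyperLogLog with 2**p registers can be reduced to 2**(p-1) registers without the original data. It is not dashing's code: it assumes a 64-bit hash whose top p bits select the register and whose remaining bits give the rank, and the names build, fold, fold_to, estimate and jaccard are hypothetical. Under this bit layout, folding merges registers 2j and 2j+1; with a different layout the merged pair would be j and j + m/2 instead, but the principle is the same. Because the dropped index bit becomes the first bit of the rank field, the folded sketch is exactly what building at precision p-1 would have produced, so a large sketch can be folded down to a small one and then compared (here with a simple inclusion-exclusion Jaccard estimate).

```python
import math
import random

HASH_BITS = 64  # assumption: hash values are 64-bit integers


def rank(h, p):
    """1-based position of the first set bit after the p index bits."""
    rest = (h << p) & ((1 << HASH_BITS) - 1)  # discard the index bits
    if rest == 0:
        return HASH_BITS - p + 1  # all remaining bits are zero
    return HASH_BITS - rest.bit_length() + 1  # leading zeros + 1


def build(hashes, p):
    """HyperLogLog with 2**p registers; the index is the top p bits of each hash."""
    regs = [0] * (1 << p)
    for h in hashes:
        idx = h >> (HASH_BITS - p)
        regs[idx] = max(regs[idx], rank(h, p))
    return regs


def fold(regs):
    """Halve the register count (precision p -> p - 1) using only the registers."""
    out = []
    for j in range(len(regs) // 2):
        zero_bit, one_bit = regs[2 * j], regs[2 * j + 1]
        # Hashes in register 2j had 0 as the dropped index bit; that 0 now
        # leads the rank field, so their rank grows by one.
        a = zero_bit + 1 if zero_bit else 0
        # Hashes in register 2j+1 had 1 there, so their new rank is exactly 1.
        b = 1 if one_bit else 0
        out.append(max(a, b))
    return out


def fold_to(regs, p_target):
    """Fold repeatedly until the sketch has 2**p_target registers."""
    while len(regs) > (1 << p_target):
        regs = fold(regs)
    return regs


def estimate(regs):
    """Standard HLL estimate with the small-range (linear counting) correction."""
    m = len(regs)
    alpha = 0.7213 / (1 + 1.079 / m)  # valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in regs)
    zeros = regs.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)
    return raw


def jaccard(a, b):
    """Jaccard estimate after folding the larger sketch down to the smaller size."""
    p = min(len(a), len(b)).bit_length() - 1
    a, b = fold_to(a, p), fold_to(b, p)
    union = estimate([max(x, y) for x, y in zip(a, b)])  # union = register-wise max
    inter = max(estimate(a) + estimate(b) - union, 0.0)
    return inter / union


if __name__ == "__main__":
    rng = random.Random(42)
    shared = [rng.getrandbits(64) for _ in range(50_000)]
    extra = [rng.getrandbits(64) for _ in range(50_000)]
    big = build(shared + extra, 14)  # larger sketch (think: human)
    small = build(shared, 10)        # smaller sketch (think: bacterium)
    # Folding reproduces exactly the sketch that p = 10 would have built.
    assert fold_to(big, 10) == build(shared + extra, 10)
    print(f"Jaccard ~ {jaccard(big, small):.3f} (true value 0.5)")
```

The trade-off is that the folded sketch only has the accuracy of the smaller precision, so comparing a large sketch with a small one is limited by the small one's resolution.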

The issue above about directing sketch output to a stream has been fixed; once the new binaries finish compiling, they should be ready to use. I'll let you know when they're available.

Thanks,

Daniel

mihkelvaher commented 3 years ago

Hi Daniel,

The pull request suggested that the option is now available on main.

Unfortunately, I don't know how to use it, and -o isn't listed in dashing sketch -h. What I tried (along with similar variations):

cat test.fasta | dashing sketch - -o test.sketch

After sketching the old way, I also gave fold a try, but got:

[src/main.cpp:int main(int, char**)56] Invalid subcommand fold provided.

I'm guessing I have the wrong version (v0.5-9-ge6ae), but the pull request was also merged into main/master...

Thanks.