Open jianshu93 opened 2 years ago
Hello Daniel,
time dashing2 sketch -k 11 -S 12000 --threads 24 --pminhash --topk 250 --cmpout phage_GPD_topK_250.txt -Q name.txt -F reference.txt
With and without the --topk 250 option, I get exactly the same output. Am I making a mistake? I am using the newest release, v2.1.11 (the 512bw build). Forgive me, but the README section for the sketch command is a little long and confusing.
Thanks,
Jianshu
Hi Jianshu,
Sorry for the wait! It's been a busy couple of weeks.
I've added in cmp usage (https://github.com/dnbaker/dashing2/commit/3b71c9cdb925aa582921e89a7cc66f62f773f9d4), so thank you for pointing that out.
You can pass sketched files to cmp, but you have to sketch all the original input files together. For example, something like this:
dashing2 sketch <sketching options> -F filelist.txt -o stacked_sketch_file.
dashing2 cmp <sketching options> --presketched stacked_sketch_file.rc_canon.sketchsize1024.k32.SetSpace.DNA.opss
This way, it's broken into two stages (which you can time). This also makes it easier to work with larger collections, since the sketch matrix can be memory-mapped and therefore exceed system RAM.
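As a minimal sketch of timing the two stages separately: the helper `time_stage` and the variable `SKETCH_FILE` below are hypothetical names, not part of dashing2. The `-k`, `-S` and `--threads` values are copied from the command earlier in this thread and stand in for `<sketching options>`. The name of the file written by the sketch stage is not guessed here, so `SKETCH_FILE` has to be set by hand to whatever file stage 1 actually produces.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of dashing2): run a command and report
# its wall-clock time in whole seconds on stderr.
time_stage() {
    local label=$1
    shift
    local start end status=0
    start=$(date +%s)
    "$@" || status=$?          # run the stage, remember its exit code
    end=$(date +%s)
    echo "${label}: $((end - start))s (exit ${status})" >&2
    return "$status"
}

# Only attempt the real runs when dashing2 is on PATH and the file list exists.
if command -v dashing2 >/dev/null 2>&1 && [ -f filelist.txt ]; then
    # Stage 1: sketch every input file together into one stacked sketch file.
    time_stage sketch dashing2 sketch -k 11 -S 12000 --threads 24 \
        -F filelist.txt -o stacked_sketch_file

    # Stage 2: compare using the presketched file.
    # SKETCH_FILE is an assumption: set it to the file stage 1 wrote.
    if [ -n "${SKETCH_FILE:-}" ]; then
        time_stage cmp dashing2 cmp -k 11 -S 12000 --threads 24 \
            --presketched "$SKETCH_FILE"
    fi
fi
```

The timing line goes to stderr so it does not mix with anything the stages print to stdout. The shell built-in `time`, as used in the original command, works just as well when run once per stage.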
Does that help?
I'll look into the --topk 250 results as well. I'm not sure what's going on there, but I'll let you know.
Thanks,
Daniel
Hello Daniel,
I computed sketches with dashing2 sketch and stored them using --outfile. That produces a sketch file and a sketch.name.txt file, but I am not sure how to feed those to dashing2 cmp, since no help is provided for it. I looked into the code and it confused me. I can use --cmpout to get the distances, but I want to check how long sketching and cmp each take.
Thanks,
Jianshu