dnbaker / dashing

Fast and accurate genomic distances using HyperLogLog
GNU General Public License v3.0

`setdist` doesn't output distances #13

Closed olgabot closed 5 years ago

olgabot commented 5 years ago

Hello, when I run `setdist` on these four fastq.gz files, the program calculates the absolute sizes but doesn't show the distances the way `dashing dist` does.

Here's the dashing dist output:

root@f5b123e1af02:/data/catted_reads# time dashing dist *.fastq.gz
#Path    Size (est.)
A1-MAA000487-3_10_M-1-1.fastq.gz    25503239.355288
A1-B002764-3_38_F-1-1.fastq.gz    6140027.666877
A1-D042253-3_9_M-1-1.fastq.gz    12316236.863717
A1-MAA000779-3_11_M-1-1.fastq.gz    6934912.683602
##Names     A1-MAA000487-3_10_M-1-1.fastq.gz    A1-B002764-3_38_F-1-1.fastq.gz    A1-D042253-3_9_M-1-1.fastq.gz    A1-MAA000779-3_11_M-1-1.fastq.gz
A1-MAA000487-3_10_M-1-1.fastq.gz    -    0.049051    0.074235    0.000000
A1-B002764-3_38_F-1-1.fastq.gz    -    -    0.039803    0.055432
A1-D042253-3_9_M-1-1.fastq.gz    -    -    -    0.014889
A1-MAA000779-3_11_M-1-1.fastq.gz    -    -    -    -

real    0m13.896s
user    0m13.630s
sys    0m0.220s

Here are the reads:

(base)
 Wed 23 Jan - 14:40  ~/data/catted_reads
  aws s3 ls s3://olgabot-maca/dashing-test/
2019-01-23 09:24:57   73938115 A1-B002764-3_38_F-1-1.fastq.gz
2019-01-23 09:24:57   52176348 A1-D042253-3_9_M-1-1.fastq.gz
2019-01-23 09:24:57  168623288 A1-MAA000487-3_10_M-1-1.fastq.gz
2019-01-23 09:24:57   51031125 A1-MAA000779-3_11_M-1-1.fastq.gz

And here's the dashing setdist output:

$ /home/olga/code/dashing/dashing setdist *.fastq.gz
#Path   Size (est.)
A1-B002764-3_38_F-1-1.fastq.gz  6122553
A1-D042253-3_9_M-1-1.fastq.gz   12569210
A1-MAA000487-3_10_M-1-1.fastq.gz    26727514
A1-MAA000779-3_11_M-1-1.fastq.gz    7587041
##Names     A1-B002764-3_38_F-1-1.fastq.gz  A1-D042253-3_9_M-1-1.fastq.gz   A1-MAA000487-3_10_M-1-1.fastq.gz    A1-MAA000779-3_11_M-1-1.fastq.gz

Do you know what may be happening? Thank you! Warmest, Olga

dnbaker commented 5 years ago

Good find!

My suspicions were correct; I wasn't flushing the final output buffer for setdist. This has been corrected as of https://github.com/dnbaker/dashing/commit/36be9acda6ed62cf85a1fecf783219f259342faa.
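For readers unfamiliar with this class of bug, here is a minimal sketch of how a missing final flush can silently drop the tail of the output. It is not dashing's actual code: `BatchedWriter`, `emit_rows`, and the 64 KiB threshold are hypothetical names and values chosen for illustration, and an in-memory `std::ostringstream` stands in for the real output file.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical batching writer (an assumption, not dashing's class): rows
// accumulate in memory and reach `out` only once `threshold` bytes are pending.
class BatchedWriter {
public:
    BatchedWriter(std::ostream &out, std::size_t threshold)
        : out_(out), threshold_(threshold) {
        assert(threshold_ > 0);
    }

    // Appends one row to the pending buffer; writes it out if the buffer is full.
    void write_row(const std::string &row) {
        pending_ += row;
        pending_ += '\n';
        if (pending_.size() >= threshold_) flush();
    }

    // Must be called once after the last row; otherwise the tail is lost.
    void flush() {
        out_ << pending_;
        out_.flush();
        pending_.clear();
    }

private:
    std::ostream &out_;
    std::size_t threshold_;
    std::string pending_;
};

// Emits `n_rows` short rows into an in-memory sink and returns what arrived.
// `final_flush` toggles the step that was missing in the buggy variant.
std::string emit_rows(std::size_t n_rows, bool final_flush) {
    std::ostringstream sink;
    BatchedWriter writer(sink, 1 << 16);  // 64 KiB threshold, chosen arbitrarily
    for (std::size_t i = 0; i < n_rows; ++i)
        writer.write_row("row" + std::to_string(i));
    if (final_flush) writer.flush();
    return sink.str();
}
```

With a small input such as the four files above, the pending buffer never reaches the threshold, so `emit_rows(3, false)` returns an empty string while `emit_rows(3, true)` returns all three rows. That matches the symptom in this issue, where the header lines appeared but the distance rows did not.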

Can you confirm that this is fixed for you?

olgabot commented 5 years ago

Yes, it's working now, thanks!

dnbaker commented 5 years ago

Fantastic. Thank you for the find!