aidenlab / straw

Extract data quickly from Juicebox via straw
MIT License

straw and strawr don't dump same values #99

Open ArielPaulson opened 2 years ago

ArielPaulson commented 2 years ago

I am dumping obs/exp (O/E) data from a hic file with the command-line 'straw' and with the R library 'strawr', and the two are not giving the same results.

The data are very similar overall and correlate highly, but they are clearly not the same values: upwards of 80% of non-NA rows differ at 4 decimal places. This holds true across normalizations, bin sizes, and chromosomes. Even unnormalized data (i.e. NONE oe) has this problem.

I am using 'straw' compiled from the latest GitHub release, and 'strawr' installed fresh just a few days ago on R-4.1.0 via install.packages().

I also compared the data from juicer tools 'dump' and found that it was basically identical to strawr.

Here is a row slice from a table showing both methods, same hic file, chr 1, VC, oe, 10kb:

PosA    PosB     dump      strawr    straw
40000   40000    0.463189  0.463189  0.463189
40000   45000    1.971135  1.971135  1.971135
40000   50000    2.149339  2.149339  2.149339
40000   55000    1.261088  1.261088  1.261088
40000   60000    0.776958  0.776958  0.624063
40000   65000    0.687151  0.687151  0.855503
40000   70000    0.394186  0.394187  0.333246
40000   80000    1.384906  1.384906  0.854544
40000   105000   1.731343  1.731343  1.358210
40000   110000   1.961904  1.961904  1.652741
40000   115000   0.312818  0.312818  0.240716
40000   120000   0.190295  0.190295  0.488769
40000   130000   0.333526  0.333526  0.338950
40000   135000   1.289947  1.289947  0.944601
40000   140000   0.450147  0.450147  0.437852
40000   145000   1.116514  1.116514  1.116514
40000   150000   0.638958  0.638958  0.459245
40000   165000   0.737243  0.737243  0.832634
40000   175000   0.632508  0.632508  0.678903
40000   190000   1.396106  1.396106  1.578750
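For reference, the "differ at 4 decimals" comparison can be reproduced on just this slice with a few lines of Python. This is only a sketch: the rows are copied from the table above, and the helper name `fraction_differing` is made up for illustration, not part of straw, strawr, or juicer tools.

```python
# Rows copied from the table above: (PosA, PosB, dump, strawr, straw)
rows = [
    (40000, 40000, 0.463189, 0.463189, 0.463189),
    (40000, 45000, 1.971135, 1.971135, 1.971135),
    (40000, 50000, 2.149339, 2.149339, 2.149339),
    (40000, 55000, 1.261088, 1.261088, 1.261088),
    (40000, 60000, 0.776958, 0.776958, 0.624063),
    (40000, 65000, 0.687151, 0.687151, 0.855503),
    (40000, 70000, 0.394186, 0.394187, 0.333246),
    (40000, 80000, 1.384906, 1.384906, 0.854544),
    (40000, 105000, 1.731343, 1.731343, 1.358210),
    (40000, 110000, 1.961904, 1.961904, 1.652741),
    (40000, 115000, 0.312818, 0.312818, 0.240716),
    (40000, 120000, 0.190295, 0.190295, 0.488769),
    (40000, 130000, 0.333526, 0.333526, 0.338950),
    (40000, 135000, 1.289947, 1.289947, 0.944601),
    (40000, 140000, 0.450147, 0.450147, 0.437852),
    (40000, 145000, 1.116514, 1.116514, 1.116514),
    (40000, 150000, 0.638958, 0.638958, 0.459245),
    (40000, 165000, 0.737243, 0.737243, 0.832634),
    (40000, 175000, 0.632508, 0.632508, 0.678903),
    (40000, 190000, 1.396106, 1.396106, 1.578750),
]

DUMP, STRAWR, STRAW = 2, 3, 4  # column indices into each row


def fraction_differing(rows, col_a, col_b, decimals=4):
    """Fraction of rows whose two values differ after rounding to `decimals`."""
    differing = sum(
        1 for r in rows if round(r[col_a], decimals) != round(r[col_b], decimals)
    )
    return differing / len(rows)


print("dump vs strawr :", fraction_differing(rows, DUMP, STRAWR))
print("strawr vs straw:", fraction_differing(rows, STRAWR, STRAW))
```

On this slice, dump and strawr agree on every row at 4 decimals, while strawr and straw differ on 15 of the 20 rows.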

Not sure how to proceed.

Thanks, Ariel

sa501428 commented 2 years ago

Straw is now using an improved expected model. Specifically, it applies a rolling median to the previous expected vector. This reduces noise and gives more reliable O/E values. strawr has not yet been updated to use this expected model, which is why its output still matches juicer tools 'dump' and differs from straw.
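To illustrate why a smoothed expected vector changes the O/E values, here is a rough sketch of a rolling median in Python. It is not straw's actual code: the window size, the edge handling, and all the numbers are assumptions chosen only to show the idea.

```python
from statistics import median


def rolling_median(values, window=3):
    """Centered rolling median; the window is truncated at the edges.

    The window size and edge handling are illustrative assumptions,
    not the parameters straw uses.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(median(values[lo:hi]))
    return smoothed


def observed_over_expected(observed, expected):
    """Element-wise O/E; NaN where the expected value is not positive."""
    return [o / e if e > 0 else float("nan") for o, e in zip(observed, expected)]


# Hypothetical expected counts by distance bin, and hypothetical observed counts.
expected = [100.0, 50.0, 80.0, 30.0, 20.0]
observed = [90.0, 60.0, 40.0, 30.0, 10.0]

smoothed = rolling_median(expected, window=3)

print("raw expected     :", expected)
print("smoothed expected:", smoothed)
print("O/E (raw)        :", observed_over_expected(observed, expected))
print("O/E (smoothed)   :", observed_over_expected(observed, smoothed))
```

Observed counts are the same in both cases; only the denominator changes. Bins where the smoothed expected happens to equal the raw expected give identical O/E, which would be consistent with some rows in the table matching exactly while others do not.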