Closed maxdudek closed 5 months ago
Hi Max,
This is a very thoughtful question and we have debated about this in our lab too! My current take is that I would avoid dividing by bias since sometimes you can have very low bias values like 0.01. When we call footprints, values like 0.01 and 0.005 will give you similar footprint results, but if you use them as denominator for visualizing counts it can scale your values by several folds and the visualization will be very confusing. This is the reason why if you look at our code we never divide any count by bias, but always compare insertions to an expected distribution. Hence I think your second idea is better. One thing to note is that when we trained our bias model the bias is in fact relative bias within a local window (+/-50 bp) so you might want to match this when you calculate the expectation.
Yan
Thank you! I'll go with the second one.
Hi!
I would like to make a visualization of my insertions counts at a certain region like this:
However, I want the number of insertions to be corrected based on the Tn5 bias calculated by the CNN. What is the best way to do this using PRINT? My first thought was to divide the insertion count at each bp by the bias at that bp, but I wanted to make sure that makes sense. So for my first peak/region, with length 1000, it would look something like this:
EDIT: my second, probably better thought, is to calculate the "expected insertions" using the bias and subtract that from the actual insertions. Does this make sense? Not sure if I'm interpreting the bias correctly