LHentges / LanceOtron

https://LanceOtron.molbiol.ox.ac.uk/
GNU General Public License v3.0

multiple replicates #10

Closed yudizhangzyd closed 1 year ago

yudizhangzyd commented 1 year ago

Hi,

If I have multiple replicates, do I merge them into one file and then run it? Thanks.

LHentges commented 1 year ago

Hello!

There are a few different ways to address multiple replicates.

First, I should say that LanceOtron will handle a merged track just fine. A peak's shape, one of the attributes our model assesses, appears to be well preserved when replicates are merged. In fact, we've even seen this in scATAC, where pseudo-bulk ATAC tracks are compiled from many cells, and the neural network handles these without modification.

Another method involves running LanceOtron on each replicate and finding consensus peaks across the replicates. The labs I work with tend to favour this method, though it does take more time. For this I'd keep only regions with a Peak Score >= 0.5, use bedtools intersect to find the peaks that overlap between replicates, then merge the coordinates of the overlapping peaks.
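As a rough illustration of the consensus approach, here is a minimal pure-Python sketch. It is not LanceOtron or bedtools code: the function name `consensus_peaks`, the tuple layout `(chrom, start, end, peak_score)`, and the `min_replicates` parameter are all assumptions made for this example. It pools the peaks that pass the score filter, merges overlapping intervals, and keeps merged regions supported by enough replicates, which approximates an intersect followed by a merge.

```python
from collections import defaultdict

MIN_PEAK_SCORE = 0.5  # threshold suggested in the text above


def consensus_peaks(replicates, min_score=MIN_PEAK_SCORE, min_replicates=None):
    """Return merged regions supported by at least `min_replicates` replicates.

    `replicates` is a list with one entry per replicate; each entry is a list
    of (chrom, start, end, peak_score) tuples using half-open BED-style
    coordinates. This layout is hypothetical, chosen for illustration.
    """
    if min_replicates is None:
        min_replicates = len(replicates)  # default: require every replicate

    # Pool peaks passing the score filter, remembering their replicate.
    by_chrom = defaultdict(list)
    for rep_idx, peaks in enumerate(replicates):
        for chrom, start, end, score in peaks:
            if score >= min_score:
                by_chrom[chrom].append((start, end, rep_idx))

    consensus = []
    for chrom in sorted(by_chrom):
        cur_start, cur_end, cur_reps = None, None, set()
        for start, end, rep_idx in sorted(by_chrom[chrom]):
            if cur_start is not None and start < cur_end:
                # Overlaps the current cluster: extend it.
                cur_end = max(cur_end, end)
                cur_reps.add(rep_idx)
            else:
                # Close the previous cluster and start a new one.
                if cur_start is not None and len(cur_reps) >= min_replicates:
                    consensus.append((chrom, cur_start, cur_end))
                cur_start, cur_end, cur_reps = start, end, {rep_idx}
        if cur_start is not None and len(cur_reps) >= min_replicates:
            consensus.append((chrom, cur_start, cur_end))
    return consensus


rep1 = [("chr1", 100, 200, 0.9), ("chr1", 500, 600, 0.4), ("chr2", 10, 50, 0.8)]
rep2 = [("chr1", 150, 250, 0.7), ("chr1", 520, 580, 0.95), ("chr2", 100, 150, 0.9)]
print(consensus_peaks([rep1, rep2]))
```

In this toy input only the chr1 region around 100-250 passes the score filter in both replicates, so it is the only region kept. Note that clustering by chained overlap can join peaks that do not overlap each other directly, so results may differ slightly from a strict pairwise intersect.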

Often getting a list of quality regions and their coordinates is the goal of the peak calling process. If for some reason you need the combined score of a region, for instance the overall score if its Peak Score is 0.7 in one track and 0.9 in another, you could average these. Because this score actually represents a probability, I have also used these as input to a Bayesian updating model (I did that analysis for this paper: Nature).
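To make the two score-combining options concrete, here is a small sketch. The averaging is exactly as described above. The Bayesian part is only one simple form of updating, assumed for illustration and not necessarily the model used in the analysis mentioned: it treats each track's Peak Score as independent evidence and adds log-odds relative to a prior. The function names and the `prior` parameter are hypothetical.

```python
import math


def mean_score(scores):
    """Plain average of Peak Scores for one region across tracks."""
    return sum(scores) / len(scores)


def bayes_combine(scores, prior=0.5):
    """Combine per-track probabilities, assuming independent evidence.

    Each score contributes its log-odds relative to the prior log-odds.
    Both the independence assumption and the default prior of 0.5 are
    assumptions of this sketch.
    """
    eps = 1e-12  # keep scores away from exactly 0 or 1
    prior_logit = math.log(prior / (1 - prior))
    total = prior_logit
    for p in scores:
        p = min(max(p, eps), 1 - eps)
        total += math.log(p / (1 - p)) - prior_logit
    # Numerically stable logistic function.
    if total >= 0:
        return 1 / (1 + math.exp(-total))
    e = math.exp(total)
    return e / (1 + e)


scores = [0.7, 0.9]
print(mean_score(scores))
print(bayes_combine(scores))
```

With a prior of 0.5, two scores above 0.5 reinforce each other, so the combined probability ends up higher than either input, whereas the average sits between them. Which behaviour is appropriate depends on how independent the replicates really are.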

In general I prefer not to use p-values (or q-values) for these analyses, but combining them across multiple tracks reduces the false positives they tend to be associated with. LanceOtron does return p-values calculated against a range of different backgrounds (in my PhD thesis I compared these and found the 100 kb background p-values to be marginally better), and you could combine these in a straightforward manner using Fisher's method. Alternatively, you could use these values with IDR, the tool used by ENCODE.
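For reference, Fisher's method combines k independent p-values through the statistic X = -2 * sum(ln p_i), which follows a chi-square distribution with 2k degrees of freedom under the null. A minimal sketch follows, assuming plain p-values in (0, 1]; if your output stores them on another scale, such as -log10, convert them first. The function name `fisher_combine` is hypothetical. Because the degrees of freedom are always even, the chi-square tail has a closed form and no extra library is needed.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values for one region using Fisher's method.

    Uses the closed-form chi-square survival function for even degrees
    of freedom (2k): exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not 0 < p <= 1 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2

    term = 1.0   # (x/2)^i / i! for i = 0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# Hypothetical p-values for the same region in two replicates.
print(fisher_combine([0.01, 0.02]))
```

If SciPy is available, `scipy.stats.combine_pvalues` with `method="fisher"` performs the same calculation. Keep in mind that the method assumes the tests are independent, which replicates of the same experiment only approximately satisfy.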

Lots of good options! Please feel free to reach out again if anything was unclear or if I can help further!

Best,

Lance

yudizhangzyd commented 1 year ago

Thanks!