calico / basenji

Sequential regulatory activity predictions with deep convolutional neural networks.
Apache License 2.0

question about clip threshold in basenji/enformer. #156

Closed r1cheu closed 1 year ago

r1cheu commented 1 year ago

Hi @davek44, I noticed the clip_soft and clip_extreme options in basenji_data_read.py. Is there a standard way to choose these thresholds? (I'm using ChIP-seq bigWig tracks from a plant.) And does the clipping matter much?

davek44 commented 1 year ago

clip_extreme is meant to protect the training process from truly weird genomic regions with super high coverage. I don't touch it much. clip_soft probably doesn't matter a ton for ChIP-seq where the dynamic range isn't as large as RNA abundance statistics. I generally set it somewhere from 32-128, but it depends on how deeply sequenced your samples are. You could try a couple of values and make sure the Spearman correlations are robust.
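To make the effect of the soft threshold concrete, here is a minimal sketch, assuming the soft clip is a square-root compression of whatever coverage exceeds the threshold. That form is an assumption for illustration and should be checked against basenji_data_read.py. The names `soft_clip` and `hard_cap`, and the example coverage values, are hypothetical.

```python
import numpy as np

def soft_clip(cov, threshold):
    """Compress values above `threshold` (assumed form: t + sqrt(x - t))."""
    cov = np.asarray(cov, dtype=np.float64).copy()
    mask = cov > threshold
    # Values at or below the threshold are left unchanged.
    cov[mask] = threshold + np.sqrt(cov[mask] - threshold)
    return cov

def hard_cap(cov, limit):
    """Hypothetical hard cap: nothing may exceed `limit` in magnitude."""
    return np.clip(np.asarray(cov, dtype=np.float64), -limit, limit)

# Made-up coverage values, including one extreme outlier.
coverage = np.array([0.0, 10.0, 32.0, 48.0, 132.0, 10032.0])

# Try the two ends of the 32-128 range mentioned above.
clipped_32 = soft_clip(coverage, 32)
clipped_128 = soft_clip(coverage, 128)
print(clipped_32)
print(clipped_128)
```

Under this assumed form the transform is monotonic, so it preserves the rank order of the targets themselves. The comparison suggested above is therefore between models trained with different thresholds, checking that their test-set Spearman correlations stay similar.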

r1cheu commented 1 year ago

Okay, got it. Thanks for your reply.