3DGenomes / binless

Resolution-independent normalization of Hi-C data
GNU Lesser General Public License v3.0
7 stars 2 forks source link

How to get significant interactions? #3

Closed pollicipes closed 6 years ago

pollicipes commented 6 years ago

After I have generated themat object and do the plot_binless_matrix() I set a threshold and then get the top 1% or so. But, this strategy leads me to calling always significant bins, even if in a certain region there is nothing significant, because I am splitting the dataset in 100 parts and simply taking the 1% higher values.

How can I select the statistically relevant interactions without using this thresholding system?

Thanks, Juan

yannickspill commented 6 years ago

Thanks Juan for this question! In Binless (with optimized lambda), interactions are significantly different from the background if the signal column in mat is nonzero. Since it might be sometimes too much, you can always narrow it down to a smaller subset, by selecting those regions with high fold change. For example, suppose you want regions with a log10 fold change larger than 1, then you will filter the signal matrix (and not the binless matrix!) at that threshold. In data.table syntax, this gives mat[abs(log10(signal))>1] If you then want to retrieve the binless matrix values only where significant (in 4 column format), you could for example do mat[abs(log10(signal))>1,.(name,bin1,bin2,binless)] hth Y

pollicipes commented 6 years ago

Thanks for the answer, Yannick! :)

However, just a tiny clarification: When running the code you wrote above I get this error.

Error in eval(.massagei(isub), x, parent.frame()): object 'signal' not found

I think it is because the matrix mat does not contain the object signal, but It already contains the log_signal column. mat comes from the data table named out. You can check how mat looks it with this command: head(out) Then proceed as the tutorial to create the object a where you already have the log_signal column. a=as.data.table(out$mat) head(a)

If I got it all correctly, now its just about filtering this data.frame a by the column log_signal and your log10 fold change set value.

Best, J

yannickspill commented 6 years ago

Yes, you are right to point that out. In version 0.11 and below, fast and slow binless do not report exactly the same columns, so if you have log_signal instead of signal, then the following command should do the filtering: mat[abs(log_signal/log(10))>1]