Open ahua217 opened 3 years ago
Hi,
Could you clarify what you mean by "signal from control peaks higher than the threshold"? If you mean that there are some experimental peaks that appear as visible peaks but are not called (presumably because there are control peaks overlapping that exceed the threshold), there is some discussion of remedies to that problem in Issue #36 that might be helpful for you. I've also made an amendment to the code as described in Issue #47 under mpmeers-patch-2012107 such that the code downloaded there will not throw out peaks if they overlap a control-enriched site, so by comparing the results from that code to the results from the master branch code you could infer which sites have control signal that exceeds the threshold. However, I'm not convinced I'm interpreting your question correctly, so please let me know if that's the case.
Mike
Thank you for your help, mpmeers! I captured and adapted a figure from the SEACR paper to clarify what I meant. Take the prominent purple peak as an example: I want to know what those peaks are, especially when the IgG antibody behaves strangely under really bad cell conditions, such as DNA instability before CUT&RUN. How can I get such peak information after a run? I think these peaks are also important: 1) They may contribute markedly to the FDR, right? They may also cover real peaks in the Target. 2) We would have more flexible options if SEACR could return such information. [A situation for flexible use: Sometimes I use a gene-knockout sample as a non-IgG control for SEACR analysis. This works much better than IgG, since both Target and Control use the same antibody against that gene product. But sometimes my antibody is against a modified histone. I already know from previous analysis that its binding sites largely (90%) decrease after that gene knockout ("Control"), and I want to know what the few (10%) upregulated binding sites are (in "Control"). I think this could be done in one SEACR run with a better control setting, instead of multiple steps with a different antibody setting and overlap analysis after SEACR.]
Hi,
Thanks for clarifying your question. As you mention, this isn't a feature of SEACR at the moment, but I'm happy to consider adding something like it in the next update. Until then, one thing you could do right now is run SEACR with the control sample alone as the first entry and a numeric threshold (e.g. 0.001) as the second entry instead of a normal control bedgraph. This will return the top 0.1% of control regions by signal (the actual threshold doesn't matter much). From there you can sort on the fourth column (total signal) to find the top control regions and get a sense of how they're distributed.
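As a rough illustration of the sorting step, here is a minimal Python sketch. It assumes the SEACR output is a tab-separated bed file whose fourth column holds total signal, as described above. The function name `top_regions_by_total_signal` and the example rows are hypothetical, made up only to show the idea.

```python
def top_regions_by_total_signal(bed_lines, n=10):
    """Return the n rows with the highest value in the fourth column."""
    # Split each non-empty line on tabs; the fourth field is assumed to be total signal.
    rows = [line.rstrip("\n").split("\t") for line in bed_lines if line.strip()]
    rows.sort(key=lambda r: float(r[3]), reverse=True)
    return rows[:n]


# Made-up example rows, NOT real SEACR output.
example = """chr1\t100\t500\t35.2\t1.1\tchr1:200-300
chr2\t50\t900\t410.0\t6.3\tchr2:400-450
chr3\t10\t60\t7.5\t0.9\tchr3:20-30
"""

for row in top_regions_by_total_signal(example.splitlines(), n=2):
    print("\t".join(row))
```

For a real run, the same function can be given an open file handle for the bed file produced by the control-only run, since it only iterates over lines.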
I will say that when there is a large IgG peak, there is almost always a large peak in the experimental data as well, since in most cases these map to repeat regions that get overrepresented in background. However, for your example use case where the "control" is actually a different treatment, this functionality would be useful. I'll keep this open and update you if I'm able to add something like this in the near future. Hope this is helpful otherwise.
Mike
Thank you for your suggestion, mpmeers. I ran SEACR with only a numeric threshold as the second entry. Most of the resulting bed files were indeed bigger than those from the runs with IgG controls, but some were much smaller. I guess that is why you said the actual threshold doesn't matter much. So I could tune the new threshold up or down and manually pick out what I want from the unfiltered list, right?
Hi,
That's what I had in mind--at least then you could get a sense of the places where the "control" file has the highest signal. An alternative, indirect approach is what I suggested in my first comment: use the code from mpmeers-patch-201207 to call peaks without removing those that overlap control peaks, then compare that result with the result from the master branch. Again, if I update SEACR to include the functionality you want, I will let you know.
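One way to do that comparison is sketched below in Python, under the assumption that both runs produce tab-separated bed files with chromosome, start, and end in the first three columns. The helper names (`read_bed`, `overlaps`, `peaks_missing_from`) are hypothetical and the coordinates are invented. Peaks from the patched run that have no overlapping peak in the master-branch run are candidates for sites that were dropped because of control signal.

```python
def read_bed(lines):
    """Parse bed lines into (chrom, start, end) tuples, ignoring extra columns."""
    peaks = []
    for line in lines:
        if not line.strip():
            continue
        chrom, start, end = line.split("\t")[:3]
        peaks.append((chrom, int(start), int(end)))
    return peaks


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def peaks_missing_from(patched, master):
    """Peaks in `patched` with no overlapping peak in `master`."""
    # Simple all-against-all scan: fine for a sketch, slow for very large files.
    return [p for p in patched if not any(overlaps(p, m) for m in master)]


# Invented coordinates for illustration only.
patched_peaks = read_bed(["chr1\t100\t200\n", "chr1\t500\t600\n", "chr2\t10\t50\n"])
master_peaks = read_bed(["chr1\t150\t250\n", "chr2\t10\t50\n"])

for chrom, start, end in peaks_missing_from(patched_peaks, master_peaks):
    print(chrom, start, end, sep="\t")
```

Matching by overlap rather than by exact coordinates is a deliberate choice here, since it is not guaranteed that the two code versions report identical peak boundaries.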
Mike
Hello! Even in a run with the stringent threshold, there is still some signal from the control that is higher than the threshold. If I want to find the control peaks whose signal is above the threshold, how can I get them? Thank you very much!