EMBL-Hentze-group / DEWSeq

R/Bioconductor package for e/iCLIP data analysis

resultRegions() and toBED() function #11

Closed fulaibaowang closed 6 months ago

fulaibaowang commented 6 months ago

Hi,

I am running the vignette, and I would really appreciate it if you could explain a bit more about the output.

1. extractRegions: As written there, the extractRegions function combines overlapping significant windows. But in the result you still see overlapping regions, for example in the vignette:

##  4 chr1           28648620   28648730 +                      5           110
##  5 chr1           28648620   28648733 +                      5           113

So is the real number of significant binding regions lower than the total number of rows in the resultRegions table (218)?

  2. toBED: if I run:
    resultRegions <- extractRegions(windowRes  = resultWindows,
                                    padjCol    = "p_adj_IHW",
                                    padjThresh = 0.01,
                                    log2FoldChangeThresh = 0.5) %>% as_tibble

    and

    toBED(windowRes  = resultWindows,
          regionRes  = resultRegions,
          fileName   = "enrichedWindowsRegions.bed",
          padjCol    = "p_adj_IHW",
          padjThresh = 0.01,
          log2FoldChangeThresh = 0.5)

    the output file "enrichedWindowsRegions.bed" has many more rows than the resultRegions table. Why?

Thank you!

sudeepsahadevan commented 6 months ago

Hi

  1. extractRegions: extractRegions will only merge coordinates from the same gene, so if a particular region has multiple gene annotations, the region will also occur multiple times.

  2. toBED: toBED will include both the enriched regions and the windows corresponding to each region. If you open a resulting BED file, you can see that there are regions (with the tag @region in the name) and windows (without the tag).

Hope this helps!
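
For anyone who wants to check both points on their own output, here is a minimal base R sketch. It uses toy data rather than real DEWSeq results: the column names (`chromosome`, `region_begin`, `region_end`, `strand`, `gene_id`), the gene labels, the BED name strings, and the helper `count_merged_loci()` are all assumptions made for illustration, so please adapt them to what your tables actually contain.

```r
# Toy stand-in for the extractRegions() output. Column names and gene labels
# are hypothetical; check them against your own resultRegions table.
regions <- data.frame(
  chromosome   = c("chr1", "chr1", "chr1"),
  region_begin = c(28648620, 28648620, 28700000),
  region_end   = c(28648730, 28648733, 28700100),
  strand       = c("+", "+", "-"),
  gene_id      = c("geneA", "geneB", "geneC"),
  stringsAsFactors = FALSE
)

# Count distinct loci after merging overlapping (or touching) regions across
# genes, separately for each chromosome and strand. Hypothetical helper.
count_merged_loci <- function(df) {
  total <- 0
  groups <- split(df, list(df$chromosome, df$strand), drop = TRUE)
  for (g in groups) {
    g <- g[order(g$region_begin), ]
    # largest end coordinate among all earlier rows in this group
    prev_max_end <- c(-Inf, head(cummax(g$region_end), -1))
    # a new locus starts when a region begins after all earlier ones have ended
    total <- total + sum(g$region_begin > prev_max_end)
  }
  total
}

n_rows <- nrow(regions)              # rows, one per region and gene annotation
n_loci <- count_merged_loci(regions) # merged loci, ignoring gene annotation

# Toy stand-in for the name column of the BED file (the fourth BED column).
# Only the "@region" tag comes from the thread; the rest of each name is made up.
bed_names <- c("geneA@region", "geneA_w1", "geneA_w2", "geneB@region", "geneB_w1")
is_region <- grepl("@region", bed_names, fixed = TRUE)
n_bed_regions <- sum(is_region)   # entries tagged as regions
n_bed_windows <- sum(!is_region)  # remaining entries are windows
```

With real data, comparing `n_rows` against `n_loci` shows how many rows come from overlapping regions annotated to different genes, and the region/window split of the BED names accounts for the extra rows in the file.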

fulaibaowang commented 6 months ago

super helpful! thanks!

fulaibaowang commented 6 months ago

I want to ask another question here :)

So the family-wise corrected window p-values are corrected for multiple testing with Benjamini-Hochberg in resultsDEWSeq.

And afterwards in the vignette you show that the IHW package can be used again to correct for multiple hypothesis testing.

Can you say a bit more about these two multiple testing corrections and the difference between them? Is this a bit too stringent? I am running some of my own data and got very few significant hits.

Thank you!

sudeepsahadevan commented 6 months ago

Hi, sorry if that was not clear: it is either BH correction or correction using IHW, but not both. IHW is a data-driven alternative to applying Benjamini-Hochberg to the family-wise corrected window p-values: https://bioconductor.org/packages/release/bioc/html/IHW.html
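
To make the "either/or" concrete, here is a rough sketch on simulated p-values (not DEWSeq output). The variable names, the simulated covariate `base_mean`, and the `alpha` value are assumptions for illustration. The IHW part only runs if the IHW package is installed.

```r
set.seed(1)
n <- 20000
n_signal <- 2000

# Simulated stand-ins: the first n_signal tests carry a true signal and tend
# to have a larger covariate value, which is what makes the covariate informative.
base_mean <- c(rexp(n_signal, rate = 1 / 500), rexp(n - n_signal, rate = 1 / 100))
pvals     <- c(rbeta(n_signal, 0.1, 5), runif(n - n_signal))

# Option A: Benjamini-Hochberg, all hypotheses weighted equally.
p_adj_bh <- p.adjust(pvals, method = "BH")

# Option B: IHW, which uses the covariate to weight hypotheses.
# This is applied to the same input p-values as option A, not to p_adj_bh.
if (requireNamespace("IHW", quietly = TRUE)) {
  ihw_fit   <- IHW::ihw(pvals, base_mean, alpha = 0.05)
  p_adj_ihw <- IHW::adj_pvalues(ihw_fit)
  print(c(BH = sum(p_adj_bh <= 0.05), IHW = sum(p_adj_ihw <= 0.05)))
}
```

Whichever option you pick, threshold on that one adjusted column only (for example via `padjCol` in extractRegions), rather than adjusting the already adjusted values a second time.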

fulaibaowang commented 6 months ago

I got it now, thank you!