ZarnackGroup / BindingSiteFinder

Package for the definition of biniding sites for iCLIP data
https://www.bioconductor.org/packages/release/bioc/html/BindingSiteFinder.html
6 stars 1 forks source link

3.3 Transcript region assignment - typo? #7

Closed MelinaKlostermann closed 1 year ago

MelinaKlostermann commented 1 year ago

Hi, maybe I am missing something, but I was wondering about the third code chunk of the 3.3 Transcript region assignment chapter in the vignette. The code is:

# Count the overlaps of each binding site fore each region of the transcript. 
cdseq = regions$CDS %>% countOverlaps(bindingSites,.)
intrns = regions$Intron %>% countOverlaps(bindingSites,.)
utrs3 = regions$UTR3 %>% countOverlaps(bindingSites,.)
utrs5 = regions$UTR5 %>% countOverlaps(bindingSites,.)
countDf = data.frame(CDS = cdseq, Intron = intrns, UTR3 = utrs3, UTR5 = utrs5)
# Count how many times an annotation is not present.
df = data.frame(olClass = apply(countDf,1,function(x) length(x[x == 0]))) 

So you want to know how many times a binding site overlaps with more than one region. You then say you "Count how many times an annotation is not present." But shouldn't you count how many times an annotation is present? So length(x[x != 0]) in the last row?

Because the plot afterward says "Bar plot shows how many times a binding site overlaps with an annotation of a different transcript region."

KathiZarnack commented 1 year ago

Hi, in principle the resulting plot is still correct, because the labels of df are inverted in the next line of the code: df = data.frame(type = rev(names(table(df))), count = as.vector(table(df))) Nonetheless, this could certainly be solved in a more intuitive way...

One solution (thanks, Sri!) is to use a GRangesList for the regions: regionsList = as(regions, "GRangesList") df= findOverlaps(x, regionsList) %>% as.data.frame() %>% table(.) This df can then either be used for an UpSet plot or further processed with rowSums() %>% table() into a similar structure as used in the code above: df = rowSums() %>% table() %>% as.data.frame(.) colnames(df) = c("type", "count")