ZarnackGroup / BindingSiteFinder

Package for the definition of binding sites for iCLIP data
https://www.bioconductor.org/packages/release/bioc/html/BindingSiteFinder.html

Problems if not all samples have the same chromosomes #4

Closed. MelinaKlostermann closed this issue 1 year ago.

MelinaKlostermann commented 1 year ago

Hi, when I use makeBindingSites, I get the following error:

    Error in .subset_by_GenomicRanges(x, i) : ‘x’ must have unique names when subsetting by a GenomicRanges subscript.

I think the error comes from the .collapseSamples function, which uses

    for (i in seq_along(p)) {
      pSum = pSum + p[[i]]
    }

This adds up the samples by position rather than by chromosome name, so the chromosomes are not added up correctly if two samples contain different chromosomes or list them in a different order.
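To make the suspected mechanism concrete, here is a small base-R sketch with made-up coverage values. It assumes that pSum + p[[i]] pairs list elements by position and keeps the names of the left operand, and mimics that with Map() on plain named lists; the package's own coverage objects are not used here.

```r
# Toy coverage for two samples as named lists of numeric vectors
# (made-up values; a stand-in for the package's coverage objects).
s1 <- list(KI270802.1 = c(1, 1), chr1 = c(0, 2, 3))
s2 <- list(GL000225.1 = c(5, 5), chr1 = c(1, 0, 1))

# Positional addition: element k of s1 is added to element k of s2,
# whatever their names, and the result takes its names from s1.
positional <- Map(`+`, s1, s2)
print(names(positional))      # "KI270802.1" "chr1"
print(positional$KI270802.1)  # 6 6, although s2 has no KI270802.1

# Same chromosomes in a different order: chr1 is silently mixed with chr10.
s3 <- list(chr1 = c(1, 1), chr10 = c(2, 2))
s4 <- list(chr10 = c(10, 10), chr1 = c(20, 20))
mixed <- Map(`+`, s3, s4)
print(mixed$chr1)             # 11 11 instead of 21 21
```

In both cases nothing fails: the result just carries the wrong labels, which matches the behaviour described in the next comment.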

When I merge these two samples:

    > names(signal$signalPlus$1_oe_FLAG)
     [1] "KI270802.1" "chr1"       "chr10"
     [4] "chr11"      "chr12"      "chr13"
     [7] "chr14"      "chr15"      "chr16"
    [10] "chr17"      "chr18"      "chr19"
    [13] "chr2"       "chr20"      "chr21"
    [16] "chr22"      "chr3"       "chr4"
    [19] "chr5"       "chr6"       "chr7"
    [22] "chr8"       "chr9"       "chrM"
    [25] "chrX"       "chrY"
    > names(signal$signalPlus$2_oe_FLAG)
     [1] "GL000225.1" "chr1"       "chr10"
     [4] "chr11"      "chr12"      "chr13"
     [7] "chr14"      "chr15"      "chr16"
    [10] "chr17"      "chr18"      "chr19"
    [13] "chr2"       "chr20"      "chr21"
    [16] "chr22"      "chr3"       "chr4"
    [19] "chr5"       "chr6"       "chr7"
    [22] "chr8"       "chr9"       "chrM"
    [25] "chrX"       "chrY"

the merge does not contain "GL000225.1":

    > names(p)
     [1] "KI270802.1" "chr1"       "chr10"      "chr11"      "chr12"
     [6] "chr13"      "chr14"      "chr15"      "chr16"      "chr17"
    [11] "chr18"      "chr19"      "chr2"       "chr20"      "chr21"
    [16] "chr22"      "chr3"       "chr4"       "chr5"       "chr6"
    [21] "chr7"       "chr8"       "chr9"       "chrM"       "chrX"
    [26] "chrY"

Instead, "KI270802.1" and "GL000225.1" are added together, and the result is called "KI270802.1".
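A quick way to spot this kind of mismatch before merging is to compare the chromosome names of all samples. The sketch below is a hypothetical helper (compareChromosomes is not part of BindingSiteFinder) that works on any named lists, such as the per-sample signal lists above; the toy values are placeholders.

```r
# Hypothetical helper (not part of BindingSiteFinder): compare the chromosome
# names of several samples before merging them.
compareChromosomes <- function(samples) {
  chrNames <- lapply(samples, names)
  allChr <- Reduce(union, chrNames)
  sharedChr <- Reduce(intersect, chrNames)
  # TRUE only if every sample has exactly the same names in the same order,
  # the only case in which positional addition pairs the right chromosomes
  identicalNames <- all(vapply(chrNames, identical, logical(1), chrNames[[1]]))
  list(notShared = setdiff(allChr, sharedChr), identicalNames = identicalNames)
}

# Toy samples mirroring the two listings above (values are placeholders)
samples <- list(
  s1 = list(KI270802.1 = 1, chr1 = 1, chr2 = 1),
  s2 = list(GL000225.1 = 1, chr1 = 1, chr2 = 1)
)
res <- compareChromosomes(samples)
print(res$notShared)       # "KI270802.1" "GL000225.1"
print(res$identicalNames)  # FALSE
```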

If I then merge all 4 samples, the merge looks like this:

    > names(sgnMergePlus)
     [1] "KI270802.1" "chr1"       "chr10"
     [4] "chr11"      "chr12"      "chr13"
     [7] "chr14"      "chr15"      "chr16"
    [10] "chr17"      "chr18"      "chr19"
    [13] "chr2"       "chr20"      "chr21"
    [16] "chr22"      "chr3"       "chr4"
    [19] "chr5"       "chr6"       "chr7"
    [22] "chr8"       "chr9"       "chrM"
    [25] "chrX"       "chrY"       NA
    [28] NA

and it throws an error because two names are NA, presumably because two NA names violate the unique-names requirement. However, it would probably not throw an error if just one name were NA, which is dangerous: the function might then add up the wrong chromosomes without raising any error.
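One way to make this fail loudly is an explicit name check on the merged result. The sketch below is a hypothetical guard (assertChromosomeNames is not a package function) on plain named lists. It assumes that the check behind the error above works like a duplicate-name check, which two NA names fail but a single NA name passes, so it tests for NA separately.

```r
# Hypothetical guard: stop if merged coverage has missing, empty or duplicated
# chromosome names. A single NA name is not a duplicate, so NA is checked
# separately from uniqueness.
assertChromosomeNames <- function(cov) {
  nms <- names(cov)
  if (is.null(nms) || anyNA(nms) || any(nms == "") || anyDuplicated(nms) > 0L) {
    stop("merged coverage has missing, empty or duplicated chromosome names")
  }
  invisible(TRUE)
}

twoNA <- setNames(list(1, 2, 3), c("chr1", NA, NA))
oneNA <- setNames(list(1, 2), c("chr1", NA))
print(anyDuplicated(names(twoNA)) > 0L)  # TRUE: two NAs count as duplicates
print(anyDuplicated(names(oneNA)) > 0L)  # FALSE: one NA passes a uniqueness check
caught <- tryCatch(assertChromosomeNames(oneNA), error = function(e) "error")
print(caught)                            # "error"
```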

MelinaKlostermann commented 1 year ago

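I think a fix for the .collapseSamples function could look like this, matching chromosomes by name instead of by position:

    pSum = as.list(p[[1]])
    for (i in seq_along(p)[-1]) {
      # pool the running sum with the next sample, then group by chromosome
      pSum = c(pSum, as.list(p[[i]]))
      pSum = split(pSum, names(pSum))
      # add chromosomes found in both, keep chromosomes found in only one
      pSum = lapply(pSum, function(x) {
        if (length(x) == 2) x[[1]] + x[[2]] else x[[1]]
      })
    }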
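The same idea can be sketched as a self-contained base-R function for any number of samples. This is an illustration only: collapseByName is a hypothetical name, plain numeric vectors stand in for the package's coverage objects, and it assumes that a chromosome has the same length in every sample that contains it.

```r
# Hypothetical name-aware collapse: sum coverage per chromosome across any
# number of samples, matching chromosomes by name rather than by position.
collapseByName <- function(samples) {
  # every chromosome that occurs in at least one sample
  allChr <- Reduce(union, lapply(samples, names))
  summed <- lapply(allChr, function(chr) {
    # `[[` on a list returns NULL for a missing name, so absent samples drop out
    present <- Filter(Negate(is.null), lapply(samples, function(s) s[[chr]]))
    Reduce(`+`, present)
  })
  names(summed) <- allChr
  summed
}

s1 <- list(KI270802.1 = c(1, 1), chr1 = c(0, 2, 3))
s2 <- list(GL000225.1 = c(5, 5), chr1 = c(1, 0, 1))
merged <- collapseByName(list(s1, s2))
print(names(merged))      # "KI270802.1" "chr1" "GL000225.1"
print(merged$chr1)        # 1 2 4
print(merged$GL000225.1)  # 5 5
```

Taking the union of names keeps a chromosome that occurs in only one sample instead of pairing it with an unrelated chromosome or dropping it.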

MelinaKlostermann commented 1 year ago

Hi, my fault. I updated to BindingSiteFinder version 1.4.0, and this seems to solve the problem.

    > bds = BSFDataSetFromBigWig(ranges = pureclip_sites$PURB_oe_FLAG_merged_pureclip_sites_mtp001.bed, meta = meta)
    Input ranges are not sorted, sorting for you.
    Fixed ranges input, removing chr: GL000009.2 GL000195.1 GL000205.2 GL000213.1 GL000214.1 GL000219.1 GL000220.1 GL000224.1 GL000251.2 GL000252.2 GL000253.2 GL000254.2 GL000255.2 GL000256.2 ...