ZarnackGroup / BindingSiteFinder

Package for the definition of biniding sites for iCLIP data
https://www.bioconductor.org/packages/release/bioc/html/BindingSiteFinder.html
6 stars 1 forks source link

Error in .subset_by_GenomicRanges(x, i): 'x' must have unique names when subsetting by a GenomicRanges subscript #3

Closed muzyuri closed 1 year ago

muzyuri commented 2 years ago

Hi, please help.

I'm having some issues running BindingSiteFinder. After following the complete iCLIP analysis pipeline on the instructions, I check my data and get these:

### checking the bed file
read.table(BED.file1)

           V1       V2       V3 V4          V5 V6
1   Bomo_Chr1   366460   366461  3   3.7454300  +
2   Bomo_Chr1   366461   366462  3  17.8156000  +
3   Bomo_Chr1   553887   553888  3   6.1753600  +
...
164 Bomo_Chr1 13135067 13135068  3   3.8580500  +
165 Bomo_Chr1 13135205 13135206  3   6.2024100  +
166 Bomo_Chr1 13135206 13135207  3   1.8451600  +
 [ reached 'max' / getOption("max.print") -- omitted 18706 rows ]
### importing the GRanges object
cs = rtracklayer::import(con = "BED.file1", format = "BED")
cs

GRanges object with 18872 ranges and 2 metadata columns:
              seqnames    ranges strand |        name     score
                 <Rle> <IRanges>  <Rle> | <character> <numeric>
      [1]    Bomo_Chr1    366461      + |           3   3.74543
      [2]    Bomo_Chr1    366462      + |           3  17.81560
      [3]    Bomo_Chr1    553888      + |           3   6.17536
      [4]    Bomo_Chr1    553935      + |           3   0.46261
      [5]    Bomo_Chr1    629420      + |           3   3.59940
      ...          ...       ...    ... .         ...       ...
  [18868] Bomo_Scaf657      7769      - |           3   4.72738
  [18869] Bomo_Scaf657      7768      - |           3   4.76283
  [18870] Bomo_Scaf657      7767      - |           3   2.66149
  [18871] Bomo_Scaf657      7766      - |           3   4.81242
  [18872] Bomo_Scaf659      7245      - |           3  20.80880
  -------
  seqinfo: 59 sequences from an unspecified genome; no seqlengths
### Importing the bigwig files
files <- "BIGWIG.file1"
clipFilesP <- list.files(files, pattern = "_s.bw$", full.names = TRUE)
clipFilesM <- list.files(files, pattern = "_as.bw$", full.names = TRUE)
### checking the meta data
meta = data.frame(
  id = c(1,2,3,4),
  condition = factor(c("WT","WT","KD","KD"), 
  levels = c("KD","WT")), 
  clPlus = clipFilesP, clMinus = clipFilesM)
meta

  id condition clPlus
1  1       WT  BIGWIG.file1/WT-1_s.bw
2  2      WT  BIGWIG.file1/WT-2_s.bw
3  3      KD  BIGWIG.file1/KD-1_s.bw
4  4      KD  BIGWIG.file1/KD-2_s.bw
    clMinus
1  BIGWIG.file1/WT-1_as.bw
2  BIGWIG.file1/WT-2_as.bw
3  BIGWIG.file1/KD-1_as.bw
4  BIGWIG.file1/KD-2_as.bw
### Construction of the the BindingSiteFinder dataset
bds = BSFDataSetFromBigWig(ranges = cs, meta = meta, silent = TRUE)
bds

Object of class BSFDataSet 
Contained ranges:  17.918 
----> Number of chromosomes:  59 
----> Ranges width:  1 
Contained conditions:  KD WT

But when I try to proceed with the workflow like this:

supportRatioPlot(bds, bsWidths = seq(from = 3, to = 19, by = 2))

I get the following error:

Error in .subset_by_GenomicRanges(x, i) : 
  'x' must have unique names when subsetting by a GenomicRanges subscript
In addition: There were 50 or more warnings (use warnings() to see the first 50)

Do you think you could help me figure out what's causing this?

Thank you!

MirkoBr commented 2 years ago

Hi @muzyuri ,

sorry for the late reply. Did you fixed it in the meantime?

If not what does the output of the makeBindingSites() function looks like? Does it produce the same error?

It seems like there is an issue with the names in of your GRanges object and/ or the seqnames (names for the chromosomes). Can you once make sure that the seqnames in the GRanges object match with the names of the signal object. You can do this by doing sth by looking into head(names(getSignal(bds)[[1]][[1]])) and head(seqnames(getRanges(bds))).

Besides that, maybe the '_' in the seqnames is causing the trouble.

I hope this is still helpful for you, even tough the reply was so late.

muzyuri commented 2 years ago

Hi @MirkoBr, thank you for your kindly support!

I am still stacking here. After your reply, I checked my GRanges object and the seqnames and made sure that the seqnames in the GRanges object matched with the names of the signal object. Besides, I removed all '_' and '-' in the seqnames and retried to proceed with the workflow like this:

supportRatioPlot(bds, bsWidths = seq(from = 3, to = 19, by = 2))

but I got the following another error:

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'flesh' in selecting a method for function 'relist': subscript contains out-of-bounds ranges

I also got this same error in the makeBindingSites() function.

Do you have any idea what is wrong? Thanks a lot in advance.

MirkoBr commented 2 years ago

Hi @muzyuri,

the error message that you get is a very general one. So without seeing some of your objects it's hard to tell where it goes wrong.

Could you provide a small subset of your data. Just like the first 10-20 entries in the GRanges plus the iCLIP signal for these ranges? To get both sets, see the code snippet below.

library(BindingSiteFinder)

# subset the GRanges object
rangesSubset = head(getRanges(bds), 20)

# subset the RLE signal list
totalSignal = signal$signalPlus$`1_WT`$chr22
startAt = max(as.data.frame(ranges(totalSignal))$start)
endAt = min(as.data.frame(ranges(totalSignal))$start)
signalSubset = totalSignal[startAt:endAt]

# save both objects
saveRDS(rangesSubset, file = "./rangesSubset.rds")
saveRDS(signalSubset, file = "./signalSubset.rds")
MirkoBr commented 1 year ago

fixed in 10ce962