Bioconductor / Organism.dplyr

https://bioconductor.org/packages/Organism.dplyr

`fiveUTRsByTranscript(, filter=)` does not return values for all transcripts #11

Closed mtmorgan closed 7 years ago

mtmorgan commented 7 years ago
> transcripts_tbl(src, filter=list(SymbolFilter("ADA")))
Joining, by = "entrez"
Source:   query [?? x 7]
Database: sqlite 3.11.1 [/home/mtmorgan/a/Organism.dplyr/inst/extdata/light.hg38.knownGene.sqlite]

  tx_chrom tx_start   tx_end tx_strand  tx_id    tx_name symbol
     <chr>    <int>    <int>     <chr>  <int>      <chr>  <chr>
1    chr20 44619522 44626491         - 169786 uc061xfj.1    ADA
2    chr20 44619522 44651742         - 169787 uc002xmj.4    ADA
3    chr20 44619810 44651691         - 169789 uc061xfl.1    ADA
> fiveUTRsByTranscript(src, filter = list(SymbolFilter("ADA")))
Joining, by = "entrez"
Joining, by = "entrez"
GRangesList object of length 1:
$169787 
GRanges object with 1 range and 5 metadata columns:
      seqnames               ranges strand |     tx_id   exon_id   exon_name
         <Rle>            <IRanges>  <Rle> | <integer> <integer> <character>
  [1]    chr20 [44651608, 44651742]      - |    169787    501401        <NA>
      exon_rank      symbol
      <integer> <character>
  [1]         1         ADA

yubocheng commented 7 years ago

See line 542 of extractors.R, in the function .getSplicings: cds <- .cds(x, filter=filter). When the filter is applied to cds, only tx_id 169787 meets the condition. The logic for getting the splicings data might need to be revisited.

mtmorgan commented 7 years ago

I think this was my mistake. The same call on the TxDb package gives:

> five <- fiveUTRsByTranscript(TxDb.Hsapiens.UCSC.hg38.knownGene)
> c("169786", "169787", "169789") %in% names(five)
[1] FALSE  TRUE FALSE

So the result is consistent with the GRanges infrastructure: of the three ADA transcripts, only 169787 has a 5' UTR entry there as well.
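
For anyone landing here later, a minimal base-R sketch of why only one name comes back. It uses toy data frames, not the real database: the `tx` and `cds` tables and their columns are made up for illustration, and the only thing taken from this thread is that 169787 is the one transcript with CDS rows under the filter. The real `.getSplicings` code may well differ.

```r
# Toy stand-ins (hypothetical, not the Organism.dplyr schema):
# three ADA transcripts, but CDS rows exist for only one of them.
tx  <- data.frame(tx_id = c(169786L, 169787L, 169789L), symbol = "ADA")
cds <- data.frame(tx_id = c(169787L, 169787L), exon_rank = c(1L, 2L))

# An inner join on tx_id keeps only transcripts that have CDS rows,
# so transcripts without a CDS never reach the UTR computation.
splicings <- merge(tx, cds, by = "tx_id")

# Grouping by transcript then yields one list element per remaining tx_id.
by_tx <- split(splicings, splicings$tx_id)

present <- as.character(tx$tx_id) %in% names(by_tx)
print(present)
```

In this toy setup the `%in%` check has the same shape as the one above: a transcript with no CDS rows has no UTR to report, so it is missing from the names of the grouped result rather than appearing as an empty element.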