davismcc / archive-scater

An archived version of the scater repository, see https://github.com/davismcc/scater for the active version.

using feature IDs with "." in getBMFeatureAnnos() #119

Closed: LindaDansereau closed this issue 7 years ago

LindaDansereau commented 7 years ago

Hello, I'm trying to run getBMFeatureAnnos() on a list of gene names, many of which contain a ".". However, the function appears to strip the "." and the digits that follow it, which creates a number of duplicate row names in my case. The relevant step seems to be:

## Remove transcript ID artifacts from runKallisto (eg. ENSMUST00000201087.11 -> ENSMUST00000201087)
feature_ids <- gsub(pattern = "\\.[0-9]+", replacement = "", x = feature_ids)

Is there a way to make that step optional, or is there another way around it?
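One way the step could be made optional, sketched under assumptions: `strip_version_suffix()` below is a hypothetical helper, not a scater function. It strips a version suffix only when `strip = TRUE`, and only from IDs that start with "ENS" (an assumption chosen to cover Ensembl IDs such as the runKallisto example in the comment above). Gene names like "MTCE.23" are left untouched; other versioned ID schemes would need their own pattern.

```r
# Hypothetical helper (NOT part of scater): strip Ensembl-style version
# suffixes only when asked to, and only from IDs that look like Ensembl IDs.
strip_version_suffix <- function(feature_ids, strip = TRUE) {
  if (!strip) {
    # Leave the IDs exactly as supplied.
    return(feature_ids)
  }
  # Anchored pattern: "ENS", optional capital letters, digits, then ".<digits>"
  # at the very end, e.g. "ENSMUST00000201087.11" -> "ENSMUST00000201087".
  # IDs that do not start with "ENS" (e.g. "MTCE.23") do not match.
  sub(pattern = "^(ENS[A-Z]*[0-9]+)\\.[0-9]+$", replacement = "\\1",
      x = feature_ids)
}

ids <- c("ENSMUST00000201087.11", "MTCE.23", "CD4.3", "cct-6")
strip_version_suffix(ids)
# returns c("ENSMUST00000201087", "MTCE.23", "CD4.3", "cct-6")
strip_version_suffix(ids, strip = FALSE)
# returns ids unchanged
```

Anchoring the pattern with `^` and `$` is the key design choice: the unanchored `"\\.[0-9]+"` in the quoted line matches a dot followed by digits anywhere in any ID, which is why gene names are affected too.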

Thank you for your help.

Linda

Example code, copied from my Bioconductor support forum post (https://support.bioconductor.org/p/97849/):

#TestData loaded as a .csv file

TestData <- read.csv("testdata.csv", colClasses = c(list("character"), rep("numeric", 8)), row.names = 1)

TestData
#        X cell.1a cell.1b cell.1c cell.2a cell.2b cell.3a cell.3b cell.3c
#1 2RSSE.1     866    1404     898     129    1053     141      33      70
#2 2RSSE.2      58     171      65      17      70      36      11      17
#3 MTCE.23   14911   27132   10405   82033  117449   57775   11544   14426
#4 MTCE.25    1888    3615    1453    5891   40047    9144    2396    2947
#5 MTCE.31   20818   38746   12289  235235  211993  109575   19117   20580
#6   cct-6    1488    2236    1274     487    6430    1006    2311     381
#7   cct-8    1113    1679    1099     530    3727    1012    1135     130
#8   CD4.3      58      70      64      45     122      19      59      70
#9   CD4.7      34      37      27      56     400      11      53      88

sce <- newSCESet(countData = TestData)

sce <- getBMFeatureAnnos(sce,
                         filters = "external_gene_name",
                         attributes = c("wormbase_gene", "ensembl_gene_id", "external_gene_name",
                                        "chromosome_name", "transcript_biotype", "go_id",
                                        "kegg_enzyme", "entrezgene"),
                         feature_symbol = "external_gene_name",
                         feature_id = "wormbase_gene",
                         biomart = "ENSEMBL_MART_ENSEMBL",
                         dataset = "celegans_gene_ensembl",
                         host = "www.ensembl.org")

Error in `row.names<-.data.frame`(`*tmp*`, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘2RSSE’, ‘CD4’, ‘MTCE’
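The collision can be reproduced without biomaRt. This base-R sketch applies the `gsub()` call quoted in the report to the feature names in TestData, lists the names that become duplicates, and then assigns the stripped names as data.frame row names, the step that fails in the traceback above. It only demonstrates the cause; it does not call scater.

```r
# Feature names from the TestData example above.
feature_ids <- c("2RSSE.1", "2RSSE.2", "MTCE.23", "MTCE.25", "MTCE.31",
                 "cct-6", "cct-8", "CD4.3", "CD4.7")

# The same substitution quoted from getBMFeatureAnnos() in the report.
stripped <- gsub(pattern = "\\.[0-9]+", replacement = "", x = feature_ids)

# Names that now occur more than once after stripping.
collisions <- unique(stripped[duplicated(stripped)])
print(collisions)  # "2RSSE" "MTCE" "CD4"

# Using the stripped names as row names triggers the same error
# (plus the same "non-unique values" warning) as in the report.
df <- data.frame(id = feature_ids)
res <- tryCatch({
  row.names(df) <- stripped
  "no error"
}, error = function(e) conditionMessage(e))
print(res)  # "duplicate 'row.names' are not allowed"
```

Checking `anyDuplicated()` on the stripped IDs before calling getBMFeatureAnnos() is one way to spot the affected names up front.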

sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] biomaRt_2.30.0       scater_1.2.0         ggplot2_2.2.1        Biobase_2.34.0      
 [5] BiocGenerics_0.20.0  gplots_3.0.1         RColorBrewer_1.1-2   edgeR_3.16.5        
 [9] limma_3.30.13        openxlsx_4.0.17      BiocInstaller_1.24.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.11         locfit_1.5-9.1       lattice_0.20-34      GO.db_3.4.0         
 [5] gtools_3.5.0         assertthat_0.2.0     digest_0.6.12        mime_0.5            
 [9] R6_2.2.2             plyr_1.8.4           stats4_3.3.2         RSQLite_2.0         
[13] zlibbioc_1.20.0      rlang_0.1.1          lazyeval_0.2.0       data.table_1.10.4   
[17] gdata_2.18.0         blob_1.1.0           S4Vectors_0.12.2     stringr_1.2.0       
[21] RCurl_1.95-4.8       bit_1.1-12           munsell_0.4.3        shiny_1.0.3         
[25] httpuv_1.3.5         vipor_0.4.5          pkgconfig_2.0.1      ggbeeswarm_0.5.3    
[29] htmltools_0.3.6      tximport_1.2.0       tibble_1.3.3         gridExtra_2.2.1     
[33] IRanges_2.8.2        matrixStats_0.52.2   XML_3.98-1.9         viridisLite_0.2.0   
[37] dplyr_0.7.1          bitops_1.0-6         grid_3.3.2           xtable_1.8-2        
[41] gtable_0.2.0         DBI_0.7              magrittr_1.5         scales_0.4.1        
[45] KernSmooth_2.23-15   stringi_1.1.5        reshape2_1.4.2       viridis_0.4.0       
[49] bindrcpp_0.2         org.Ce.eg.db_3.4.0   rjson_0.2.15         tools_3.3.2         
[53] bit64_0.9-7          glue_1.1.1           beeswarm_0.2.3       AnnotationDbi_1.36.2
[57] colorspace_1.3-2     rhdf5_2.18.0         caTools_1.17.1       shinydashboard_0.6.1
[61] memoise_1.1.0        bindr_0.1

UrszulaCzerwinska commented 7 years ago

Hello, I'm encountering exactly the same problem. Could you propose a fix?

davismcc commented 7 years ago

This issue was moved to davismcc/scater#12