VCCRI / Sierra

Discover differential transcript usage from polyA-captured single cell RNA-seq data
GNU General Public License v3.0

DUTest() with genes containing ':' or space #39

Closed GeertvanGeest closed 3 years ago

GeertvanGeest commented 3 years ago

Running DUTest() on a dataset with gene names containing colons (:) or spaces results in:

Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names':  

This happens because DEXSeq removes spaces and colons from the gene names:

Warning messages:
1: In DEXSeq::DEXSeqDataSet(peak.matrix, sampleData = sampleTable,  :
  empty spaces or ':' characters were found either in your groupIDs or in your featureIDs, these will be removed from the identifiers

This causes mismatches between the gene names and the DEXSeq output.

This can be solved by changing differential_usage.R line 671 to:

# removing colons and spaces to match output of DEXSeq                                     
pid_gene_names <- gsub('[: ]', '', dexseq.feature.table$Gene_name)
rownames(dexseq.feature.table) <- paste0(pid_gene_names, ":", dexseq.feature.table$Peak_number) 
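
For illustration, here is a minimal sketch of what the change above does. It assumes a toy feature table with made-up gene names and only the two columns used above; the real dexseq.feature.table built inside DUTest() may look different.

```r
# Hypothetical feature table; gene names and peak numbers are invented
# for illustration and do not come from a real dataset.
dexseq.feature.table <- data.frame(
  Gene_name   = c("AC123:1", "AC123:1", "My Gene"),
  Peak_number = c(1, 2, 1),
  stringsAsFactors = FALSE
)

# Remove colons and spaces, mirroring what the DEXSeq warning says
# happens to groupIDs and featureIDs.
pid_gene_names <- gsub("[: ]", "", dexseq.feature.table$Gene_name)

# Build "gene:peak" row names from the cleaned gene names.
rownames(dexseq.feature.table) <- paste0(pid_gene_names, ":", dexseq.feature.table$Peak_number)

print(rownames(dexseq.feature.table))
# "AC1231:1" "AC1231:2" "MyGene:1"
```

One caveat with this approach: two distinct gene names that differ only by a colon or space (for example "A B" and "AB") would collapse to the same identifier after stripping, so a check such as anyDuplicated() on the resulting row names may be worth adding.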

rj-patrick commented 3 years ago

Hi @GeertvanGeest,

Thanks for identifying this issue, I've incorporated your suggestion. Let us know if you have any more issues running DUTest.

Cheers, Ralph