broadinstitute / Drop-seq

Java tools for analyzing Drop-seq data
MIT License

A Feature to use gene ID instead of gene names #439

Open WellSIM opened 3 months ago

WellSIM commented 3 months ago

My scope: I want to use the Drop-seq tools for total RNA-seq.

Function:

  1. TagReadWithGeneFunction. Issue: gene IDs that have no gene name and/or transcript are removed. Most of these come from miRNA. Request: a feature that retains gene IDs that lack a gene name and/or transcript.

  2. Digital Gene Expression. Issue: only gene names are returned. Request: a feature that returns gene names, gene IDs, and gene biotypes.

jamesnemesh commented 3 months ago

Without changing the Drop-seq software, here are some potential workarounds:

  1. Modify your GTF so that gene names are replaced by gene IDs where appropriate. You might perform this for all records to be consistent.
  2. There's no place to store these metadata fields in the existing DGE format, but that doesn't prevent you from generating them from your GTF file directly. If you change your gene names in step 1, then the output files will reflect those IDs and you should be able to perform a lookup.
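
The two workarounds above could be scripted roughly as follows. This is only a minimal sketch in Python, not part of the Drop-seq tools: the function names (`fill_gene_name`, `build_gene_lookup`, `rewrite_gtf`) are made up for illustration, and it assumes a standard tab-separated GTF with nine columns whose attributes are written as `key "value";`. It also assumes the biotype is stored under `gene_biotype` (Ensembl style) or `gene_type` (GENCODE style), so check which key your annotation uses.

```python
import re

# Matches quoted GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')
GENE_NAME_RE = re.compile(r'\bgene_name "[^"]*";')


def parse_attributes(attr_field):
    """Return the quoted key/value pairs of a GTF attribute column as a dict."""
    return dict(ATTR_RE.findall(attr_field))


def fill_gene_name(line, replace_all=False):
    """Workaround 1: use gene_id as gene_name.

    By default only records without a gene_name are changed. With
    replace_all=True every record gets gene_name == gene_id, which keeps
    the annotation consistent.
    """
    if line.startswith("#"):
        return line  # header/comment lines pass through unchanged
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return line  # not a regular GTF record, leave it alone
    attrs = parse_attributes(fields[8])
    gene_id = attrs.get("gene_id")
    if gene_id is None:
        return line
    new_attr = f'gene_name "{gene_id}";'
    if "gene_name" in attrs:
        if not replace_all:
            return line
        fields[8] = GENE_NAME_RE.sub(lambda m: new_attr, fields[8])
    else:
        tail = fields[8].rstrip()
        if not tail.endswith(";"):
            tail += ";"
        fields[8] = f"{tail} {new_attr}"
    return "\t".join(fields) + newline


def build_gene_lookup(lines):
    """Workaround 2: map gene_id -> (gene_name, biotype) from the ORIGINAL GTF."""
    lookup = {}
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue
        attrs = parse_attributes(fields[8])
        gene_id = attrs.get("gene_id")
        if gene_id is None:
            continue
        # Assumption: biotype key is gene_biotype (Ensembl) or gene_type (GENCODE).
        biotype = attrs.get("gene_biotype", attrs.get("gene_type", ""))
        # Keep the first record seen for each gene.
        lookup.setdefault(gene_id, (attrs.get("gene_name", ""), biotype))
    return lookup


def rewrite_gtf(in_path, out_path, replace_all=False):
    """Write a copy of the GTF at in_path with gene names filled in."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(fill_gene_name(line, replace_all=replace_all))
```

With something like this, you would run `rewrite_gtf` once on your annotation (the paths are whatever you use locally), rebuild the reference metadata from the modified GTF, and keep the result of `build_gene_lookup` on the original GTF to annotate the rows of the DGE output with gene name and biotype afterwards.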