Closed mckinsel closed 5 years ago
@kishorikonwar - calling your attention to this issue that's blocking a Q2 epic.
@mckinsel Thanks for bringing this to our attention! I understand that we also need to make sure that the Ensembl ids we output are the versioned type (i.e. they end in .1, .2, ...). @kbergin
@barkasn yeah it's probably best if you have the versions in there, though tbh the matrix service currently ignores them based on the assumption that the gencode versions are kept in sync.
Thanks @mckinsel! Will see this gets prioritized. Does this block the matrix service being able to handle Optimus outputs?
Yes, it's currently blocking our loading of Optimus bundles.
Update on progress: @kishorikonwar is working on a PR to update Optimus to output gene ids in addition to gene names. Infrastructure is aware of the incoming update and will be prepared to create a new integration test bundle for both human and mouse when the update is released. Kishori will put the PR here soon and we will get it reviewed on Monday.
cc @jkaneria @brianraymor
In the Optimus zarr output, the gene_id array contains gene names. The other pipelines use gene ids, which is what the matrix service is expecting. Also, gene ids are unique whereas I think the gene names may not be. The name vs id seems to come from the GE tag created with the TagReadWithGeneExon tool from dropseqtools.
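For anyone who wants to check a bundle for this, here is a minimal sketch in Python. It assumes the gene_id values have already been loaded into a plain list of strings (reading the zarr store is not shown), and the helper names and sample values are hypothetical. The pattern only covers human (ENSG) and mouse (ENSMUSG) gene ids with an optional version suffix, so anything else, such as ids with extra suffixes, would be reported as a non-id.

```python
import re
from collections import Counter

# Assumption: human ids look like ENSG + 11 digits, mouse ids like
# ENSMUSG + 11 digits, each optionally followed by a ".N" version.
ENSEMBL_GENE_ID = re.compile(r"ENS(?:MUS)?G\d{11}(?:\.\d+)?")


def looks_like_ensembl_id(value):
    """Return True if the whole string matches the Ensembl gene id pattern."""
    return ENSEMBL_GENE_ID.fullmatch(value) is not None


def strip_version(gene_id):
    """Drop the ".N" version suffix, if present."""
    return gene_id.split(".", 1)[0]


def summarize(values):
    """Report entries that are not Ensembl ids and entries that repeat."""
    counts = Counter(values)
    return {
        "non_ids": [v for v in values if not looks_like_ensembl_id(v)],
        "duplicates": sorted(v for v, n in counts.items() if n > 1),
    }


if __name__ == "__main__":
    # Illustrative values only, not taken from a real bundle.
    sample = ["TP53", "ENSG00000141510.16", "TP53"]
    print(summarize(sample))
    print(strip_version("ENSG00000141510.16"))
```

A non-empty non_ids list would suggest the array holds gene names rather than ids, and a non-empty duplicates list would show the uniqueness problem directly.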