Open lianos opened 6 years ago
Try this to quickly see which gene-level annotations we can't match up:
library(archs4)
a4 <- Archs4Repository()
ys <- as.DGEList(a4, "GSE89189", feature_type = "gene", row_id = "symbol")
ye <- as.DGEList(a4, "GSE89189", feature_type = "gene", row_id = "ensembl")
b0rkd <- subset(ys$genes, !h5idx %in% ye$genes$h5idx)
There are 7979 features in b0rkd! That is to say: there are ~8k meta/genes ("human gene symbols") that we can't find in the gene_name column of the Homo_sapiens.GRCh38.90.gtf annotations.
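For anyone who wants to see what the `subset(...)` line is doing without loading the full repository, here is a minimal sketch on toy data. Only the `h5idx` column comes from the snippet above; the `symbol` column name, the gene names, and the variable names are made up for illustration and are not taken from the archs4 package.

```r
# Toy stand-ins for ys$genes (symbol-indexed) and ye$genes (ensembl-indexed).
# Assumption: both tables carry an `h5idx` column pointing at the same HDF5 rows.
by_symbol <- data.frame(
  h5idx  = 1:5,
  symbol = c("GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"),
  stringsAsFactors = FALSE
)
by_ensembl <- data.frame(h5idx = c(1L, 2L, 4L))

# Anti-join on h5idx: keep the symbol rows that have no ensembl counterpart.
unmatched <- subset(by_symbol, !h5idx %in% by_ensembl$h5idx)

# Summarise how much of the symbol table failed to match.
n_unmatched    <- nrow(unmatched)
frac_unmatched <- n_unmatched / nrow(by_symbol)
```

On the real data, `nrow(b0rkd) / nrow(ys$genes)` would give the same kind of fraction for GSE89189.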
Although we are using the same version of the Ensembl GTF files as the ARCHS4 data processing pipeline, some genes and transcripts are not successfully matched up in the create_augmented_feature_info function. These are the GTF files used to attempt to match gene symbols and transcript identifiers: