Open hanhyebin opened 3 years ago
@hanhyebin the reference genome is available at s3://czb-tabula-muris-senis/reference-genome/
Thanks, but I wanted to know more about how it differs from mm10.
The reason I ask is that I am trying to integrate this dataset with other datasets. If mm10plus differs substantially from mm10, I will need to realign the data to mm10 (which I can do), but if the two barely differ, I can continue to use it as is.
Thank you in advance.
I am also having issues with this.
I have downloaded the .h5ad files for all datasets, and the matrices contain genes that are not present in this release.
For example, the droplet-Liver dataset contains the gene "Fam150a". I have just downloaded the reference .tgz from AWS, and neither "Fam150a" nor the substrings "Fam150" and "am150" appear anywhere in gencode.vM19/genes/genes.gtf
Out of 20138 genes in the object matrix, there are 2081 genes that do not exist in the gencodeM19 gtf file.
> length(rownames(seu)[!rownames(seu) %in% gencodeM19_genes$gene_name])
[1] 2081
> head(rownames(seu)[!rownames(seu) %in% gencodeM19_genes$gene_name], 20)
[1] "Fam150a" "3110035E14Rik" "6030422M02Rik" "4932411L15" "Gm106" "Tceb1"
[7] "1110058L19Rik" "Bai3" "Fam123c" "4632411B12Rik" "6330578E17Rik" "D1Bwg0212e"
[13] "2610017I09Rik" "2900092D14Rik" "A530098C11Rik" "1700029F09Rik" "4832428D23Rik" "Dnahc7b"
[19] "Sdpr" "Obfc2a"
I found a GTF file on the internet while googling for mm10plus ( http://waxmanlabvm.bu.edu/kkarri/G171/ref/updated-usethis-mm10plus-pcg-ercc-lnc-nodups-mcherry/genes/ ). This file does include the genes "Fam150a" and "Tceb1", but it doesn't match 100% of the genes present in the object matrix. It contains only 426 of the 2081 genes that are missing from the gencodeM19 file.
> sum(rownames(seu)[!rownames(seu) %in% gencodeM19_genes$gene_name] %in% mm10plus_genes$gene_name)
[1] 426
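For anyone repeating this comparison, here is a minimal sketch of how the gene-name sets could be built from a GTF and checked against the matrix row names. It is not the code used above: `extract_gene_names` is a hypothetical helper, the GTF records and the `matrix_genes` vector are small made-up stand-ins for the real files, and it assumes the annotation stores symbols in a `gene_name "..."` attribute.

```r
# Hypothetical helper (not from the thread): collect the unique gene_name
# values found in the attribute column of GTF lines.
extract_gene_names <- function(gtf_lines) {
  # Drop header/comment lines such as "##provider: ..."
  gtf_lines <- gtf_lines[!startsWith(gtf_lines, "#")]
  # Take the first gene_name "..." attribute on each line that has one
  hits <- regmatches(gtf_lines, regexpr('gene_name "[^"]+"', gtf_lines))
  unique(sub('^gene_name "([^"]+)"$', "\\1", hits))
}

# Made-up records standing in for a real GTF (coordinates are illustrative only)
gencode_lines <- c(
  "##provider: EXAMPLE",
  'chr1\tHAVANA\tgene\t100\t200\t.\t-\t.\tgene_id "G1"; gene_name "Xkr4";',
  'chr1\tHAVANA\texon\t100\t150\t.\t-\t.\tgene_id "G1"; gene_name "Xkr4";',
  'chr1\tHAVANA\tgene\t300\t400\t.\t-\t.\tgene_id "G2"; gene_name "Sox17";'
)

# Stand-in for rownames(seu)
matrix_genes <- c("Xkr4", "Tceb1", "Fam150a")

gencode_names <- extract_gene_names(gencode_lines)

# Genes present in the matrix but absent from the annotation
missing <- setdiff(matrix_genes, gencode_names)
print(missing)
```

With the real files, the stand-ins would be replaced by something like `readLines("gencode.vM19/genes/genes.gtf")` and `rownames(seu)`. Running the same helper on each candidate GTF gives directly comparable counts of missing genes.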
Which annotation was used to create the matrices for these datasets? Am I messing this up big time, or is there a serious mismatch between the data matrices and the gtf files provided?
Thanks in advance
Hi,
I see that Tabula Muris Senis used "mm10plus" as the genome reference. I am assuming it is a modified version of mm10. If so, may I know what modifications/adjustments were made?
Thanks!