Closed h4rvey-g closed 1 day ago
Also, I'm getting a low spliced/unspliced ratio for every sample:
import pandas as pd
import scanpy as sc
import scvelo as scv  # needed for scv.pl.proportions at the end
adata = sc.read_mtx("data/103.self_workflow/T1/output/filter_matrix/matrix.mtx.gz")
# The files are .tsv, so read them as tab-separated
adata_bc = pd.read_csv(
    "data/103.self_workflow/T1/output/attachment/RNAvelocity_matrix/barcodes.tsv.gz",
    header=None,
    sep="\t",
)
adata_features = pd.read_csv(
    "data/103.self_workflow/T1/output/attachment/RNAvelocity_matrix/features.tsv.gz",
    header=None,
    sep="\t",
)
adata = adata.T
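# Assign by position: adata.obs has a string index, so assigning the
# DataFrame directly would align on index labels and can leave NaN values
adata.obs["cell_id"] = adata_bc[0].to_numpy()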
adata.var["gene_name"] = adata_features[0].tolist()
adata.var.index = adata.var["gene_name"]
adata_spliced = sc.read_mtx(
    "data/103.self_workflow/T1/output/attachment/RNAvelocity_matrix/spliced.mtx.gz"
)
adata_spliced = adata_spliced.T
adata_unspliced = sc.read_mtx(
    "data/103.self_workflow/T1/output/attachment/RNAvelocity_matrix/unspliced.mtx.gz"
)
adata_unspliced = adata_unspliced.T
# combine the spliced and unspliced data
adata.layers["spliced"] = adata_spliced.X
adata.layers["unspliced"] = adata_unspliced.X
scv.pl.proportions(adata)
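One thing the snippet above takes on trust is that the count matrix from filter_matrix and the spliced/unspliced matrices from RNAvelocity_matrix list the same cells in the same order. Whether that holds for this workflow's output is not confirmed here. A minimal sketch of an order check follows, assuming a barcode list is available for each directory; `align_rows` is a hypothetical helper, not part of scanpy or scVelo, and the barcodes are toy values.

```python
def align_rows(reference_barcodes, other_barcodes):
    """Return row positions in `other_barcodes` ordered to match `reference_barcodes`.

    Raises ValueError on duplicate barcodes and KeyError if a reference
    barcode is absent from `other_barcodes`.
    """
    position = {bc: i for i, bc in enumerate(other_barcodes)}
    if len(position) != len(other_barcodes):
        raise ValueError("duplicate barcodes in other_barcodes")
    missing = [bc for bc in reference_barcodes if bc not in position]
    if missing:
        raise KeyError(f"{len(missing)} barcodes missing, e.g. {missing[0]}")
    return [position[bc] for bc in reference_barcodes]


# Toy barcodes (hypothetical): the velocity matrix has one extra cell
# and a different row order than the filtered matrix.
ref = ["CELL_A", "CELL_B", "CELL_C"]
vel = ["CELL_C", "CELL_A", "CELL_D", "CELL_B"]
order = align_rows(ref, vel)
```

If the orders did differ, indexing with the returned positions, for example adata_spliced[order], would subset and reorder the cells before the layers are assigned.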
Here are the workflow stats. Any insights on this? Thank you.
"Spanning" refers to spanning intron-exon junctions. Due to the current annotation logic in the software, if 50% of the read is mapped to an exon, it is considered exon, and the rest is considered intron if mapped to the gene. Therefore, this region currently has no data in the software version. Reads mapped to exonic regions are considered spliced, while those mapped to intronic or spanning regions are considered unspliced. Since your sample is nuclear data, a higher proportion of unspliced reads is quite normal. I do not have experience with using anndata to analyze velocyto, but I will look into this issue .
Got it. Thank you.
Hi! I read the function in https://github.com/MGI-tech-bioinformatics/DNBelab_C_Series_HT_scRNA-analysis-software/issues/67#issuecomment-2060288401 but didn't see that you are using spanning.mtx.gz. What is this file? Additionally, if I want to load the files generated by the dnbc workflow into Python and use scVelo for downstream analysis, which files should I choose to load to construct the anndata object? Thanks for your help.