Issues post-processing out.h5ad file from custom genome

Hi,

I ran scTE on a custom Zea mays (maize/corn) genome, using a BAM file from 10x Genomics Cell Ranger, which gave me about 5000 good cells. Everything ran smoothly, but:

1- I cannot see any TEs in the out.h5ad file.

2- Only about 15% of the genes I detected with Cell Ranger are present in out.h5ad, and only those that have a gene name. Plant genomes are poorly annotated: only a few genes get a name, and sometimes two separate genes even share the same name ^^. For plant genomes, only the gene ID should be used. Is there a way to make sure that the gene ID is used instead of the gene name?

Examples of the last column of the Zea mays GTF file from Ensembl Plants:

Gene with a name:
gene_id "Zm00001d048603"; gene_name "GRAS-transcription factor 83"; gene_source "gramene"; gene_biotype "protein_coding";

Gene without a name:
gene_id "Zm00001d027230"; gene_source "gramene"; gene_biotype "protein_coding";

Should I modify the GTF manually?
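If manual modification turns out to be the way to go, it could be scripted rather than done by hand. The following is only a minimal sketch, assuming that the missing genes are dropped because they lack a gene_name attribute (this is inferred from the symptoms above, not confirmed from the scTE source). The function names `force_gene_name_to_id` and `rewrite_gtf` are hypothetical. It sets gene_name to the gene_id value on every record, which also avoids the duplicated-name problem.

```python
import re

# Match the gene_id value, and any existing gene_name attribute to drop.
GENE_ID_RE = re.compile(r'\bgene_id "([^"]+)"')
GENE_NAME_RE = re.compile(r'\bgene_name "[^"]*";?\s*')


def force_gene_name_to_id(gtf_line: str) -> str:
    """Return the GTF line with gene_name set to the gene_id value."""
    if gtf_line.startswith("#"):
        return gtf_line  # header/comment lines pass through unchanged
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return gtf_line  # not a standard 9-column GTF record
    match = GENE_ID_RE.search(fields[8])
    if match is None:
        return gtf_line  # no gene_id to copy from
    # Remove any existing gene_name, then append one equal to the gene_id.
    attrs = GENE_NAME_RE.sub("", fields[8]).strip()
    if attrs and not attrs.endswith(";"):
        attrs += ";"
    fields[8] = f'{attrs} gene_name "{match.group(1)}";'
    return "\t".join(fields) + "\n"


def rewrite_gtf(src_path: str, dst_path: str) -> None:
    """Stream a GTF file line by line and write the rewritten copy."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(force_gene_name_to_id(line))
```

Calling `rewrite_gtf("genes.gtf", "genes.gene_id_as_name.gtf")` (both file names are placeholders) would leave the original file untouched and produce a copy to pass to the index-building step.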

3- Could you specify which 6 columns the TE BED file needs to contain? On the UCSC website cited in the tutorial, the BED format is defined as follows:

"BED lines have three required fields and nine additional optional fields;
1- chrom - The name of the chromosome
2- chromStart - The starting position of the feature in the chromosome
3- chromEnd - The ending position of the feature in the chromosome"

What are the three other columns that scTE requires? Do the TEs need a gene_name such as "TExxxx"?

As an example, the Xenopus file in the tutorial (https://hgdownload.soe.ucsc.edu/goldenPath/xenTro9/database/rmsk.txt.gz) has about 17 columns.
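For context, the first six fields of the UCSC BED specification are chrom, chromStart, chromEnd, name, score and strand. Whether scTE expects exactly this layout is an assumption here, not something confirmed from its documentation. Under that assumption, a row of a UCSC rmsk.txt table could be reduced to BED6 roughly as follows. The function name `rmsk_row_to_bed6` is hypothetical, and using repName as the BED name and swScore as the score is an illustrative choice.

```python
from typing import Optional

# Column order of a UCSC rmsk.txt row (17 tab-separated fields).
RMSK_COLUMNS = [
    "bin", "swScore", "milliDiv", "milliDel", "milliIns",
    "genoName", "genoStart", "genoEnd", "genoLeft", "strand",
    "repName", "repClass", "repFamily", "repStart", "repEnd",
    "repLeft", "id",
]


def rmsk_row_to_bed6(row: str) -> Optional[str]:
    """Convert one rmsk.txt row to a BED6 line, or None if malformed."""
    fields = row.rstrip("\n").split("\t")
    if len(fields) != len(RMSK_COLUMNS):
        return None  # skip rows that do not have the expected 17 fields
    rec = dict(zip(RMSK_COLUMNS, fields))
    # BED scores are limited to 0-1000, so clamp the alignment score.
    score = max(0, min(int(rec["swScore"]), 1000))
    return "\t".join([
        rec["genoName"],   # chrom
        rec["genoStart"],  # chromStart (already 0-based in rmsk tables)
        rec["genoEnd"],    # chromEnd
        rec["repName"],    # name (assumption: repeat name as TE label)
        str(score),        # score
        rec["strand"],     # strand
    ])
```

The same idea would apply to a maize TE annotation from another source, as long as chromosome, coordinates, a TE label and strand can be recovered from it.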

Finally, I converted the out.h5ad file to a Seurat object using the SeuratDisk package:

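library(SeuratDisk)
# Convert the scTE output to the h5Seurat format (writes out.h5seurat)
Convert("out.h5ad", dest = "h5seurat", overwrite = TRUE)
# Load the converted file as a Seurat object
pbmc3k <- LoadH5Seurat("out.h5seurat")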

Is that a good way to do it? Is there an easier way to get the expression matrix from the h5ad file?
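One alternative that avoids the conversion step would be to read the file directly in Python with the anndata package, which defines the h5ad format. This is a minimal sketch, assuming that the scTE output follows the usual AnnData layout (cells as rows, genes/TEs as columns). The function name `load_expression_matrix` is hypothetical.

```python
def load_expression_matrix(h5ad_path):
    """Read an .h5ad file and return a cells x features pandas DataFrame."""
    # Third-party dependency, imported here so the rest of a script
    # can still be loaded when anndata is not installed.
    import anndata

    adata = anndata.read_h5ad(h5ad_path)
    # to_df() converts adata.X to a dense DataFrame indexed by cell
    # barcode; for very large matrices, work with adata.X directly.
    return adata.to_df()
```

For example, `counts = load_expression_matrix("out.h5ad")` followed by `counts.columns` would list every feature name in the file, which may also help to check whether any TE entries are present.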

Thank you very much in advance for your help,

Bruno

JiekaiLab / scTE, issue #38