Hi,
I ran scTE on a custom Zea mays (corn) genome, using a BAM file from 10x Genomics Cell Ranger, which gave me about 5,000 good cells. Everything ran smoothly, but:
1- I cannot see any TEs in the out.h5ad file.
2- Only about 15% of the genes I detected with Cell Ranger are present in out.h5ad, and only those that have a gene name. Plant genomes are poorly annotated: only a few genes get a name, and sometimes two separate genes even share the same name ^^. Only the gene ID should be used for plant genomes. Is there a way to ensure that only the gene ID is used, and not the gene name?
Examples of the last column of the Zea mays GTF file from Ensembl Plants:
Gene with a name:
gene_id "Zm00001d048603"; gene_name "GRAS-transcription factor 83"; gene_source "gramene"; gene_biotype "protein_coding";
Gene without a name:
gene_id "Zm00001d027230"; gene_source "gramene"; gene_biotype "protein_coding";
Should I modify the GTF manually?
3- Could you specify which 6 columns the TE BED file needs to contain? On the UCSC website cited in the tutorial, a BED file is defined as: "BED lines have three required fields and nine additional optional fields;
1- chrom - The name of the chromosome
2- chromStart - The starting position of the feature in the chromosome
3- chromEnd - The ending position of the feature in the chromosome"
What are the three other columns that scTE requires? Does one of them need to be a gene_name like "TExxxx"?
For example, the xenopus file in the tutorial (https://hgdownload.soe.ucsc.edu/goldenPath/xenTro9/database/rmsk.txt.gz) has about 17 columns.
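For question 2, the manual GTF edit I have in mind would look roughly like the Python sketch below. It assumes (I have not verified this) that scTE picks genes up through the gene_name attribute, so it sets gene_name to the gene_id on every line, adding the attribute when it is missing. The function names and file paths are my own and purely illustrative.

```python
import re

# Patterns for the two attributes of interest in GTF column 9.
GENE_ID = re.compile(r'gene_id "([^"]+)";')
GENE_NAME = re.compile(r'gene_name "[^"]*";')


def use_gene_id_as_name(line: str) -> str:
    """Return a GTF line whose gene_name attribute equals its gene_id."""
    if line.startswith("#"):
        return line  # header or comment line, left untouched
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return line  # not a regular GTF record, left untouched
    attrs = fields[8]
    match = GENE_ID.search(attrs)
    if match is None:
        return line  # no gene_id to copy from
    new_name = f'gene_name "{match.group(1)}";'
    if GENE_NAME.search(attrs):
        # Overwrite the existing (possibly duplicated) gene name.
        attrs = GENE_NAME.sub(lambda _m: new_name, attrs, count=1)
    else:
        # Add a gene_name right after the gene_id attribute.
        attrs = attrs[:match.end()] + " " + new_name + attrs[match.end():]
    fields[8] = attrs
    return "\t".join(fields) + "\n"


def rewrite_gtf(src_path: str, dst_path: str) -> None:
    """Stream src_path to dst_path, normalising gene_name on every line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(use_gene_id_as_name(line))
```

I would then call something like rewrite_gtf("Zea_mays.gtf", "Zea_mays.gene_id.gtf") (placeholder file names) and rebuild the scTE index from the new file.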
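For question 3, my guess is that the six columns follow the standard UCSC BED6 layout (chrom, chromStart, chromEnd, name, score, strand), but I have not confirmed that this is what scTE expects. Under that assumption, a row of a UCSC rmsk table (17 tab-separated columns in the standard schema) could be reduced to BED6 as in the sketch below. Putting repName in the name column and 0 as a placeholder score are my own choices, and the function names are illustrative.

```python
import gzip

# Column positions in the standard UCSC rmsk table (0-based):
# 0 bin, 1 swScore, 2 milliDiv, 3 milliDel, 4 milliIns, 5 genoName,
# 6 genoStart, 7 genoEnd, 8 genoLeft, 9 strand, 10 repName, 11 repClass,
# 12 repFamily, 13 repStart, 14 repEnd, 15 repLeft, 16 id
RMSK_COLUMNS = 17


def rmsk_row_to_bed6(row: str) -> str:
    """Convert one rmsk row to a BED6 line (without trailing newline)."""
    cols = row.rstrip("\n").split("\t")
    if len(cols) != RMSK_COLUMNS:
        raise ValueError(f"expected {RMSK_COLUMNS} columns, got {len(cols)}")
    chrom, start, end = cols[5], cols[6], cols[7]
    strand = cols[9]
    name = cols[10]  # assumption: the TE name (repName) goes in BED column 4
    score = "0"      # assumption: placeholder score, presumably ignored
    return "\t".join([chrom, start, end, name, score, strand])


def rmsk_to_bed6(src_path: str, dst_path: str) -> None:
    """Convert a gzipped rmsk.txt.gz file to a plain-text BED6 file."""
    with gzip.open(src_path, "rt") as src, open(dst_path, "w") as dst:
        for row in src:
            dst.write(rmsk_row_to_bed6(row) + "\n")
```

If that layout is right, a custom TE annotation for maize would only need those same six fields per line, whatever tool produced it.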
Finally, I converted the out.h5ad file to a Seurat object using the SeuratDisk package.
Is that a good way to do it? Is there an easier way to get the expression matrix from the h5ad file?
Thank you very much in advance for your help,
Bruno