bvaldebenitom / SoloTE

GNU General Public License v3.0

Missing mitochondrial genes #48

Closed liuweihahaha closed 3 months ago

liuweihahaha commented 3 months ago

Hi, when I quantified TEs and genes at the same time, I found no mitochondrial genes in the final output. These are the steps I followed:

1. I used the genome and annotation filtering process recommended by Cell Ranger, with these input files:
   Ensembl genome: Homo_sapiens.GRCh38.dna.primary_assembly.fa
   GENCODE annotation: gencode.v46.primary_assembly.annotation.gtf

2. I filtered both files according to the criteria provided by Cell Ranger and then built the index with STAR:

    STAR --runThreadN 40 \
      --runMode genomeGenerate \
      --genomeDir index \
      --genomeFastaFiles Homo_sapiens.GRCh38.dna.primary_assembly.fa.modified \
      --sjdbGTFfile gencode.v46.primary_assembly.annotation.gtf.filtered

3. Alignment with STARsolo. Because I have 5' end sequencing data, I modified some parameters:

    STAR --soloType CB_UMI_Simple \
      --runRNGseed 777 \
      --soloCBwhitelist 737K-august-2016.txt \
      --soloCBstart 1 \
      --soloCBlen 16 \
      --soloUMIstart 17 \
      --soloUMIlen 10 \
      --genomeDir index \
      --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts \
      --soloUMIdedup 1MM_CR \
      --soloUMIfiltering MultiGeneUMI_CR \
      --soloCellFilter EmptyDrops_CR \
      --clipAdapterType CellRanger4 \
      --soloFeatures Gene GeneFull Velocyto \
      --readFilesIn R2 R1 \
      --readFilesCommand zcat \
      --runThreadN 40 \
      --outSAMtype BAM SortedByCoordinate \
      --outFileNamePrefix GEX_065_3-1 \
      --winAnchorMultimapNmax 100 \
      --outFilterMultimapNmax 100 \
      --outMultimapperOrder Random \
      --outSAMmultNmax 1 \
      --outSAMattributes NH HI nM AS CR UR CY UY CB UB GX GN sS sQ sM \
      --soloBarcodeReadLength 0 \
      --soloStrand Reverse

4. I quantified TEs and genes with SoloTE:

    SoloTE_pipeline.py --threads 40 --bam $BAM_FILE --teannotation $TEANNOTATION --outputprefix $PREFIX --outputdir $OUTPUT_DIR --dual

5. I used the Seurat package in R for the analysis:

    data1<-ReadMtx("matrix.mtx","barcodes.tsv","features.tsv",feature.column=1)
    solote_seuratobj <- CreateSeuratObject(count=data1,min.cells=3,project="SoloTE")
    solote_seuratobj$percent_mt <- PercentageFeatureSet(solote_seuratobj,pattern="^Mt-")

However, no mitochondrial genes were found in the results. When I looked at features.tsv in legacytes_MATRIX, I found that the gene names are not quite the same as in the given example. Here are the first 100 lines of my features.tsv. What is the reason for this?

A1BG A1BG A1BG-AS1 A1BG-AS1 A2M A2M A2M-AS1 A2M-AS1 A2MP1 A2MP1 A3GALT2 A3GALT2 A4GALT A4GALT AAAS AAAS AACS AACS AAGAB AAGAB AAK1 AAK1 AAMDC AAMDC AAMP AAMP AANAT AANAT AAR2 AAR2 AARD AARD AARS1 AARS1 AARS2 AARS2 AARSD1 AARSD1 AASDH AASDH AASDHPPT AASDHPPT AASS AASS AATBC AATBC AATF AATF AATK AATK ABALON ABALON ABAT ABAT ABCA1 ABCA1 ABCA10 ABCA10 ABCA11P ABCA11P ABCA17P ABCA17P ABCA2 ABCA2 ABCA3 ABCA3 ABCA5 ABCA5 ABCA6 ABCA6 ABCA7 ABCA7 ABCA9 ABCA9 ABCA9-AS1 ABCA9-AS1 ABCB1 ABCB1 ABCB10 ABCB10 ABCB4 ABCB4 ABCB6 ABCB6 ABCB7 ABCB7 ABCB8 ABCB8 ABCB9 ABCB9 ABCC1 ABCC1 ABCC10 ABCC10 ABCC11 ABCC11 ABCC12 ABCC12 ABCC13 ABCC13 ABCC2 ABCC2 ABCC3 ABCC3 ABCC4 ABCC4 ABCC5 ABCC5 ABCC6 ABCC6 ABCC6P1 ABCC6P1 ABCD1 ABCD1 ABCD2 ABCD2 ABCD3 ABCD3 ABCD4 ABCD4 ABCE1 ABCE1 ABCF1 ABCF1 ABCF2 ABCF2 ABCF3 ABCF3 ABCG1 ABCG1 ABCG2 ABCG2 ABCG8 ABCG8 ABHD1 ABHD1 ABHD10 ABHD10 ABHD11 ABHD11 ABHD12 ABHD12 ABHD12B ABHD12B ABHD13 ABHD13 ABHD14A ABHD14A ABHD14B ABHD14B ABHD15 ABHD15 ABHD16A ABHD16A ABHD16B ABHD16B ABHD17A ABHD17A ABHD17B ABHD17B ABHD17C ABHD17C ABHD18 ABHD18 ABHD2 ABHD2 ABHD3 ABHD3 ABHD4 ABHD4

bvaldebenitom commented 3 months ago

Hi @liuweihahaha

I downloaded the GENCODE file from https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_46/gencode.v46.annotation.gtf.gz, and the mitochondrial genes appear with the prefix "MT-" rather than "Mt-". Thus, you should change:

    solote_seuratobj$percent_mt <- PercentageFeatureSet(solote_seuratobj,pattern="^Mt-")

to:

    solote_seuratobj$percent_mt <- PercentageFeatureSet(solote_seuratobj,pattern="^MT-")
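If in doubt about which prefix a given annotation uses, one option is to look at the feature names directly before choosing the pattern. The following is only a minimal shell sketch: it assumes features.tsv is tab-separated with the gene name in the first column (matching feature.column=1 above), and the function name list_mt_features is hypothetical, not part of SoloTE or Seurat.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list feature names that start with "mt-" in any
# letter case, so the exact prefix used in the matrix becomes visible.
# Assumption: tab-separated file, gene name in column 1.
list_mt_features() {
    local features_file="$1"
    # cut keeps the first column; grep -i matches MT-, Mt-, mt- alike;
    # "|| true" keeps the exit status clean when nothing matches.
    cut -f1 "$features_file" | grep -i '^mt-' | sort -u || true
}

# Example call (the path is a placeholder):
# list_mt_features path/to/features.tsv
```

Whatever prefix this prints is the one to use, with matching case, in the pattern passed to PercentageFeatureSet.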

liuweihahaha commented 3 months ago

Thank you!!!