Closed RandyHsj closed 2 months ago
Oh, I guess I probably find your code for this process using RDS format files but my file is in h5 for gene expression and bed and tsv for atacseq so I am wondering do you have some suggestions in dealing these two formats of files in your system?
Hi @RandyHsj , sorry that we are very amateur level users of python. For the velocity things we have better experience using loompy (based on kallisto) to produce spliced/unspliced data for 10x scRNA/snRNA datasets. For SHAREseq scRNA side, we used STARsolo (see below):
/gpfs/bin/STAR/STAR-2.7.10a_alpha_220818/STAR \
--runThreadN 12 \
--runMode alignReads \
--sysShell /bin/bash \
--genomeDir /gpfs/genomedb/STAR/2.7.10a/mm10/mm10_sjdbOverHang50bp/ \
--readFilesIn F350010028_L02_read_1.fq.gz F350010028_L02_read_2.fq.gz \
--readFilesCommand zcat \
--soloCBwhitelist bc1.tsv bc2.tsv bc3.tsv \
--soloType CB_UMI_Complex \
--soloCBposition 0_0_0_7 0_8_0_15 0_115_0_122 \
--soloUMIposition 0_18_0_25 \
--sjdbGTFfile /gpfs/genomedb/STAR/2.7.10a/mm10/genes.gtf \
--soloCellFilter CellRanger2.2 50000 0.99 10 \
--soloCBmatchWLtype 1MM \
--outFilterMultimapNmax 1 \
--outFileNamePrefix MEF_RNA_p6p7_8ntUMI \
--outReadsUnmapped Fastx \
--quantMode GeneCounts \
--bamRemoveDuplicatesType UniqueIdentical \
--soloFeatures Gene GeneFull Velocyto \
--outFilterScoreMinOverLread 0.1 \
--outFilterMatchNminOverLread 0.1 \
--outSAMtype BAM SortedByCoordinate \
--outSAMattributes NH HI AS nM CB UB CR CY UR UY
Thanks a lot for the kind information and suggestions. I'll try this out.
Best regards, Shijie
获取Outlook for Androidhttps://aka.ms/AAb9ysg
From: Yi Zhang @.> Sent: Friday, August 23, 2024 3:04:39 AM To: MagpiePKU/EpiTrace @.> Cc: Shijie HE 何世杰 @.>; Mention @.> Subject: Re: [MagpiePKU/EpiTrace] snRNAseq+scATACseq Processing rather than SHAREseq (Issue #8)
你通常不会收到来自 @.*** 的电子邮件。了解这一点为什么很重要https://aka.ms/LearnAboutSenderIdentification
Hi @RandyHsjhttps://github.com/RandyHsj , sorry that we are very amateur level users of python. For the velocity things we have better experience using loompy (based on kallisto) to produce spliced/unspliced data for 10x scRNA/snRNA datasets. For SHAREseq scRNA side, we used STARsolo (see below):
/gpfs/bin/STAR/STAR-2.7.10a_alpha_220818/STAR --runThreadN 12 --runMode alignReads --sysShell /bin/bash --genomeDir /gpfs/genomedb/STAR/2.7.10a/mm10/mm10_sjdbOverHang50bp/ --readFilesIn /gpfs/output/ECS_Research/data/ShareSeq/20230519_ShareSeq_MEF_CellCycle/fastq/F350010028_L02_read_1.fq.gz /gpfs/output/ECS_Research/data/ShareSeq/20230519_ShareSeq_MEF_CellCycle/fastq/F350010028_L02_read_2.fq.gz --readFilesCommand zcat --soloCBwhitelist /gpfs/output/ECS_Research/data/ShareSeq/20230519_ShareSeq_MEF_CellCycle/bin/bc1.tsv /gpfs/output/ECS_Research/data/ShareSeq/20230519_ShareSeq_MEF_CellCycle/bin/bc2.tsv /gpfs/output/ECS_Research/data/ShareSeq/20230519_ShareSeq_MEF_CellCycle/bin/bc3.tsv --soloType CB_UMI_Complex --soloCBposition 0_0_0_7 0_8_0_15 0_115_0_122 --soloUMIposition 0_18_0_25 --sjdbGTFfile /gpfs/genomedb/STAR/2.7.10a/mm10/genes.gtf --soloCellFilter CellRanger2.2 50000 0.99 10 --soloCBmatchWLtype 1MM --outFilterMultimapNmax 1 --outFileNamePrefix /gpfs/output/ECS_Research/data/ShareSeq/20230519_ShareSeq_MEF_CellCycle/RNA/MEF_RNA_p6p7_8ntUMI/ --outReadsUnmapped Fastx --quantMode GeneCounts --bamRemoveDuplicatesType UniqueIdentical --soloFeatures Gene GeneFull Velocyto --outFilterScoreMinOverLread 0.1 --outFilterMatchNminOverLread 0.1 --outSAMtype BAM SortedByCoordinate --outSAMattributes NH HI AS nM CB UB CR CY UR UY
― Reply to this email directly, view it on GitHubhttps://github.com/MagpiePKU/EpiTrace/issues/8#issuecomment-2305444068, or unsubscribehttps://github.com/notifications/unsubscribe-auth/BFVKJ2XKE4TCCEAAWFFIOA3ZSYY4PAVCNFSM6AAAAABL63VVRGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMBVGQ2DIMBWHA. You are receiving this because you were mentioned.Message ID: @.***>
Hello, I am now using snRNAseq and scATACseq to reproduce the development trajactories reported in your great paper. But I encountered a problem that while I utilized the h5 formatted data output by cellranger to carry out downstream analysis. After transforming the format, I cannot calculate the RNA velocity since it said unspliced data is missing. I guess it may due to my processing problem. Do you have any suggestions for processing piplines or tools? Thanks a lot!!! Log: adata = sc.read_10x_h5('~/filtered_feature_bc_matrix.h5', genome=None, gex_only=True, backup_url=None) output_file = '~adata.h5ad' adata.write(output_file) adata.var_names_make_unique() sc.pp.neighbors(adata) sc.tl.louvain(adata) adata.layers['spliced'] = adata.X seta = [0,1,2,3,4,5,6,7,8,9,10,11,12] sub_adata = adata[ adata.obs['Cluster.Name']!=13 ].copy()
WARNING: Could not find spliced / unspliced counts.
print(list(f['matrix'].keys()))
['barcodes', 'data', 'features', 'indices', 'indptr', 'shape']