Closed GGboy-Zzz closed 1 month ago
Is your TSS BED file for the correct species and the correct assembly version?
What is the "normal" TSS enrichment score you get in ArchR?
I used code as below to generate TSS annotaion,
‘dataset = pbm.Dataset(name='mmusculus_gene_ensembl', host='http://www.ensembl.org')
annot = dataset.query(attributes=['chromosome_name', 'transcription_start_site', 'strand', 'external_gene_name', 'transcript_biotype'])
annot['Chromosome/scaffold name'] = annot['Chromosome/scaffold name'].to_numpy(dtype = str)
filter = annot['Chromosome/scaffold name'].str.contains('CHR|GL|JH|MT')
annot = annot[~filter]
annot['Chromosome/scaffold name'] = annot['Chromosome/scaffold name'].str.replace(r'(\b\S)', r'chr\1')
annot.columns=['Chromosome', 'Start', 'Strand', 'Gene', 'Transcript_type']
annot = annot[annot.Transcript_type == 'protein_coding']
annot.to_csv('./scenicplus/mouse_embryo/rawdata/ref/tssannotation.bed', sep='\t', index=False)’
and fragment file is mm10 assembly version,
pipeline_name=cellranger-arc pipeline_version=cellranger-arc-2.0.1 reference_path=/bi/apps/cellranger-arc/references/refdata-cellranger-arc-mm10-2020-A-2.0.0
Is there a version issue with this?ArchR result is is provided by the article of the data source.
If you use the lastest ensembl gene annotation for mouse from Ensembl you have it for mm39
(GRCm39
) and not mm10
(GRCm38
):
$ pycistopic tss gene_annotation_list | grep -i Mouse
mspicilegus_gene_ensembl Steppe mouse genes (MUSP714)
mspretus_gene_ensembl Algerian mouse genes (SPRET_EiJ_v1)
mmusculus_gene_ensembl Mouse genes (GRCm39)
mpahari_gene_ensembl Shrew mouse genes (PAHARI_EIJ_v1.1)
mmurinus_gene_ensembl Mouse Lemur genes (Mmur_3.0)
mcaroli_gene_ensembl Ryukyu mouse genes (CAROLI_EIJ_v1.1)
pmbairdii_gene_ensembl Northern American deer mouse genes (HU_Pman_2.1)
Help for pycistopic tss gene_annotation_list
shows you the link to archived releases: https://www.ensembl.org/info/website/archives/index.html
pycistopic tss gene_annotation_list -h
usage: pycistopic tss gene_annotation_list [-h] [-f FILTER] [-s BIOMART_HOST] [--no-cache]
options:
-h, --help show this help message and exit
-f FILTER, --filter FILTER
Only keep list of Ensembl BioMart gene annotation names that contain specified string.
Ensembl BioMart:
Ensembl BioMart server settings.
-s BIOMART_HOST, --server BIOMART_HOST
BioMart host URL to use. Default: "http://www.ensembl.org". Archived Ensembl BioMart URLs: https://www.ensembl.org/info/website/archives/index.html (List of currently available archives).
--no-cache Disable caching of requests to Ensembl BioMart server.
Ensembl release 102 (Noverber 2020) is the lastest release with mm10
(GRCm38
): http://nov2020.archive.ensembl.org/index.html
Use that archive subdomain as server name and as you can see for mmusculus_gene_ensembl
you now get GRCm38
:
$ pycistopic tss gene_annotation_list --filter mouse --server http://nov2020.archive.ensembl.org/
mspicilegus_gene_ensembl Steppe mouse genes (MUSP714)
mmurinus_gene_ensembl Mouse Lemur genes (Mmur_3.0)
pmbairdii_gene_ensembl Northern American deer mouse genes (HU_Pman_2.1)
mpahari_gene_ensembl Shrew mouse genes (PAHARI_EIJ_v1.1)
mmusculus_gene_ensembl Mouse genes (GRCm38.p6)
mcaroli_gene_ensembl Ryukyu mouse genes (CAROLI_EIJ_v1.1)
mspretus_gene_ensembl Algerian mouse genes (SPRET_EiJ_v1)
Create mm10 TSS BED file with UCSC chromosome names (starting with chr
):
$ pycistopic tss get_tss --name mmusculus_gene_ensembl --server http://nov2020.archive.ensembl.org/ --ucsc mm10 --to-chrom-source ucsc --output mm10.tss.bed
- Get TSS annotation from Ensembl BioMart with the following settings:
- biomart_name: "mmusculus_gene_ensembl"
- biomart_host: "http://nov2020.archive.ensembl.org/"
- transcript_type: ['protein_coding']
- use_cache: True
- Getting chromosome sizes and alias mapping for "mm10" from UCSC.
- Update chromosome names in TSS annotation to "ucsc" chromosome names.
- Writing TSS annotation BED file to "mm10.tss.bed".
Thanks for your Thank you for your detailed response. I've realized that my TSS version is not consistent, and I will go ahead and modify this step. Is the pycistopic tss command available in version 2.0? It seems that the 1.0 version I previously installed does not support this command. Please forgive me as I am a beginner in using the shell.
The git version has it. Also the QC calculation is now in the command line:
https://pycistopic.readthedocs.io/en/latest/notebooks/human_cerebellum.html#QC
Hello, I met a problem in pycistopic pipeline, When I did QC for fragment file, I got a very low TSS enrichment score(less than 2) with other metrics normal. And I successfully ran all the step of pycistopic and pycistarget using the passing-RNA QC cells, I lost information on ATACseq QC (yet I get a normal TSS enrichment score in ArchR pipeline). I dont know which step caused this problem, and I hope you can give me some suggestion. Thanks for your reply!