frisen-lab / TREX

Simultaneous lineage TRacking and EXpression profiling of single cells using RNA-seq
MIT License

Reproducing the cloneID raw matrix #1

Closed by cnk113 2 years ago

cnk113 commented 2 years ago

Hello,

I'm trying to reproduce the clonal barcode matrix from your paper, and I was wondering whether the cellranger annotation was specifically modified. I assume you appended chr-Tomato.fa to the end of the mouse genome. What was appended to the GTF file?

Best, Chang

marcelm commented 2 years ago

(Please ignore my previous comment that I have now deleted.)

The chrTomato-N.fa file that you find in this repository was not used in the paper. We use it only for testing the pipeline.

I have now added both the FASTA and GTF files that we used; see https://github.com/frisen-lab/TREX/tree/main/annotation. Please also see the updated README file. Below is a copy of the part that I just added. If you need more details, please let us know.

Reproducing results from the Nature Neuroscience paper

See the manuscript at https://doi.org/10.1038/s41593-022-01011-x. This section lists some details not available in the paper. See the annotations/ directory for the necessary files.

The FASTA reference used with CellRanger was created by appending H2b-EGFP-30N-LTR.fa to the end of the GRCm38 (mm10) reference FASTA:

cat genome.fa chrH2B-EGFP-N.fa > mm10_H2B-EGFP-30N_genome.fa

The GTF annotations file was created by appending chrH2B-EGFP-N.gtf to the existing annotations:

cat genes.gtf chrH2B-EGFP-N.gtf > mm10_H2B-EGFP-30N_genes.gtf

Then a new CellRanger reference can be created:

cellranger mkref --genome=mm10_H2B-EGFP-30N --fasta=mm10_H2B-EGFP-30N_genome.fa --genes=mm10_H2B-EGFP-30N_genes.gtf > mkref_mm10_H2B-EGFP-30N.out
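Before running mkref on the concatenated files, it can be worth checking that every sequence name in the combined GTF also exists as a record in the combined FASTA, since a mismatch between the two is an easy mistake when appending a custom contig. The following is a minimal sketch, not part of TREX or CellRanger; the function name check_gtf_contigs is hypothetical, and it assumes a tab-separated GTF whose first column is the sequence name.

```shell
# Sketch (hypothetical helper, not part of TREX or CellRanger):
# report GTF sequence names that have no matching FASTA record.
# Usage: check_gtf_contigs genome.fa genes.gtf
check_gtf_contigs() {
    awk -F'\t' '
        # First file (FASTA): remember each record name, i.e. the text
        # after ">" up to the first space or tab.
        NR == FNR {
            if (substr($0, 1, 1) == ">") {
                name = substr($0, 2)
                sub(/[ \t].*/, "", name)
                seen[name] = 1
            }
            next
        }
        # Second file (GTF): skip comment and empty lines.
        /^#/ || NF == 0 { next }
        # Report each unknown sequence name once.
        !($1 in seen) && !($1 in reported) {
            reported[$1] = 1
            print "missing from FASTA: " $1
            bad = 1
        }
        # Exit status 1 if anything was missing, 0 otherwise.
        END { exit bad + 0 }
    ' "$1" "$2"
}
```

With the file names from the commands above, this would be called as check_gtf_contigs mm10_H2B-EGFP-30N_genome.fa mm10_H2B-EGFP-30N_genes.gtf. It prints nothing and returns 0 when all names match.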

cnk113 commented 2 years ago

Thanks for the clear solution!

cnk113 commented 2 years ago

I got everything working. Regarding the parameters of run10x: is it fine to run everything with the defaults to get results similar to those in the paper?

micratos commented 2 years ago

Hi Chang,

To obtain the cell x barcode UMI matrix for each brain (consisting of three 10X libraries from three brain regions) I usually run:

trex run10x --output brain1 --start 1122 --end 1151 --umi-matrix --prefix --filter-cellids filtered_cells.csv cortex hippocampus striatum

The filtered_cells.csv contains the cellIDs exported from Seurat.
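A note on the coordinates: if --start and --end are both inclusive, 1122 to 1151 spans 1151 - 1122 + 1 = 30 positions, which matches the 30N in the reference file names. For a different construct, the corresponding positions could be looked up with something like the sketch below. It assumes the barcode region is written as a run of N in a FASTA file that holds only the construct record; the function name find_n_run is hypothetical and not part of TREX.

```shell
# Sketch (hypothetical helper, not part of TREX): print the 1-based,
# inclusive start and end of the first run of N in a FASTA file.
# Assumes the file contains a single record (the construct).
find_n_run() {
    awk '
        /^>/ { next }                  # skip header lines
        { seq = seq toupper($0) }      # join wrapped sequence lines
        END {
            if (match(seq, /N+/)) {
                # RSTART is 1-based; RLENGTH is the length of the run.
                printf "start=%d end=%d length=%d\n", RSTART, RSTART + RLENGTH - 1, RLENGTH
            } else {
                print "no N run found"
                exit 1
            }
        }
    ' "$1"
}
```

Whether the printed values can be passed to --start and --end unchanged depends on the coordinate convention trex expects, so it is worth checking them against the trex run10x help text.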

Hope this helps!

Michael

cnk113 commented 2 years ago

Thank you!