frisen-lab / TREX

Simultaneous lineage TRacking and EXpression profiling of single cells using RNA-seq
MIT License

Reproducing the cloneID raw matrix #1

Closed cnk113 closed 2 years ago

cnk113 commented 2 years ago


I'm trying to reproduce the clonal barcode matrix from your paper, and I was wondering whether the cellranger annotation was specifically modified. I assume you appended chr-Tomato.fa to the end of the mouse genome, but what was appended to the GTF file?

Best, Chang

marcelm commented 2 years ago

(Please ignore my previous comment that I have now deleted.)

The chrTomato-N.fa file that you find in this repository was not used in the paper. We use it only for testing the pipeline.

I have now added both the FASTA and GTF files that we used. Please also see the updated README file. I'm copying here the part that I added just now. If you need more details, please let us know.

Reproducing results from the Nature Neuroscience paper

This section lists some details not available in the paper. See the annotations/ directory for the necessary files.

The FASTA reference used with CellRanger was created by appending H2b-EGFP-30N-LTR.fa to the end of the GRCm38 (mm10) reference FASTA:

cat genome.fa chrH2B-EGFP-N.fa > mm10_H2B-EGFP-30N_genome.fa

The GTF annotations file was created by appending chrH2B-EGFP-N.gtf to the existing annotations:

cat genes.gtf chrH2B-EGFP-N.gtf > mm10_H2B-EGFP-30N_genes.gtf

Then a new CellRanger reference can be created:

cellranger mkref --genome=mm10_H2B-EGFP-30N --fasta=mm10_H2B-EGFP-30N_genome.fa --genes=mm10_H2B-EGFP-30N_genes.gtf > mkref_mm10_H2B-EGFP-30N.out
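Before running mkref on the concatenated files, it may be worth confirming that every sequence name used in the GTF also exists as a record in the FASTA. The following is only a sketch of such a check; `missing_contigs` is a hypothetical helper name, not part of TREX or CellRanger, and it assumes an uncompressed FASTA and GTF.

```shell
# Hypothetical helper (not part of TREX or CellRanger):
# print each GTF sequence name that has no matching FASTA record.
missing_contigs() {
    fasta=$1
    gtf=$2
    awk '
        # First file (FASTA): remember each record name, i.e. the text
        # after ">" up to the first space or tab.
        NR == FNR {
            if (substr($0, 1, 1) == ">") {
                split(substr($0, 2), parts, /[ \t]/)
                seen[parts[1]] = 1
            }
            next
        }
        # Second file (GTF): skip comment lines.
        /^#/ { next }
        # Report each unknown sequence name once.
        !($1 in seen) && !($1 in reported) {
            print $1
            reported[$1] = 1
        }
    ' "$fasta" "$gtf"
}

# Example call with the file names from the commands above:
# missing_contigs mm10_H2B-EGFP-30N_genome.fa mm10_H2B-EGFP-30N_genes.gtf
```

If the call prints nothing, all GTF sequence names were found in the FASTA. Any printed name points to a mismatch between the two appended files.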

cnk113 commented 2 years ago

Thanks for the clear solution!

cnk113 commented 2 years ago

I got everything working. Regarding the parameters of run10x, is it fine to run everything with the defaults to get results similar to those in the paper?

micratos commented 2 years ago

Hi Chang,

To obtain the cell x barcode UMI matrix for each brain (consisting of three 10X libraries from three brain regions) I usually run:

trex run10x --output brain1 --start 1122 --end 1151 --umi-matrix --prefix --filter-cellids filtered_cells.csv cortex hippocampus striatum

The filtered_cells.csv contains the cellIDs exported from Seurat.
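The interval 1122 to 1151 covers 1151 - 1122 + 1 = 30 positions, which would match the 30N barcode in the reference name if the coordinates are 1-based and inclusive (an assumption here, not something stated in this thread). For a different construct, the positions of the N stretch could be looked up with something like the sketch below; `n_run` is a hypothetical helper, not a TREX command, and it assumes a FASTA file with a single record.

```shell
# Hypothetical helper (not a TREX command): print the 1-based start, end
# and length of the longest run of N in a single-record FASTA file.
n_run() {
    awk '
        # Join all sequence lines into one string, skipping the header.
        !/^>/ { seq = seq toupper($0) }
        END {
            best_len = 0
            run = 0
            n = length(seq)
            for (i = 1; i <= n; i++) {
                if (substr(seq, i, 1) == "N") {
                    run++
                    # Keep the first longest run seen so far.
                    if (run > best_len) { best_len = run; best_end = i }
                } else {
                    run = 0
                }
            }
            if (best_len > 0) print best_end - best_len + 1, best_end, best_len
        }
    ' "$1"
}

# Example call with the construct FASTA named earlier in this thread:
# n_run chrH2B-EGFP-N.fa
```

The first two numbers printed are candidate values for --start and --end, under the same 1-based inclusive assumption.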

Hope this helps!


cnk113 commented 2 years ago

Thank you!