farrellja / URD

URD - Reconstruction of Branching Developmental Trajectories
GNU General Public License v3.0

Where is meta.axial.txt for the article? #33

Closed: Yichel518 closed this issue 5 years ago

Yichel518 commented 5 years ago

Hi, my original intention was to learn how to reconstruct cell developmental trajectories, so I downloaded your public data (SRP124289). However, the FASTQ files on offer have already been processed: the read lengths vary because the reads have already been polyA-trimmed and adapter-trimmed, so I can't do anything with the FASTQ files I downloaded. That leaves only your expression matrix. But, as with the test data, URD needs two input files (count.axial.txt and meta.axial.txt), and so far I have only downloaded count.txt. Where can I get the meta.axial.txt for the article data?

farrellja commented 5 years ago

meta.axial.txt is in the GitHub repository, in the same directory as count.axial.txt: https://github.com/farrellja/URD/tree/master/Analyses/QuickStart/data
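To check that the two files line up before handing them to URD, here is a minimal Python sketch. It assumes (this is not confirmed by the repository) that both files are tab-delimited tables in the layout R's write.table produces with row names, that count.axial.txt is genes x cells with cell IDs as column names, and that meta.axial.txt has one row per cell keyed by the same cell IDs. The names parse_r_table and compare_cells are hypothetical helpers, not part of URD.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def parse_r_table(lines: Iterable[str], delimiter: str = "\t") -> Tuple[List[str], Dict[str, List[str]]]:
    """Parse a table laid out like R's write.table output with row names.

    Assumption: the first field of every data row is a row name, and the
    header may or may not carry a label for that row-name column.
    """
    rows = [row for row in csv.reader(lines, delimiter=delimiter) if row]
    header, data = rows[0], rows[1:]
    if data and len(header) == len(data[0]):
        header = header[1:]  # header included a label for the row-name column
    table = {}
    for row in data:
        if len(row) != len(header) + 1:
            raise ValueError(f"row {row[0]!r} has {len(row)} fields, expected {len(header) + 1}")
        table[row[0]] = row[1:]
    return header, table


def compare_cells(count_lines: Iterable[str], meta_lines: Iterable[str],
                  delimiter: str = "\t") -> Tuple[List[str], List[str]]:
    """Return (cells missing from the metadata, cells missing from the counts).

    Cells are assumed to be the columns of the count table and the row
    names of the metadata table.
    """
    count_cells, _genes = parse_r_table(count_lines, delimiter)
    _meta_columns, meta = parse_r_table(meta_lines, delimiter)
    count_set = set(count_cells)
    missing_from_meta = [cell for cell in count_cells if cell not in meta]
    missing_from_counts = [cell for cell in meta if cell not in count_set]
    return missing_from_meta, missing_from_counts
```

For example, open both files and pass the file handles to compare_cells; two empty lists mean every cell in the count matrix has a metadata row and vice versa. If the files turn out to be space-delimited rather than tab-delimited, change the delimiter argument.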


The original BAM files, including the cell and molecular tags, are indeed uploaded to GEO. Some of the SRA tools strip those tags during the download; instructions for getting the BAMs with the tags intact are listed below. The BAMs should contain the UMI in the XM tag and the cell barcode in the XC tag. I encourage people to use them instead of the original read files because the cell barcodes have been corrected (see the Drop-seq cookbook for details) to account for imperfect cell barcode synthesis during the manufacture of the Drop-seq beads. So that other analyses of our data can be compared to ours cell-for-cell, it's best to use the BAM files, because they carry the 'corrected' barcodes. The Drop-seq tools can be used to extract a FASTQ file from the BAMs for use with an aligner, and also to merge the new alignments with the cell and molecular barcodes in the posted BAMs.
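As an illustration of where those barcodes live, the sketch below reads SAM text (for example the output of `samtools view` on one of the posted BAMs) and groups distinct UMIs (XM) by corrected cell barcode (XC). It relies only on the SAM convention that columns 12 onward are optional TAG:TYPE:VALUE fields. The helper names are hypothetical, and for real data a BAM library such as pysam would be the more practical choice.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def optional_tags(sam_line: str) -> Dict[str, str]:
    """Return the optional TAG:TYPE:VALUE fields of one SAM record as {TAG: VALUE}."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # columns 12+ are optional fields in the SAM spec
        tag, _type, value = field.split(":", 2)  # the value itself may contain ':'
        tags[tag] = value
    return tags


def umis_per_cell(sam_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Group distinct UMIs (XM tag) by cell barcode (XC tag).

    Header lines (starting with '@') and records lacking either tag are skipped.
    """
    cells = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):
            continue
        tags = optional_tags(line)
        cell, umi = tags.get("XC"), tags.get("XM")
        if cell and umi:
            cells[cell].add(umi)
    return dict(cells)
```

The number of distinct UMIs per barcode is a quick sanity check that the tags survived the download: if every record lacks XC and XM, the result is an empty dictionary, which suggests the tags were stripped.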


To download the BAM files without stripping the cell and molecular barcodes, go to the SRA Run Selector at https://www.ncbi.nlm.nih.gov/Traces/study/?go=home, search for GSE106474 or SRP124289, and click on an 'SRRnnnnnn' Run accession number. The link to download the BAM file is near the bottom of the page.

Yichel518 commented 5 years ago

Hi @farrellja, thanks for your guidance. I have already prepared all the BAM files, but there is one thing I don't understand: is your BAM file an unaligned, tagged BAM (uBAM) or an aligned, sorted BAM (aligned.sorted.bam)? I am now struggling with where to start. I hope you can lend me a hand again. Many thanks! Regards, Yichel

farrellja commented 5 years ago

@Yichel518 The BAM files are from the end of the Drop-seq pipeline: they have been trimmed, aligned, and tagged, and their cell barcodes have been corrected for bead synthesis errors. If you want to re-align them, you can use the Drop-seq tools. You should be able to generate a FASTQ file (SamToFastq), align it however you want against a different reference, sort the result (SortSam), merge it with the original BAM (MergeBamAlignment, using the ATTRIBUTES_TO_RETAIN parameter to copy over the XC and XM barcodes without copying over the alignment or exon tagging), and then run TagReadWithGeneExon. You will not need to quality-filter, polyA-trim, or adapter-trim, as that has already been done. You also should not re-run DetectBeadSynthesisErrors: it has already been run, and the point of using our BAMs is to keep the cell barcodes the same so that cell IDs are consistent between studies that use the data.
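To make the purpose of the merge step concrete, here is a toy Python sketch of what it is meant to achieve; it is not a replacement for MergeBamAlignment. Each new alignment keeps its own alignment columns and tags, and only the XC and XM tags are copied from the original record with the same read name, so other original tags (for example the GE gene tag that TagReadWithGeneExon adds) are not carried across. It works on SAM text lines, the function names are hypothetical, and dropping reads that have no original record is an assumption made for the sketch.

```python
from typing import Dict, Iterable, List

KEEP = ("XC", "XM")  # cell barcode and UMI tags to carry over


def barcode_tags_by_read(original_sam: Iterable[str]) -> Dict[str, List[str]]:
    """Map read name -> the XC/XM optional fields of the original (posted) record."""
    by_read = {}
    for line in original_sam:
        if not line or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        by_read[fields[0]] = [f for f in fields[11:] if f.split(":", 1)[0] in KEEP]
    return by_read


def retag(new_sam: Iterable[str], by_read: Dict[str, List[str]]) -> List[str]:
    """Attach the original XC/XM fields to each new alignment record.

    Header lines pass through unchanged. Any XC/XM already on a new record is
    replaced; its other tags and alignment columns are left untouched. Reads
    with no original record are dropped (an assumption of this sketch).
    """
    out = []
    for line in new_sam:
        if not line:
            continue
        if line.startswith("@"):
            out.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        name = fields[0]
        if name not in by_read:
            continue
        others = [f for f in fields[11:] if f.split(":", 1)[0] not in KEEP]
        out.append("\t".join(fields[:11] + others + by_read[name]))
    return out
```

Keying on read name is what keeps the cell IDs identical to the published ones: the barcodes are never recomputed, only copied from the posted records.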