buenrostrolab / slide_dna_seq_analysis

Analysis of slide-DNA-seq data in Zhao, Chiang, et al.

Question about slidedna_alignment #6

Closed liuyihhha closed 2 years ago

liuyihhha commented 2 years ago

Dear slide-DNA-seq developers, thank you so much for your excellent work. I have recently been aligning some slide-DNA-seq data, but a few problems are still puzzling me.
As you suggested, I downloaded the data (human_colon_cancer_3, SRR16203712) from S3 to make sure I had the correct raw FASTQ files: R1 has length 35, R2 has length 14, and R3 has length 35. I then tried to use slidedna_alignment_template.sh. Step 2 runs extractBCfromR2.py, but since R2 already has length 14, I assumed I could skip that step. After the alignment finished, however, I found a large discrepancy between the final barcode file (barcode.list) and the bead location file (human_colon_cancer_3_dna_191204_19.bead_locations.csv): I only got 1461 barcodes. Does slidedna_alignment_template.sh work on the raw FASTQ files downloaded from S3?
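For anyone who wants to repeat the read-length check described above, here is a minimal sketch. It assumes standard 4-line FASTQ records; the helper names `open_text` and `read_length_counts` are made up for illustration and are not part of the repository's scripts.

```python
import gzip
from collections import Counter
from itertools import islice


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_length_counts(fastq_path, max_reads=100_000):
    """Tally sequence lengths over the first max_reads FASTQ records.

    Assumes each record spans exactly 4 lines (no wrapped sequences).
    """
    counts = Counter()
    with open_text(fastq_path) as handle:
        # The sequence is the second line of every 4-line record.
        for seq in islice(handle, 1, max_reads * 4, 4):
            counts[len(seq.strip())] += 1
    return counts
```

Calling `read_length_counts("human_colon_cancer_3_dna_191204_19_R2.fastq.gz")` should return a single key of 14 if every sampled R2 read is already a trimmed barcode.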

Thank you.

zchiang commented 2 years ago

Hi @liuyihhha,

Thanks for your interest in our data, and great question!

You are correct that the barcode extraction step has already been run on most (if not all) of the FASTQs we uploaded. Sorry for not making this clearer.

I am not sure why you are seeing the discrepancy between the barcode file and the beads file. Perhaps you can post more information about the number of reads you have at each step?

Alternatively, if you're just interested in the aligned BAM files, you can download them here: https://drive.google.com/drive/u/1/folders/1JnOt995Cpy1Ya3g8DcuJubqlmTqV5sZr

Best, Zack

liuyihhha commented 2 years ago

Thank you for your response.

I extracted 9479276 unique read sequences from read 2 (human_colon_cancer_3_dna_191204_19_R2.fastq.gz) and 41181 barcodes from the beads file (human_colon_cancer_3_dna_191204_19.bead_locations.csv). When I intersected the two sets, I found only 1461 in common. (I think the role of R2 is to put the barcode information into the read name, so checking R2 should reflect the total number of barcodes, and R2 should contain all the barcodes in the beads file.) So I'm confused.
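The comparison above can be sketched roughly as follows. This is an illustrative reconstruction, not the exact commands that were run. It assumes the bead barcode sits in the first column of the CSV (the real column layout of bead_locations.csv is not stated in this thread), and all function names are hypothetical.

```python
import csv
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, "rt", newline="")


def fastq_barcodes(fastq_path):
    """Collect the unique sequences from a 4-line-per-record FASTQ file."""
    barcodes = set()
    with open_text(fastq_path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                barcodes.add(line.strip())
    return barcodes


def csv_barcodes(csv_path, column=0):
    """Collect barcodes from one CSV column.

    ASSUMPTION: the barcode is in `column` (default: first column).
    A header row, if present, is read as one extra "barcode".
    """
    with open_text(csv_path) as handle:
        return {row[column].strip()
                for row in csv.reader(handle) if len(row) > column}


def overlap_summary(r2_barcodes, bead_barcodes):
    """Summarise how many bead barcodes also occur among the R2 sequences."""
    shared = r2_barcodes & bead_barcodes
    fraction = len(shared) / len(bead_barcodes) if bead_barcodes else 0.0
    return {
        "r2_unique": len(r2_barcodes),
        "beads": len(bead_barcodes),
        "shared": len(shared),
        "beads_fraction_found": fraction,
    }
```

For the numbers reported here, `shared` would be 1461 out of 41181 bead barcodes, so `beads_fraction_found` would be about 0.035.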

The aligned BAM files are also very helpful to me.

Thanks.