Closed HeidiOttesen closed 2 years ago
Hi Heidi,
Thanks for your interest in our data! To answer your questions:
Hope this helps. Please let me know if anything is unclear or if you have further questions! Zack
Hi again. Thank you for addressing all my queries, very helpful. Just to clarify the last question: I was thinking of the opposite case. Not that one fragment hits two different bins, but that several different short fragments (~100 bp) could hit the same 1 Mb bin, leading to a higher count. But if I understand your reply correctly, only the left-most unique fragment qualifies as a count then?
Do you have any theories about the high outliers? I found counts up to 506 in the mouse liver met2 dataset. Could they be CNAs, misalignments, bias, or technical errors?
Thank you again! Kind regards Heidi
I believe you are understanding my reply correctly, but will write out a quick explanation just to make sure:
In general, we think the higher/lower bin counts can be explained by a combination of real biological signal (whole chromosome duplications/deletions in this case) and the known sequence biases of our assay (see Supp Fig. 5). There are two outlier bins in the mouse liver met data -- chrM, which is obviously due to there being many more copies of the mitochondrial genome, and chr2 9.8-9.9 Mb. We have not followed up to see why there are so many reads aligning to this chr2 bin, but we exclude it from downstream analysis in the bin selection step before PCA.
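As a rough illustration of what excluding such bins could look like, here is a minimal sketch. It is not the repository's bin selection step: the function name `flag_outlier_bins`, the median-based cutoff, the `factor` value, the bin labels, and the toy counts are all assumptions made up for this example.

```python
from statistics import median


def flag_outlier_bins(bin_counts, factor=10.0):
    """Return labels of bins whose count exceeds factor * median count.

    Hypothetical rule for illustration only; the actual bin selection
    criteria are not described in this thread.
    """
    if not bin_counts:
        return []
    cutoff = factor * median(bin_counts.values())
    return sorted(label for label, n in bin_counts.items() if n > cutoff)


# Toy per-bin totals (made-up numbers, not taken from the published files).
# Labels are "chromosome:1Mb-bin-index", also a made-up convention.
toy_counts = {
    "chr1:0": 12,
    "chr1:1": 9,
    "chr2:9": 480,
    "chr3:4": 11,
    "chrM:0": 350,
}

outliers = flag_outlier_bins(toy_counts)
# Drop the flagged bins before any downstream step such as PCA.
kept = {label: n for label, n in toy_counts.items() if label not in outliers}
print(outliers)
print(kept)
```

A median-based cutoff is used here only because a few extreme bins barely move the median; any other robust rule, or a fixed exclusion list, would serve the same purpose.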
This is great. Thank you so much!
Hi guys.
I am working on my master's thesis studying slideseq data, so thank you for all your work on this and for this repository! I have a question about the published "*.sparse_counts_1Mb.txt" DNA files. What do they actually represent?
I tried to get an idea from the preprocess script but wanted to check with you.
Hope to hear from you
Best wishes Heidi