ejarmand / comparative_epigenomic_motor_cortex


Fragment data for GSE229169 (10x multiome) #1

Closed Ambuj0507 closed 1 month ago

Ambuj0507 commented 2 months ago

Hi Team, I am looking for the fragment data generated and analyzed in your study "Conserved and divergent gene regulatory programs of the mammalian neocortex" (https://doi.org/10.1038/s41586-023-06819-6). Would it be possible to share the fragment files for the 10x multiome study GSE229169? Thanks.

vitkl commented 1 month ago

Hi @ejarmand

Congratulations on publishing your very exciting and inspiring study! I would also greatly appreciate it if you could share the 10X cellranger-arc outputs for all 4 species (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE229169), in particular the fragments.tsv.gz files. This would let users work with your multiome data without rerunning the 10X cellranger-arc analysis from the raw data. If it would be easier for you, you could upload the data to S3 storage, FTP, or Google Drive; it does not have to go to a public database. Please let me know if this is possible!

ejarmand commented 1 month ago

Thanks for bringing this to my attention! Apologies for the oversight. I'm discussing with others how best to share that data for the long term. In the meantime, I'm happy to share the data directly via Google Drive or something to that effect. Shoot me an email to coordinate: earmand@ucsd.edu

ejarmand commented 1 month ago

Hi all, I'm hosting the data and have put download links to tar archives for each species in this document: https://docs.google.com/document/d/1lJeqTGfemE4bipx-cr9CU60C8MKJf1vDaXoNsCfzhYw/edit?usp=sharing

The links will work for the next two weeks, but I'll update them as needed until we find an alternative host. I'm currently hosting out of pocket, so I've set the Google Doc to require an access request to avoid accruing significant download costs on my end.

wawpaopao commented 1 month ago

I would like to ask for some guidance on how to obtain a peak-by-cell matrix from the fragments.tsv.gz file after downloading the data. I intend to use this matrix to train a neural network model. Since I am not familiar with bioinformatics analysis, I would appreciate it if you could provide an overview of the process.

vitkl commented 1 month ago

While I am not the author, I would encourage you, @wawpaopao, to read the tutorials on this data processing step in detail (ArchR and SnapATAC2). Understanding this step and making good choices is as important as the "neural network model".

https://www.archrproject.com/bookdown/calling-peaks-with-archr.html
https://github.com/kaizhang/SnapATAC2
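
To give a rough idea of what that step computes, here is a minimal sketch in plain Python. It is not the authors' pipeline and not a substitute for ArchR or SnapATAC2. It assumes the standard 10x fragment layout (tab-separated chrom, start, end, barcode, read count, with `#` comment lines), a peak list that has already been called elsewhere and does not overlap itself, and one particular counting convention: each fragment adds 1 to every peak it overlaps. Real tools may count insertion sites instead and also filter barcodes. The function names and the demo data are hypothetical.

```python
import bisect
from collections import defaultdict


def parse_fragments(lines):
    """Yield (chrom, start, end, barcode) from 10x-style fragment lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        chrom, start, end, barcode = line.rstrip("\n").split("\t")[:4]
        yield chrom, int(start), int(end), barcode


def peak_by_cell_counts(lines, peaks):
    """Count fragments overlapping each peak, per barcode.

    peaks: list of (chrom, start, end), half-open and assumed non-overlapping.
    Returns (counts, barcodes), where counts[(peak_index, barcode)] is the
    number of overlapping fragments and barcodes is a sorted list.
    """
    # Group peaks by chromosome and sort them by start coordinate.
    by_chrom = defaultdict(list)
    for idx, (chrom, start, end) in enumerate(peaks):
        by_chrom[chrom].append((start, end, idx))
    starts, ends, ids = {}, {}, {}
    for chrom, items in by_chrom.items():
        items.sort()
        starts[chrom] = [p[0] for p in items]
        ends[chrom] = [p[1] for p in items]
        ids[chrom] = [p[2] for p in items]

    counts = defaultdict(int)
    barcodes = set()
    for chrom, f_start, f_end, barcode in parse_fragments(lines):
        barcodes.add(barcode)
        if chrom not in ends:
            continue
        # First peak whose end lies beyond the fragment start.
        j = bisect.bisect_right(ends[chrom], f_start)
        # Walk forward while peaks still begin before the fragment end.
        while j < len(starts[chrom]) and starts[chrom][j] < f_end:
            counts[(ids[chrom][j], barcode)] += 1
            j += 1
    return counts, sorted(barcodes)


def to_dense(counts, n_peaks, barcodes):
    """Expand the sparse counts into a peaks x cells list of lists."""
    col = {bc: k for k, bc in enumerate(barcodes)}
    matrix = [[0] * len(barcodes) for _ in range(n_peaks)]
    for (peak_idx, bc), n in counts.items():
        matrix[peak_idx][col[bc]] = n
    return matrix


# Tiny made-up example; for a real file one would pass
# gzip.open("fragments.tsv.gz", "rt") instead of a list of strings.
demo_lines = [
    "# comment line\n",
    "chr1\t100\t200\tAAAC-1\t1\n",
    "chr1\t150\t400\tAAAC-1\t2\n",
    "chr1\t950\t1100\tTTTG-1\t1\n",
    "chr2\t10\t50\tTTTG-1\t1\n",
]
demo_peaks = [("chr1", 120, 300), ("chr1", 1000, 1200), ("chr2", 500, 600)]
demo_counts, demo_barcodes = peak_by_cell_counts(demo_lines, demo_peaks)
demo_matrix = to_dense(demo_counts, len(demo_peaks), demo_barcodes)
```

A full dataset has far too many cells and peaks for a dense list of lists, so in practice the counts would stay sparse. The tutorials linked above also cover the parts this sketch skips: quality filtering of barcodes and calling the peaks themselves.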