gav-sturm / Cellular_Lifespan_Study

A repository of code for processing and analysis of data generated for the Cellular Lifespan Study performed at the Mitochondrial Psychobiology Lab at Columbia University Medical Center.
GNU General Public License v3.0

Downloading all the pre-processed data from the site #1

Open qbeer opened 11 months ago

qbeer commented 11 months ago

Hi,

I'd like to acquire all the pre-processed data, from all phases, all treatments and all patients, for further research. The site seems to crash when I try this. Could you please point me to the location of the datasets that are displayed in the ShinyApp interface? Should I look at this folder: Seahorse/raw_data? Should I concatenate all the files together, or how does the app handle them?

Thanks in advance, Alex

gav-sturm commented 11 months ago

Hi Alex,

If you go to the "Download Data" section in the ShinyApp with "All parameters" pre-selected, tick the box "Click to download the full sample-set (~2000k timepoints)", then select "Download Selected Data" and save the file to your 'Downloads' folder, you should be able to get the full dataset. Have you tried that? I would love to know which selection is throwing your error.

If this does not work, you should also be able to download the full dataset from FigShare: https://doi.org/10.6084/m9.figshare.18441998.v2 (2022).

Gabriel

qbeer commented 11 months ago

Hi Gabriel,

I hadn't found the full Figshare dataset, so thanks for pointing me to it. I was trying to select all patients, all phases and all treatments, then clicking the box to download the full dataset (~2000k timepoints), but the browser froze. Maybe I am just running out of RAM, but I couldn't acquire the whole dataset this way. In the Figshare data, if I understand correctly, one row is one timepoint, but the CSV seems to have only about 2000 timepoints, not 2 million. Am I misunderstanding something? Thanks,

Alex
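For anyone wanting to double-check the row count discussed above, here is a minimal Python sketch that counts the data rows and columns of a CSV using only the standard library. The column names in the demo data are made up for illustration and are not taken from the Figshare file.

```python
import csv
import io

def count_rows(csv_file):
    """Return (number of non-empty data rows, number of columns) for an open CSV file."""
    reader = csv.reader(csv_file)
    header = next(reader, None)
    if header is None:
        return 0, 0
    # Skip blank lines, which csv.reader yields as empty lists.
    n_rows = sum(1 for row in reader if row)
    return n_rows, len(header)

# Tiny in-memory stand-in for the downloaded file (hypothetical columns).
demo = io.StringIO("sample_id,cell_line,days_grown\nS1,A,10\nS2,A,20\nS3,B,10\n")
n_rows, n_cols = count_rows(demo)
print(n_rows, n_cols)
```

With the real download, the same function could be called on a file opened with `open(path, newline="")`, where `path` is wherever the CSV was saved.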

qbeer commented 11 months ago

I think it is just a typo on the site. It seems that there are ~2000 data points, with additional modalities for some of the measurements at different time periods. Do you happen to have code that joins every data modality into a single, unified data table? Alternatively, could you point out exactly which files I need to join if I'd like to integrate the methylation, RNASeq, etc. data into the full CSV file?

Thanks in advance, Alex
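To make the kind of join being asked about concrete, here is a rough Python sketch of a left join between a main table and one extra modality, using only the standard library. It is not the repository's own method: the key column `sample_id`, the other column names, and the `rnaseq_` prefix are all hypothetical, and the real files may need a different key or a different matching rule.

```python
import csv
import io

def read_table(csv_file, key):
    """Index the rows of a CSV file by the value in the `key` column."""
    return {row[key]: row for row in csv.DictReader(csv_file)}

def left_join(base_rows, other, key, prefix):
    """Attach columns from `other` to each base row, matched on `key`.

    Base rows without a match keep only their original columns.
    """
    joined = []
    for row in base_rows:
        merged = dict(row)
        match = other.get(row[key], {})
        for col, value in match.items():
            if col != key:
                # Prefix added columns so modalities cannot overwrite each other.
                merged[f"{prefix}{col}"] = value
        joined.append(merged)
    return joined

# In-memory stand-ins for the real files (hypothetical columns and values).
main = io.StringIO("sample_id,days_grown\nS1,10\nS2,20\n")
rna = io.StringIO("sample_id,gene_x\nS1,5.2\n")

base = list(csv.DictReader(main))
rna_by_id = read_table(rna, "sample_id")
table = left_join(base, rna_by_id, "sample_id", "rnaseq_")
print(table)
```

A left join keeps every row of the main table, which fits the situation described here, where only some timepoints have the additional modalities.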