UW-GAC / GENESIS

GENetic EStimation and Inference in Structured samples (GENESIS): Statistical methods for analyzing genetic data from samples with population structure and/or relatedness
https://bioconductor.org/packages/GENESIS

High Memory Consumption with Large Dataset in PC-Relate #102

Open ecyeh opened 1 year ago

ecyeh commented 1 year ago

Hello,

I'm currently using GENESIS for a large-scale genetic analysis involving approximately 500,000 samples. However, I've found that running PC-Relate with only 160,000 samples already exceeds my system's 1 TB memory capacity.

Given these memory demands, I'm wondering if there are any strategies or planned updates to reduce the memory footprint of PC-Relate. I'm also interested in any recommended approaches for handling such large datasets with GENESIS. Any advice or guidance would be greatly appreciated. Thank you for your time and assistance.

Best regards,
Erh-Chan

smgogarten commented 1 year ago

You can run the various steps of pcrelate separately on smaller batches of samples and then combine the results. See #38 for a description of how to do this.
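
For a rough sense of how the batching works, here is a minimal Python sketch of the bookkeeping only. It is not GENESIS code and does not call any GENESIS function; the R calls to use for each step are the ones described in #38. The function names, the block size of 20,000, and the integer sample IDs are all hypothetical and chosen purely for illustration. The sketch splits the samples into blocks, lists every pair of blocks that needs its own job (including each block paired with itself), and checks that the jobs together cover every sample pair exactly once.

```python
from itertools import combinations_with_replacement


def split_into_blocks(sample_ids, block_size):
    """Split sample IDs into consecutive blocks of at most block_size."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [sample_ids[i:i + block_size]
            for i in range(0, len(sample_ids), block_size)]


def block_pairs(n_blocks):
    """All unordered pairs of block indices (i <= j), including i == j."""
    return list(combinations_with_replacement(range(n_blocks), 2))


def pairs_in_job(blocks, i, j):
    """Number of distinct sample pairs handled by the job for blocks i and j."""
    if i == j:
        n = len(blocks[i])
        return n * (n - 1) // 2  # pairs within a single block
    return len(blocks[i]) * len(blocks[j])  # pairs across two blocks


def total_pairs(n_samples):
    """Number of distinct sample pairs in the full data set."""
    return n_samples * (n_samples - 1) // 2


if __name__ == "__main__":
    n_samples = 160_000   # sample count mentioned in the question
    block_size = 20_000   # hypothetical; tune to the memory available per job

    blocks = split_into_blocks(list(range(n_samples)), block_size)
    jobs = block_pairs(len(blocks))
    covered = sum(pairs_in_job(blocks, i, j) for i, j in jobs)

    print("blocks:", len(blocks))
    print("block-pair jobs:", len(jobs))
    print("all pairs covered once:", covered == total_pairs(n_samples))
```

Each job only needs the samples from its two blocks in memory, so peak memory is governed by the block size rather than by the total sample count. The trade-off is that the number of jobs grows roughly with the square of the number of blocks.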