aphp / eds-scikit

eds-scikit is a Python library providing tools to process and analyse OMOP data
https://aphp.github.io/eds-scikit
BSD 3-Clause "New" or "Revised" License

Add a feature to avoid loading the whole subsetted cohort into memory #36

Closed (strayMat closed this issue 1 year ago)

strayMat commented 1 year ago

Feature type

utility

Description

We would like to avoid loading all the data into memory when persisting a HiveData object. This is especially useful when processing a big cohort, where the current code runs out of memory.

Several propositions/directions for improvement:

strayMat commented 1 year ago

https://github.com/aphp/eds-scikit/pull/37 implements something close to solution 1. It still collects the data, but one dataframe at a time and, within each dataframe, one parquet fragment at a time.