TARGENE / targene-pipeline

Nextflow pipeline for Targeted-Learning of genetic effects
MIT License

Dataset Handling and coercion #160

Closed: olivierlabayle closed this 4 months ago

olivierlabayle commented 5 months ago

We need better management of the dataset and of scientific type coercion.

In particular, it needs to be explicit and parameterizable, and it probably belongs in TMLE.jl.

[ ] For coercion: because each column must be processed according to its role in the "causal graph", I can think of multiple levels:

  1. Based on the estimand, dispatching on the nuisance functions it requires
  2. Based on the nuisance functions themselves
  3. Make sure the coercions agree if a variable appears in several such functions

[ ] For dataset management:

  1. A copy of the relevant coerced data will be forwarded through the estimation procedure
  2. It is still unclear how users should best handle the cache (think security)
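
The third coercion level above (agreement across nuisance functions) could be sketched roughly as follows. This is only an illustration in plain Julia with no package dependencies: the nuisance function names, the variable names, the use of `Symbol`s to stand in for scientific types, and the helper `merge_requirements` are all hypothetical and not part of TMLE.jl.

```julia
# Hypothetical requirements: nuisance function => (variable => required scientific type).
# Symbols stand in for actual scientific types to keep the sketch dependency-free.
nuisance_requirements = Dict(
    :outcome_mean     => Dict(:Y => :Continuous, :T => :Finite, :W => :Continuous),
    :propensity_score => Dict(:T => :Finite, :W => :Continuous),
)

# Merge the per-function requirements into one variable => type mapping,
# raising an error if two nuisance functions disagree about a variable.
function merge_requirements(requirements)
    merged = Dict{Symbol,Symbol}()
    for (fn, vars) in requirements
        for (variable, scitype) in vars
            if haskey(merged, variable) && merged[variable] != scitype
                error("Variable $variable has conflicting types: " *
                      "$(merged[variable]) vs $scitype (required by $fn)")
            end
            merged[variable] = scitype
        end
    end
    return merged
end

merged = merge_requirements(nuisance_requirements)
```

The merged mapping is what would then be handed to the coercion step, so that each column is coerced exactly once, whatever the number of nuisance functions using it.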

olivierlabayle commented 5 months ago

This has been partly handled now. The main remaining question is whether full dataset management should be ported to TMLE.jl. In the end, it seems to come down to a simple coerce!(dataset, autotype(dataset)) type of thing, potentially restricted to the columns defined by an estimand.
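
A minimal sketch of that idea, restricted to a subset of columns, might look like the following. It assumes DataFrames.jl and ScientificTypes.jl are installed (ScientificTypes.jl being where `coerce!` and `autotype` are defined). The toy dataset and the `estimand_columns` vector are hypothetical, and how an estimand would actually expose its columns in TMLE.jl is not shown here.

```julia
using DataFrames
using ScientificTypes

# Toy dataset; column names are made up for illustration.
dataset = DataFrame(
    T = [0, 1, 1, 0],
    W = [0.1, 0.4, 0.3, 0.9],
    Y = [1.2, 0.7, 3.1, 2.2],
    other = ["a", "b", "c", "d"],
)

# Hypothetical: the columns an estimand refers to (treatment, confounder, outcome).
estimand_columns = [:T, :W, :Y]

# Ask autotype for suggestions on the relevant columns only;
# by default it returns a dictionary containing only the suggested changes.
suggested = autotype(dataset[:, estimand_columns])

# Apply the suggestions in place; columns absent from `suggested` are left untouched.
coerce!(dataset, suggested)
```

Restricting `autotype` to the estimand's columns keeps unrelated columns (here `other`) out of the coercion entirely, which is one way to make the step explicit about what it modifies.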