Implement the necessary data structures for training

bioFAM / cellij

Implementation of a Modular Multi-Omics Factor Model Framework

BSD 3-Clause "New" or "Revised" License

Implement the necessary data structures for training #8

Open arberqoku opened 1 year ago

arberqoku commented 1 year ago

Convert the preprocessed and cleaned MuData into a torch.utils.data.Dataset wrapped in a torch.utils.data.DataLoader to facilitate training, e.g. when introducing mini-batching for SVI. Keep in mind the sample- and feature-wise metadata stored in the .obs and .var fields.
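
A minimal sketch of what this could look like, not a proposed final design. `MultiViewDataset` is a hypothetical name, and a plain dict of NumPy arrays stands in for the modalities; with a real MuData object the arrays would presumably come from `mdata.mod[name].X` (densified if sparse). It assumes all views are already aligned on the same observations. Each item also carries its row index under the key "idx" (so no view may use that name), which lets the training loop look up `.obs` metadata and index sample-wise parameters for the current mini-batch.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class MultiViewDataset(Dataset):
    """Hypothetical wrapper: one item is one sample across all views."""

    def __init__(self, views, obs_names):
        # views: dict mapping view name -> (n_obs, n_features) array,
        # with rows of every view aligned to obs_names (an assumption).
        n_obs = len(obs_names)
        for name, x in views.items():
            if x.shape[0] != n_obs:
                raise ValueError(f"view {name!r} has {x.shape[0]} rows, expected {n_obs}")
        self.views = {
            name: torch.as_tensor(x, dtype=torch.float32) for name, x in views.items()
        }
        self.obs_names = list(obs_names)

    def __len__(self):
        return len(self.obs_names)

    def __getitem__(self, idx):
        # Return the sample's row from each view plus its index, so that
        # sample-wise metadata can be recovered for the mini-batch.
        item = {name: x[idx] for name, x in self.views.items()}
        item["idx"] = idx
        return item


# Toy stand-in data: 10 samples, two views with different feature counts.
rng = np.random.default_rng(0)
views = {"rna": rng.normal(size=(10, 5)), "atac": rng.normal(size=(10, 3))}
obs_names = [f"cell_{i}" for i in range(10)]

dataset = MultiViewDataset(views, obs_names)
loader = DataLoader(dataset, batch_size=4, shuffle=True)

for batch in loader:
    # batch["rna"]: (batch_size, 5), batch["atac"]: (batch_size, 3),
    # batch["idx"]: integer tensor of the sampled row indices.
    batch_obs = [dataset.obs_names[i] for i in batch["idx"].tolist()]
```

Feature-wise metadata from `.var` does not change between mini-batches, so it could simply be kept as attributes on the dataset rather than returned per item.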

gtca commented 1 year ago

As discussed, we could also move this upstream, e.g. into mudata, either in the future or straight away.

Another point of discussion is supporting different strategies when the MuData does not form a complete tensor, i.e. when the modalities do not all cover the same observations (e.g. union with missing data or imputation, intersection).
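
To make the two strategies concrete, here is a rough sketch under assumptions: `align_views` is a hypothetical helper, not an existing mudata function, and plain NumPy arrays with lists of observation names stand in for the modalities. "union" keeps every observation and fills rows missing from a view with NaN (to be masked or imputed later); "intersection" keeps only the observations present in all views.

```python
import numpy as np


def align_views(views, obs_names, how="union"):
    """Align views on a shared list of observations (hypothetical helper).

    views: dict mapping view name -> (n_obs_view, n_features) array
    obs_names: dict mapping view name -> observation names, one per row
    """
    name_sets = [set(names) for names in obs_names.values()]
    if how == "union":
        shared = sorted(set.union(*name_sets))
    elif how == "intersection":
        shared = sorted(set.intersection(*name_sets))
    else:
        raise ValueError(f"unknown strategy: {how!r}")

    position = {obs: i for i, obs in enumerate(shared)}
    aligned = {}
    for view, x in views.items():
        # Start from all-NaN so observations absent from this view stay missing.
        out = np.full((len(shared), x.shape[1]), np.nan)
        for row, obs in enumerate(obs_names[view]):
            if obs in position:
                out[position[obs]] = x[row]
        aligned[view] = out
    return shared, aligned


# Toy example: the two views overlap only on observations "b" and "c".
views = {"rna": np.arange(6.0).reshape(3, 2), "atac": np.ones((3, 1))}
obs_names = {"rna": ["a", "b", "c"], "atac": ["b", "c", "d"]}

obs_union, union = align_views(views, obs_names, how="union")
obs_inter, inter = align_views(views, obs_names, how="intersection")
```

With the union, the NaN entries would still need handling downstream, for example by masking them in the likelihood or imputing them beforehand; the sketch deliberately leaves that choice open.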