Closed mlondschien closed 6 months ago
I agree with you that this is urgently needed.
This will probably require a bit more thought and some more involved restructuring/refactoring. Would you have a proposal on how to go about this and/or resources to support the implementation of such a proposal?
Honestly, my R knowledge is too limited to be of technical help here.
Design-wise, I believe that updating existing data sources might be tricky if you want to keep backward compatibility.
@mlondschien I believe we briefly discussed this offline, but just to re-iterate: what you claim above is not true.
Adding a new data source requires changing the ricu source code (e.g., https://github.com/eth-mds/ricu/pull/30).
Data source definitions (as are concept definitions) are user configurable. Maybe the docs at ?ricu::load_src_cfg
can serve as starting point for this. We have used this in several places ourselves (and I'm sure @dplecko could share his experience here). Having external data sets and integrating them with the "core" ricu data sets is a central piece of functionality offered by ricu. If you would want to use a newer version of one of ricu's data sets you can use this alongside the older version or in its place.
Thanks @nbenn! Some guidance on how to do this, optimally an example, would be great.
?ricu::load_src_cfg
mimic2
, make sure (via the class_prefix
entry, objects derived from this config inherit from the respective "mimic" classes; copy the mimic config ricu ships and change where necessarydownload_src()
, import_src()
and attach_src()
; most of this should work out of the box via inheritance; maybe check out docs here tooload_difftime()
, load_src()
, etc. work, but I believe they should, again via inheritance and assuming that there will not be changes between versions that affect the logic hereI don't know in what ways newer mimic versions differ from what we had ~3 years ago. I'm assuming it's mostly adding some table(s), adding/removing patients. Maybe they dropped CareVue patients? That would simplify concept specification as fewer items have to be considered, but again seems largely backwards compatible. From a distance, I'm hoping that these changes do not really require changing mimic-specific logic but can mostly be handled by config.
Maybe also check in with @prockenschaub as he's working on adding new mimic to ricu I believe.
I created a (temporary) repository that collects the all the different databases configs, including a short description of how to load them.
@mlondschien maybe you can have a look and check that it works for you. Long-term, I think it would be nice for backward compatibility to include those earlier versions with ricu
. Maybe we can also turn the README in my repo into a proper vignette?
@nbenn while I agree that what Malte wants should already be possible, I think we need to provid more guidance for the user. From my experience, the learning curve of ricu
can be quite steep. I think shipping configs for each version of MIMIC IV could already go a long way of easing people into it and also highlighting what parts of the config need to be changed to achieve what. load_difftime()
etc. thankfully have not been affected by version changes so far. What may change is concepts, though, so a systematic way to not copy paste but inherit and, where necessary, overwrite concepts for new database versions may make sense.
I have discussed this with @nbenn. While I can see that having dataset versioning would be useful, this would create quite a lot of overhead in terms of re-structuring things. Therefore, we will not prioritize this currently, given the fact that other things have a better value/time invested ratio. Instead, we have opted for a different solution: we will try to be up-to-date with the "latest" version of each dataset.
For cases in which an older version of a dataset is needed, one may add this dataset locally. If you think examples for how this could be done would be helpful, please feel free to open a new issue. Adding local datasets is described to some degree here.
Related: https://github.com/eth-mds/ricu/issues/33, https://github.com/eth-mds/ricu/issues/28.
The links (and thus versions) of data sources are currently hardcoded in the data-sources.json. This makes
ricu
unnecessarily inflexible. Adding a new data source requires changing thericu
source code (e.g., https://github.com/eth-mds/ricu/pull/30) as does benefitting from the data source's updates (e.g.,caregiver_id
was added to miiv tables in v2.2, which is currently not accessible viaricu
).