CogStack / MedCAT

Medical Concept Annotation Tool
Other
456 stars 105 forks source link

CU-8694vv985 transitive deps #463

Closed mart-r closed 4 months ago

mart-r commented 4 months ago

Add all transitive dependencies to the environmental snapshot.

We added transitive dependencies to MedCAT in v1.12 (#438). However, that only included direct dependencies of MedCAT. In principle, the compatible versions of transitive dependencies should be handled by the package that defines them. However, these things can change over time and there may be a situation where the requirements ranges previously set no longer work properly. As such, by including all the transitive dependencies (and their versions) in the environmental snapshot could help us recreate an environment where we'd expect a model to work more confidently.

There's a few decisions I made that could very well be open to change:

  1. Currently, the transitive dependencies are in a separate part of the json
    • I thought it would be good to explicitly separate them from direct dependencies
    • Though if there's good reasons to keep all dependencies together, it could be changed
  2. Currently, the default behaviour includes transitive dependencies
    • It shouldn't be too computationally expensive
    • But if there's a good reason, we can have it disabled by default

The change to the default environmental snapshot as a result of the PR is as follows:

tomolopolis commented 4 months ago

Task linked: CU-8694vv985 Add more dependencies to dependency snapshot

mart-r commented 4 months ago

env snapshotting happens as part of model pack creation right? should be fine to include t-dep computation, as hash computation is most time consuming

Yes, that's when it happens. Not much harm, really. Even if the faster hashing is used (which nearly instant), saving the CDB on disk will take far longer than the fractions of the second finding the transitive dependencies would take. (though haven't looked at the exact timings).

Does this change integrate with the 'notification' of loading a model pack that is not the 'same' environment?

I don't think there's currently anything in MedCAT that does that. spaCy does warn when loading an older version of their model that's technically not (guaranteed to be) compatible with the current spaCy version. But I don't recall us checking the environment in any meaningful way.