datamol-io / splito

Machine Learning dataset splitting for life sciences.
https://splito-docs.datamol.io/
Apache License 2.0
23 stars, 2 forks

Unpinned dependencies #10

Closed: SteshinSS closed this issue 7 months ago

SteshinSS commented 7 months ago

At the moment, neither the dependencies in pyproject.toml nor the developer dependencies in env.yaml are pinned. This can create subtle discrepancies between developers' environments and the published pip package, leading to unreproducible bugs and software rot.

Solution: Pin the dependencies. Install the environment, run pip freeze, and pin the installed versions in the .toml and .yaml files.
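For illustration, the "freeze and pin" step can be sketched with the standard library alone. This is only a rough stand-in for `pip freeze`, not a replacement for it; the helper names `installed_packages` and `format_pins` are hypothetical, and the printed lines would still need to be copied into the .toml and .yaml files by hand.

```python
from importlib.metadata import distributions


def installed_packages():
    """Yield (name, version) for every distribution visible to this interpreter."""
    for dist in distributions():
        name = dist.metadata["Name"]
        # Skip distributions with broken or missing metadata.
        if name and dist.version:
            yield name, dist.version


def format_pins(packages):
    """Turn (name, version) pairs into sorted, de-duplicated 'name==version' lines."""
    return sorted(
        {f"{name}=={version}" for name, version in packages},
        key=str.lower,
    )


if __name__ == "__main__":
    # Print one exact pin per installed package, similar in spirit to `pip freeze`.
    print("\n".join(format_pins(installed_packages())))
```

Note that this pins everything in the environment, including transitive dependencies, which is exactly the kind of tight constraint discussed in the replies below.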

I like pip-tools, but I haven't used it for .toml and conda environments.

Alternatively, it could be postponed, e.g. until a stable release, but we need to be aware of these subtleties.

cwognum commented 7 months ago

This is intentional!

splito is not meant to be used in isolation. You can imagine that if you want to install splito in an environment that has other packages, any pinned package could quite easily lead to a dependency conflict.

For that reason, we want to limit the constraints we place on the dependencies of splito. It is true that this complicates our lives as developers, because we need to make sure we remain compatible and be wary of subtle discrepancies between versions, as you said.
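A toy sketch of why an exact pin is more likely to conflict than a loose range. Everything here is hypothetical: the version numbers, the `compatible` helper, and the simplified inclusive (lowest, highest) ranges. Real resolvers follow PEP 440 and are considerably more involved.

```python
def parse(version):
    """Parse a simple dotted version such as '1.26.4' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def compatible(a, b):
    """Return True if two inclusive (lowest, highest) ranges share any version."""
    lowest = max(parse(a[0]), parse(b[0]))
    highest = min(parse(a[1]), parse(b[1]))
    return lowest <= highest


# Hypothetical constraints on one shared dependency.
library_pinned = ("1.24.0", "1.24.0")  # exact pin: a single allowed version
library_loose = ("1.20.0", "1.99.0")   # loose range: many allowed versions
other_package = ("1.26.0", "1.26.4")   # what another installed package needs

print(compatible(library_pinned, other_package))  # False: the pin excludes 1.26.x
print(compatible(library_loose, other_package))   # True: the ranges overlap
```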

I am not familiar with pip-tools, but we might start using pixi sometime soon, which seems to have some similar features.

Does that make sense?

SteshinSS commented 7 months ago

Aha, I see. Indeed, pinning dependencies could cause conflicts with other packages. On the other hand, it would make the splits reproducible, which may not be the case right now. Anyway, I respect that this decision was intentional and will close the issue. I just wanted to highlight it, because I often couldn't reproduce scientific repositories (models, not packages) due to this issue.

cwognum commented 7 months ago

I think the split should be reproducible, as long as we are careful about setting the random seed! It could be that a dependency is updated without respecting backwards compatibility, but given how mature most of the packages we use are, I think this is unlikely.
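A minimal sketch of what "setting the random seed" buys you, using only the standard library. This is not splito's API; `random_split` and its parameters are hypothetical. The same seed yields the same split only as long as the underlying shuffling implementation does not change between versions, which is the residual risk raised above.

```python
import random


def random_split(n_items, test_fraction=0.2, seed=42):
    """Split indices 0..n_items-1 into (train, test) using a seeded shuffle."""
    indices = list(range(n_items))
    # A dedicated Random instance avoids touching the global random state.
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_items * test_fraction))
    return sorted(indices[n_test:]), sorted(indices[:n_test])


train_a, test_a = random_split(100, seed=0)
train_b, test_b = random_split(100, seed=0)

# Same seed and same library version: identical splits.
print(train_a == train_b and test_a == test_b)
```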