JuliaPackaging / Scratch.jl

Scratch spaces for all your persistent mutable data needs
MIT License
More powerful linking of scratch spaces to versions #23

Open oxinabox opened 3 years ago

oxinabox commented 3 years ago

I don't think any of this is important, but I'm just noting it down.

Consider the instructions for making scratch spaces that are keyed to the version. This is kind of limited.

Let's say my package does some data fetching and processing and then exposes it to the user. It uses the scratch space to hold the processed data, so it doesn't redownload it.

Now let's say different versions of my package do different processing. You don't want to use a file that was processed by v1 of the package with v6 because, e.g., we have changed from representing time as a DateTime to using a ZonedDateTime. So it should redownload the file. Keying a subfolder to the version gets us some of that, but there are some problems.
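For reference, the version-keyed approach amounts to deriving the scratch key from the package version. A minimal sketch, where `versioned_key` and the `"processed-data"` prefix are hypothetical names chosen for illustration:

```julia
# Hypothetical helper: build a scratch key that is shared by all patch
# releases of one major.minor version, but differs across minor versions.
versioned_key(prefix::AbstractString, v::VersionNumber) =
    "$(prefix)-v$(v.major).$(v.minor)"

key = versioned_key("processed-data", v"1.4.2")

# Inside a package that loads Scratch.jl, the key would then be handed to
# the scratch-space lookup, e.g. `@get_scratch!(key)`.
```

Dropping the patch number is a design choice: it assumes patch releases never change the on-disk format.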

Deleting

This could be quite a lot of data stored in the v1 subfolder, and it would be good if it was deleted when v1 of the package is uninstalled. For this we kind of need the package manager to know about the folder.

clear_scratchspaces!(pkg) is a decent approximation for a lot of use cases
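Without package-manager support, a package could also prune stale version subfolders itself. The following is only a sketch under assumptions: `prune_versions!` is a hypothetical helper, and it assumes each subfolder of `root` is named after the version that wrote it (e.g. `0.4.1`).

```julia
# Hypothetical helper: delete version-named subfolders of `root` whose
# major.minor differs from `keep`. Returns the names that were removed.
function prune_versions!(root::AbstractString, keep::VersionNumber)
    removed = String[]
    for name in readdir(root)
        path = joinpath(root, name)
        isdir(path) || continue
        v = tryparse(VersionNumber, name)
        v === nothing && continue  # leave folders that are not versions alone
        (v.major, v.minor) == (keep.major, keep.minor) && continue
        rm(path; recursive=true)
        push!(removed, name)
    end
    return removed
end
```

This only runs when the newer version of the package is loaded, so it is still an approximation of what the package manager could do at uninstall time.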

Compatibility

Consider a more advanced version of the above. What if the time representation changed between v1 and v6, but some files don't actually have any time data, so they remain compatible?

The way we (Invenia) have solved this before is by giving separate version numbers to the Data and the Program, and then having the Program declare which versions of the data it is happy to use via semver_spec. You can see one use of this in JLSO, which declares which file versions it can read and which it can write. We have a more intense version of this internally with some datadeps.

So if both the scratch (sub-)space and the package have separate versions, the package just declares which versions it can work with and then goes and finds the newest one that currently exists and meets the semver_spec it said it could work with. You can do this at the function level (which is pretty much what our internal project on datadeps does).
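A rough sketch of that function-level lookup, not the actual JLSO or internal implementation. It assumes data subfolders are named after their data version; `READABLE_DATA_VERSIONS` and `newest_compatible` are hypothetical names, and `Pkg.Types.semver_spec` is an internal (non-public) Pkg function.

```julia
using Pkg  # standard library; semver_spec lives in the internal Pkg.Types module

# Hypothetical: the data-format versions this release of the program can read.
const READABLE_DATA_VERSIONS = Pkg.Types.semver_spec("2, 3")

# Return the path of the newest version-named subfolder of `root` accepted by
# `spec`, or `nothing` if no compatible data exists yet.
function newest_compatible(root::AbstractString, spec=READABLE_DATA_VERSIONS)
    best_version, best_name = nothing, nothing
    for name in readdir(root)
        isdir(joinpath(root, name)) || continue
        v = tryparse(VersionNumber, name)
        (v === nothing || !(v in spec)) && continue
        if best_version === nothing || v > best_version
            best_version, best_name = v, name
        end
    end
    return best_name === nothing ? nothing : joinpath(root, best_name)
end
```

If the function returns `nothing`, the caller would redownload and reprocess the data into a subfolder for the current data version.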

But it would be cooler if the package manager was aware of it, so that it could tie into automatic deletion.

cossio commented 2 years ago

Wouldn't the second, more advanced example be solved if you just wrap the data in a thin package of its own? Then the processing package depends on the data package, and you get all the semver compat stuff from Pkg automatically.

A related question: At the moment, if I create a version-specific scratch-space (https://github.com/JuliaPackaging/Scratch.jl#can-i-create-a-scratch-space-that-is-not-shared-across-versions-of-my-package), will it be garbage collected automatically when that version of the package is removed (even though there is another version of the package installed through an update, say, with its own version-specific scratch that should not be removed)?

oxinabox commented 2 years ago

Yeah, I have lately been thinking about just automatically generating jll-like packages just for managing data.

willow-ahrens commented 2 months ago

I would also like this feature. I have a compiler that I'm writing a caching mechanism for, and it doesn't really make sense to keep the compiled code for v0.4 when we are using v0.5 of the compiler now.