JuliaLinearAlgebra / MatrixDepot.jl

An Extensible Test Matrix Collection for Julia
http://matrixdepotjl.readthedocs.org/
Other
75 stars 22 forks source link

Use Scratch.jl for default data dir? #73

Closed willow-ahrens closed 3 years ago

willow-ahrens commented 3 years ago

I almost always use a custom data directory for MatrixDepot (my supercomputer filesystem for scratch data) because I don't want to redownload the matrices when the package updates, and I need to ensure the data path is writable. However, there is now a more automatic solution for this kind of problem, Scratch.jl. I think that Scratch.jl may provide a more useful default data directory which persists across updates to the package.

Currently, the default is in the package itself set on line 140 of "data.jl"

const DATA_DIR = abspath(dirname(@__FILE__),"..", "data")

I think it would make a lot of sense to change that line to

const DATA_DIR = @get_scratch!("data")
willow-ahrens commented 3 years ago

I'll also add that I do NOT think MATRIXDEPOT_MYDEPOT should default to a Scratch.jl directory. From what I understand, MATRIXDEPOT_MYDEPOT might be better off without a default, because defaulting to a location inside the package itself encourages users to write files to a location managed by the package manager. Paths inside the package directory might not be writable when user code runs, or might get deleted.

For a more permanent and extensible solution, custom depot code may be better handled by simply telling users to call "include_generator" from outside of the package at runtime, and by deprecating the current code-copying solution. When the user wants to define a package of custom matrix generators called MyFancyMatrices, they would of course need to call include_generator from the __init__ function in the MyFancyMatrices module, to ensure that updates to usermatrixclass occur at runtime and don't get precompiled. Such a solution has two benefits:

  1. Adding new generators would be done by including packages. All the user needs to do is using MyFancyMatrices
  2. Another user can come along and define an ExtraFancyMatrices package, and the two could work together.

As far as I can tell, the solution I have just described already works, and all that needs to change is documentation and a deprecation to the existing code loading system.

KlausC commented 3 years ago

I almost always use a custom data directory for MatrixDepot

You can use the environment variable "MATRIXDEPOT_DATA" for that purpose. "Scatch.jl" requires juliav1.5 afaik; will be an option for the future.

For a more permanent and extensible solution,

Please feel free to provide a PR with your proposed changes wrt MYDEPOT.

willow-ahrens commented 3 years ago

Thanks for your quick response, and for your maintenance of a very useful package!

My goal in filing this issue was to contribute a PR which did not default to writing files inside the MatrixDepot package directory, for both MATRIXDEPOT_MYDEPOT and MATRIXDEPOT_DATA. I understand wanting to avoid a dependency on Scratch for now. In the meantime, what do you think about setting a default similar to that of Scratch? This would confer the same benefits regarding persistence across MatrixDepot versions and Julia versions. Internally, Scratch defaults the scratchspace to a subdirectory of Base.DEPOT_PATH, an environment variable that was introduced before 1.0. Thus, the default could be something like

const DATA_DIR = abspath(first(Base.DEPOT_PATH), "matrix_depot_data")

Alternatively, what do you think about defaulting to @getscratch! only when Scratch is loaded using Requires.jl?

KlausC commented 3 years ago

Both is possible in combination. I am waiting for your PR.

Some issues: We must provide a smooth migration path for people relying on the current defaults. (+ Docu) Don't forget to create the new directory in __init__, if required. const DATA_DIR = Ref(--default value if no Scatrch available--) to be overrun in __init__ if Scratch is available.

willow-ahrens commented 3 years ago

I've implemented a rough draft of the proposed changes, but ran into a hiccup using Requires to include Scratch. When Scratch is loaded after MatrixDepot, we need to re-run __init__ using the new data directory. This is messy, and since we cannot control load order, it seems unavoidable with a Requires-based approach.

Unless we can explicitly depend on Scratch or initialize lazily, I think it would be better to exclusively use the Base.DEPOT_PATH default.

In the interest of avoiding putting MatrixDepot users through two changes to the defaults (to Base.DEPOT_PATH and then eventually to @get_scratch!), I asked the Scratch developers whether they think it might be possible to support an earlier version of Julia. In my own tests, I think it may at least be possible to support Julia 1.3. For how long would you like MatrixDepot to support Julia 1.0?

willow-ahrens commented 3 years ago

Because of the complications and effort involved in any other solution, I decided it would be easiest and least confusing to users if we simply wait until MatrixDepot.jl is comfortable with a dependency on Julia 1.5 or 1.6, which is likely to be the next LTS, and then add Scratch as a normal dependency. Note the discussion in https://github.com/JuliaPackaging/Scratch.jl/issues/22.

andreasnoack commented 3 years ago

I'm in favor of bumping the requirement to Julia 1.5. Bumping the bound doesn't mean that users of Julia <1.5 cannot use MatrixDepot. It just means that they cannot use the new features.

willow-ahrens commented 3 years ago

It looks like Scratch is now compatible with Julia 1.0, so these changes no longer require a Julia version requirement bump. https://github.com/JuliaPackaging/Scratch.jl/pull/24

KlausC commented 3 years ago

Good news! Ass soon as Scratch publishes the new release, I would be happy to see you adapt your PR #75 .

andreasnoack commented 3 years ago

It has already been released https://github.com/JuliaRegistries/General/pull/38498

willow-ahrens commented 3 years ago

I relaxed the compat entry for julia back to "1" in my pull requests. I'm awaiting feedback on https://github.com/JuliaMatrices/MatrixDepot.jl/pull/75 and https://github.com/peterahrens/MatrixDepot.jl/pull/1

KlausC commented 3 years ago

75 has been merged just now.

willow-ahrens commented 3 years ago

Since I wasn't able to merge https://github.com/peterahrens/MatrixDepot.jl/pull/1 into #75 before #75 was merged, I have opened it as a separate PR on master as https://github.com/JuliaMatrices/MatrixDepot.jl/pull/79

KlausC commented 3 years ago

That is as I had expected. #75 should treat the Scratch subject, and #79 the user defined generators. As these are unrelated, each of these should be applicable independent of the status of the other one.