SciML / ADTypes.jl

Repository for automatic differentiation backend types
https://sciml.github.io/ADTypes.jl/
MIT License
38 stars 11 forks source link

Restructure sparse backends and replace subtyping by traits #40

Closed gdalle closed 5 months ago

gdalle commented 5 months ago

Motivation

This is a breaking PR (bumping version to v0.3.0) that restructures the handling of sparse backends, and cleans up the package in the process.

TLDR: Most (concrete) structs have not changed too much, but

  • their abstract subtypes are replaced by a mode trait
  • their sparse counterparts are replaced by the AutoSparse struct

Breaking API changes

Mode

Concrete structs

Non-breaking API changes

Sparsity

Other changes

Dependencies

Package quality

Checklist

Questions

ChrisRackauckas commented 5 months ago

Something not mentioned in the top there is that this will require that ArrayInterface.jl take on an ADTypes.jl dependency, which is fine given that this is lightweight and all, but it should be noted that its ColoringAlgorithm needs a @deprecate ColoringAlgorithm = ADTypes.AbstractColoringAlgorithm which is then used in downstream implementation of its functions. fast_matrix_colors and matrix_colors live there since they are part of the extended sparse matrix interface, so that array types can define their coloring pattern overloads (things like BandedMatrices) without taking a full dependency on the AD libraries, but then specialize whenever such structured matrices are hit with the coloring algorithm (ex. BandedMatrices has a nice analytical solution based on the band size).

ChrisRackauckas commented 5 months ago

The symbols instead of abstract types thing is weird. That needs some explanation.

gdalle commented 5 months ago

Thanks for the review! Indeed you identified two parts where I was unsure:

The symbols instead of abstract types thing is weird. That needs some explanation.

See my comment https://github.com/SciML/ADTypes.jl/pull/40#discussion_r1563914494

will require that ArrayInterface.jl take on an ADTypes.jl dependency

Either that or the coloring names are dropped from ArrayInterface altogether? See my comment https://github.com/SciML/ADTypes.jl/pull/40#discussion_r1563915898

gdalle commented 5 months ago

its ColoringAlgorithm needs a @deprecate ColoringAlgorithm = ADTypes.AbstractColoringAlgorithm

I can rename AbstactColoringAlgorithm to ColoringAlgorithm to minimize the refactoring work

fast_matrix_colors and matrix_colors live there since they are part of the extended sparse matrix interface,

Not sure how to deal with those though. I can also rename column_coloring and row_coloring but these are different algorithms, and I don't see a similar toggle in matrix_colors?

ChrisRackauckas commented 5 months ago

I can rename AbstactColoringAlgorithm to ColoringAlgorithm to minimize the refactoring work

It's at least contained to two repos, so it's not too big of a deal if it's deprecated properly.

Not sure how to deal with those though. I can also rename column_coloring and row_coloring but these are different algorithms, and I don't see a similar toggle in matrix_colors?

Yeah that's mostly by adjoint. But the bigger thing would be updating all of the downstream analytical solutions to have these functions as well.

gdalle commented 5 months ago

But the bigger thing would be updating all of the downstream analytical solutions to have these functions as well.

If we keep the name matrix_colors and import ADTypes into ArrayInterface, there is nothing to change at all. But that would mean having only one coloring function, and if I want the row coloring I take the column coloring of the transpose?

ChrisRackauckas commented 5 months ago

But that would mean having only one coloring function, and if I want the row coloring I take the column coloring of the transpose?

That's what we currently do, but if you're going to setup general sparse AD then it's not a good idea since you won't easily know if someone passes a coloring pattern if it should be A or A'. So I like the idea of two functions, but the work of updating all of the analytical solutions in the extensions to the two functions needs to be done as well.

gdalle commented 5 months ago

I like the idea of two functions, but the work of updating all of the analytical solutions in the extensions to the two functions needs to be done as well.

I agree. Maybe ArrayInterface can keep defining matrix_colors (based on ADTypes.column_coloring), and that way as a transition we can just copy-paste the following lines in the downstream packages:

ADTypes.column_coloring(M) = ArrayInterface.matrix_colors(M)
ADTypes.row_coloring(M) = ArrayInterface.matrix_colors(M')

without changing the names in the actual function definitions for the time being

gdalle commented 5 months ago

Thanks for the comments! I'll update the PR with a trait mechanism and ping you again for validation if that's okay?

ChrisRackauckas commented 5 months ago

I agree. Maybe ArrayInterface can keep defining matrix_colors (based on ADTypes.column_coloring), and that way as a transition we can just copy-paste the following lines in the downstream packages:

Let's do the clean break. Such a clean break is the kind of thing that needs someone motivated in order to say yes. I don't see you stopping the DI push, so let's do it right. v0.3.0 here to signal the break, update the downstream in ArrayInterface and SparseDiffTools, and get sparse DI to do things the best way possible, and then integrate it into everything like NonlinearSolve.jl and DifferentialEquations.jl. You've got gusto and we've been sitting here waiting for something like DI to finally fix some of these remaining issues, so let's just pull the trigger and do it.

codecov[bot] commented 5 months ago

Welcome to Codecov :tada:

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered :open_umbrella:

gdalle commented 5 months ago

Not ready yet but close

gdalle commented 5 months ago

@ChrisRackauckas I have implemented the trait mechanism and used the opportunity for other small breaking changes in the concrete structs. All of it is summarized in the first comment, which I have edited: https://github.com/SciML/ADTypes.jl/pull/40#issue-2239188179 What do you think?

gdalle commented 5 months ago

@avik-pal if you have an opinion I'm curious too

Vaibhavdixit02 commented 5 months ago

Should we document type parameters of the concrete structs? I feel like this limits us in the range of non-breaking changes we can do: for instance, turning AutoPolyesterForwardDiff{chunksize} into AutoPolyesterForwardDiff{chunksize,T} is breaking because someone might have used the former as a constructor. That is why I explicitly document keyword constructors and fields.

If we provide constructors for the previous version it can be done in a non-breaking way so it shouldn't be an issue I think

Is the weak dependency handling okay on Julia < 1.9? The alternative would be to avoid defining the mode trait altogether, which might be preferable

I think that's fine, handling <1.9 on CI and dep versions has been quite a pain already so I think we can bump julia compat 1.9 and do it the "right" way here

Should we pull in Diffractor.jl and Zygote.jl in the test suite to check on actual instances of the RuleConfig paradigm? I define very simple RuleConfigs in runtests.jl but it's less reliable than taking the real ones

Definitely, though I am not I understand why you are hesitant to do it so maybe I am missing something?

ChrisRackauckas commented 5 months ago

One step that we have not done yet but really need to consider in the API is structure identification. While it does not matter too much for the AD aspects, for the real usage of the sparse matrices it can be greatly beneficial to specialize away from SparseMatrixCSC to BandedMatrices or BlockBandedMatrices. So we really should have a step whereby if someone does sparsity detection, it should have a choice of whether to run a structure detection. There is a trade-off here since it's fundamentally dynamic and thus would have a dynamic type choice, but is probably something that is beneficial in "most" uses of sparsity detection. It would need hyperparameters though, for example maximum band size (since all matrices are "banded" if it's large enough), and the algorithm does not exist right now. But it would probably make sense to make some space for this to be a post-processing to sparsity detection

gdalle commented 5 months ago

If we provide constructors for the previous version it can be done in a non-breaking way so it shouldn't be an issue I think

At this point we're breaking so much that I think people who weren't using the kwdef constructors will be able to adapt rather easily. Otherwise we'd have to break type parameter symmetry and define things like AutoPolyesterForwardDiff{chunksize}(tag), but that's not absurd either.

Definitely, though I am not I understand why you are hesitant to do it so maybe I am missing something?

Just the thought of a heavier test suite, and Diffractor being frequently broken.

One step that we have not done yet but really need to consider in the API is structure identification.

I think I'd rather keep the functions for sparsity detection simple, and encapsulate this "structure detection vs rely on type alone" in the chosen SparsityDetector object

Vaibhavdixit02 commented 5 months ago

Just the thought of a heavier test suite, and Diffractor being frequently broken.

Yeah the second point can get frustrating. Maybe we can keep this as it is for now and if it brings up unforeseen issues we'll add them to really test them here

ChrisRackauckas commented 5 months ago

I think I'd rather keep the functions for sparsity detection simple, and encapsulate this "structure detection vs rely on type alone" in the chosen SparsityDetector object

As long as there is a plan, since I assume DI will need to incorporate it about 1 year from now.

gdalle commented 5 months ago

For testing, while it's a hassle to add Diffractor and Zygote to the dependencies, I did perform the following check of my mode method definitions in the ChainRulesCore extension, so I'm reasonably confident:

julia> import Diffractor, Zygote

julia> using ADTypes  # on the PR branch

julia> ADTypes.mode(AutoChainRules(; ruleconfig=Zygote.ZygoteRuleConfig()))
ADTypes.ReverseMode()

julia> ADTypes.mode(AutoChainRules(; ruleconfig=Diffractor.DiffractorRuleConfig()))
ADTypes.ForwardOrReverseMode()

julia> @which ADTypes.mode(AutoChainRules(; ruleconfig=Zygote.ZygoteRuleConfig()))
mode(::AutoChainRules{RC}) where RC<:(ChainRulesCore.RuleConfig{>:ChainRulesCore.HasReverseMode})
     @ ADTypesChainRulesCoreExt ~/.julia/packages/ADTypes/bOklB/ext/ADTypesChainRulesCoreExt.jl:16

julia> @which ADTypes.mode(AutoChainRules(; ruleconfig=Diffractor.DiffractorRuleConfig()))
mode(::AutoChainRules{RC}) where RC<:(ChainRulesCore.RuleConfig{>:Union{ChainRulesCore.HasForwardsMode, ChainRulesCore.HasReverseMode}})
     @ ADTypesChainRulesCoreExt ~/.julia/packages/ADTypes/bOklB/ext/ADTypesChainRulesCoreExt.jl:22

Of course there is no ChainRules-compatible backend with a forward-only RuleConfig, so we'll never hit the third method

gdalle commented 5 months ago

I think we're there, the last few things to iron out are:

ChrisRackauckas commented 5 months ago

If we define the mode, we need extensions for ChainRulesCore and EnzymeCore (fairly lightweight). Should we

We should define mode and at least bump to v1.9. I'd say just go to v1.10 knowing it's the next LTS and skip all of the other pain, there's other stuff to worry about in life.

Should we bump ADTypes version to 1.0?

Yes

Should we leave JET in the test suite, even though it errors on nightly? It does a great job of catching typos, and I made it so that the badge would still be green if only nightly fails

We just shouldn't run nightly, it's not for humans. We can set the new prerelease branch when it's ready, but since nightly isn't for packages (which is why the prerelease is being made) it's a waste of our time to be trying it.

gdalle commented 5 months ago

I'd say just go to v1.10 knowing it's the next LTS and skip all of the other pain, there's other stuff to worry about in life.

Done.

We just shouldn't run nightly, it's not for humans.

Removed nightly from CI

gdalle commented 5 months ago

Once I get an approving review I'll merge, but I'd like to do two things before we register v1:

ChrisRackauckas commented 5 months ago

Yes no need to release right away, let's get downstream all set and make sure everyone is bought into this form.