ekmett / ad

Automatic Differentiation
http://hackage.haskell.org/package/ad
BSD 3-Clause "New" or "Revised" License

Handle IEEE Floating Point Special Values #105

Closed · julmb closed this 1 year ago

julmb commented 1 year ago

This PR makes several changes, primarily aimed at addressing issue #104.

Expand Test Suite

I added tests for the combinators diff, grad, jacobian, and hessian. These include both basic correctness tests and specific regression tests for issue #104. Some of the tests were taken directly from the package documentation; for instance, the module Numeric.AD.Mode.Reverse lists several examples for its combinators.
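For reference, a minimal sketch of documentation-style examples of this kind, assuming only the top-level Numeric.AD interface (the expected values below match the haddock examples for these combinators):

```haskell
import Numeric.AD

-- Documentation-style correctness checks for the four combinators.
docExamples :: Bool
docExamples = and
  [ diff sin 0 == 1                 -- cos 0 = 1
  , grad (\[x, y, z] -> x * y + z) [1, 2, 3] == [2, 1, 1]
  , jacobian (\[x, y] -> [y, x, x * y]) [2, 1] == [[0, 1], [1, 0], [1, 2]]
  , hessian (\[x, y] -> x * y) [1, 2] == [[0, 1], [1, 0]]
  ]
```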

The test suite is currently instantiated for the modes Numeric.AD.Mode.Reverse and Numeric.AD.Mode.Reverse.Double but could easily be extended to cover the other modes. However, some tests fail in some of the other modes, and I am not sure whether fixing those failures should be part of this pull request. I am also not sure if or when I will have time to look into them, so the test suite is restricted to Numeric.AD.Mode.Reverse and Numeric.AD.Mode.Reverse.Double for now.

It is worth noting that with the current setup, it is not possible to test both the pure and the +ffi version of Numeric.AD.Mode.Reverse.Double at the same time, since they are implemented within the same module and selected via a preprocessor switch. This is especially troublesome because the trickier +ffi version is not active by default. It would be nice if the +ffi version could be its own module, but I do not know the rationale behind the original preprocessor switch design.

Adjust +ffi Version of Numeric.AD.Mode.Reverse.Double

There are three changes; a short sketch of their combined effect follows the list.

  1. As discussed in issue #104, the current implementation uses index 0 and value 0.0 for the unused second operand in tape entries corresponding to unary operations. I changed this to use index -1 so that these unused operands can be distinguished from actual operands with index 0 and value 0.0. This is a pure refactoring and should not change any behavior.
  2. Instead of checking whether the value of an operand is 0.0 in order to decide whether to skip that operand during backpropagation, the code now checks whether the index is >= 0. This allows tape entries to be processed properly in the presence of NaN and Infinity values and makes the tests under tests.reverse-double.issue-104.inside pass. However, it no longer skips entries where the backpropagation really would have no effect on the partial derivative.
  3. Instead of skipping the iteration if the current partial derivative is 0.0, it now checks if the increments to each of the partial derivatives are 0.0. This allows correct processing of NaN and Infinity values and makes the tests under tests.reverse-double.issue-104.outside pass. It also restores some of the skipping lost in change 2. However, there is still a performance impact compared to the original behavior of skipping the entire update if the corresponding partial derivative is 0.0.
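To make the combined effect concrete, here is a hypothetical Haskell rendering of the per-entry update; the actual implementation is C code behind the FFI, and the names Entry and backprop, as well as the IntMap standing in for the flat buffer array, are illustrative only:

```haskell
import Data.IntMap.Strict (IntMap)
import qualified Data.IntMap.Strict as IntMap

-- One tape entry: two operand indices with their local partial
-- derivatives; index -1 marks the unused operand of a unary operation.
data Entry = Entry Int Double Int Double

-- Propagate the sensitivity v of one entry back to its operands.
backprop :: Double -> Entry -> IntMap Double -> IntMap Double
backprop v (Entry i1 x1 i2 x2) = step i2 (v * x2) . step i1 (v * x1)
  where
    -- Change 2: skip the unused operand by its index, not by its value.
    -- Change 3: skip an update only when the increment itself is 0.0; a
    -- NaN or Infinity increment compares unequal to 0 and is applied.
    step i dx buf
      | i < 0 || dx == 0 = buf
      | otherwise        = IntMap.insertWith (+) i dx buf
```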

For the bench/BlackScholes.hs benchmark (adjusted for Numeric.AD.Mode.Reverse.Double), I was not able to measure any performance impact from these changes. For two of my own applications, I measured increases in runtime by factors of 1.07 and 1.21, respectively.

cartazio commented 1 year ago

Question: why are we not having the check be if v*y != 0 etc?

cartazio commented 1 year ago

The zero handling affects the space complexity of reverse mode. Shouldn’t we keep that and instead have the code do both the product v*y != 0 check and the -1 trick?

RyanGlScott commented 1 year ago

The CI is stuck due to the GitHub Actions setup using an unsupported Ubuntu version. I've pushed a commit to master (54b1751387d0aa4189e65eaf4eb181600d404c80) which should address this, so rebasing on top of that commit should get the CI un-stuck.

julmb commented 1 year ago

> The CI is stuck due to the GitHub Actions setup using an unsupported Ubuntu version. I've pushed a commit to master (54b1751) which should address this, so rebasing on top of that commit should get the CI un-stuck.

Done, thank you! I am confused by what is going on with the CI on ghc-9.0.2, though. It looks like it cannot infer the Typeable constraint on s? I am very surprised that this happens only on 9.0.2 and not on older or newer versions of GHC.

> Question: why are we not having the check be if v*y != 0 etc?

We are. The code used to be `double x = pTape->val[idx*2];` followed by `buffer[i] += v*x;`. Now it is `double x = v * pTape->val[idx*2];` followed by `buffer[i] += x;`, so the multiplication by v is already folded into x, and the check `if (x != 0)` is equivalent to what `if (v*x != 0)` would have been under the old definition of x. If we are confident that the compiler performs this common subexpression elimination anyway, we can also go back to the v*x way of doing things.

> The zero handling affects the space complexity of reverse mode. Shouldn’t we keep that and instead have the code do both the product v*y != 0 check and the -1 trick?

I am not sure what you mean by this. The space complexity should be unchanged, and I have not seen any indication of it being different. The tape is still the same (except for containing -1 instead of 0 in some entries) and the buffer array of partial derivatives is also still the same.

julmb commented 1 year ago

It looks like the failure on ghc-9.0.2 is related to the simplified subsumption change and can be worked around via eta-expansion. GHC 9.2.4 then introduces the DeepSubsumption language extension, which reestablishes backwards compatibility, so the code works again in subsequent versions of GHC. I will add the eta-expansion since that seems to be the intended way now.
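For illustration, a minimal standalone example of the pattern (hypothetical code, unrelated to the actual ad definitions that triggered the failure):

```haskell
{-# LANGUAGE RankNTypes #-}

f :: forall a b. a -> b -> b
f _ y = y

g :: (forall p. p -> forall q. q -> q) -> Int
g h = h () (3 :: Int)

-- GHC 8.10 accepts "g f" via deep subsumption; GHC 9.0's simplified
-- subsumption rejects it, and 9.2.4+ accepts it again when
-- DeepSubsumption is in effect (it is implied by Haskell2010):
--   broken = g f
-- Eta-expanding the argument typechecks on all of these versions:
fixed :: Int
fixed = g (\x -> f x)
```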

cartazio commented 1 year ago

i was just not reading the code correctly, thanks for clarifying

julmb commented 1 year ago

Alright, it has been a few weeks, what is the verdict on this? Is more feedback needed? Is more work needed on my part?

It would be nice if some people using the +ffi version of ReverseDouble could test the performance impact on their applications. As far as I am concerned, the performance impact is of course not great, but an acceptable price to pay for actual correctness in the presence of NaN and Infinity values.

RyanGlScott commented 1 year ago

Sorry, this dropped off my radar.

A handful of observations:

  1. I'm honestly not sure who is actively using the +ffi version of ReverseDouble right now. @sofusmortensen (the creator of +ffi) is an obvious candidate, but I've had trouble reaching @sofusmortensen for comment. I'm not sure if there are others who use it.
  2. Do you have a sense as to why you notice a performance regression in your own applications but not in the BlackScholes benchmark in the ad repo? Would it be worth adding another benchmark that is more sensitive to the changes in this PR as a better way to measure this in the future?
  3. I wonder if it is worth adding a "fast-math" version of ReverseDouble that achieves better performance at the expense of not being correct with respect to special IEEE floating-point values. This would be more work, but it would give a better migration story for anyone whose applications' performance might be negatively impacted by these changes.

cartazio commented 1 year ago

I think this change will not severely impact performance, since branches shouldn't be the bottleneck

cartazio commented 1 year ago

otoh, i could be missing something

julmb commented 1 year ago

Sorry for the delay, the last few weeks have been very busy for me.

>   2. Do you have a sense as to why you notice a performance regression in your own applications but not in the BlackScholes benchmark in the ad repo? Would it be worth adding another benchmark that is more sensitive to the changes in this PR as a better way to measure this in the future?

I am not 100% sure, but I believe this is due to the function in my application having more independent parts. Although the Jacobian in my application does not contain any zeroes (meaning that each parameter can influence every entry of the output vector), there are still subexpressions arising during the calculation that influence only a subset of the entries in the output vector. Thus, the partial derivatives of some output entries with respect to these subexpressions can be zero. Since these zero entries are no longer skipped entirely, the performance is impacted.
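As a hypothetical illustration of such independent parts (a made-up function, not the actual application code):

```haskell
import Numeric.AD

-- The shared subexpression s feeds only the first output, so during the
-- reverse pass for the second output every increment flowing through s
-- is zero. The old code skipped those updates outright; the new code
-- computes and tests each increment individually.
f :: Floating a => [a] -> [a]
f [x, y] = [s * y, x + y]
  where s = sin x

-- jacobian f [1, 2] contains no zeros, yet the tape still produces the
-- zero-sensitivity updates described above.
```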

My guess is that this occurs to a lesser degree in the BlackScholes benchmark, so the performance impact is smaller.

It might be a good idea to add a more decomposable benchmark, but I currently do not have the time to do this.

>   3. I wonder if it is worth adding a "fast-math" version of ReverseDouble that achieves better performance at the expense of not being correct with respect to special IEEE floating-point values. This would be more work, but it would give a better migration story for anyone whose applications' performance might be negatively impacted by these changes.

It could be a good idea, but I will leave that up to you, since it is mainly a matter of library design and maintenance burden considerations.

RyanGlScott commented 1 year ago

Thanks, that information is helpful to know.

Since we don't really have a good way to gauge whether there are any other users of this +ffi functionality besides yourself, I'm inclined to prioritize correctness here. And given that time is limited, I support landing this without first implementing a fast-math version of ReverseDouble, since it's unclear how much work that would involve.

That being said, it would be nice to track this fast-math idea in an issue so that we can revisit it if someone comes along later and asks for a more performant ReverseDouble. Can you:

  1. Open an issue about idea (3) from https://github.com/ekmett/ad/pull/105#issuecomment-1660229399, and
  2. Cite that issue in the relevant parts of your PR that check for special floating-point values?

I'd be happy to merge this afterwards. Thanks!

julmb commented 1 year ago

>   1. Open an issue about idea (3) from Handle IEEE Floating Point Special Values #105 (comment), and
>   2. Cite that issue in the relevant parts of your PR that check for special floating-point values?

Done and done. Once this is merged, I can also reference the relevant code location from #106 so that it is easy to find the starting point for working on this issue.

RyanGlScott commented 1 year ago

Thanks, @julmb! Let me know if you'd like a new ad release with these changes soon. If not, I'll wait until my next round of Hackage releases before getting around to doing so.

julmb commented 1 year ago

Thank you, and I'm not in any rush, no worries.

julmb commented 9 months ago

> Thanks, @julmb! Let me know if you'd like a new ad release with these changes soon. If not, I'll wait until my next round of Hackage releases before getting around to doing so.

Alright, it's been a while, and I could actually use a new release with these changes, as my test suite fails without them. Would it be possible to do a release on Hackage?

RyanGlScott commented 9 months ago

Sorry for the delay on this! I've uploaded ad-4.5.5 to Hackage with these changes.

julmb commented 9 months ago

No worries, and thank you very much!