Question: why are we not having the check be `if v*y != 0` etc?
The zero handling affects the space complexity of reverse mode. Shouldn't we keep that and instead have the code do both the product `v*y != 0` check and the `-1` trick?
The CI is stuck due to the GitHub Actions setup using an unsupported Ubuntu version. I've pushed a commit to master
(54b1751387d0aa4189e65eaf4eb181600d404c80) which should address this, so rebasing on top of that commit should get the CI un-stuck.
> The CI is stuck due to the GitHub Actions setup using an unsupported Ubuntu version. I've pushed a commit to master (54b1751) which should address this, so rebasing on top of that commit should get the CI un-stuck.
Done, thank you! I am confused by what is going on with the CI on ghc-9.0.2 though. It looks like it cannot infer the `Typeable` constraint on `s`? Although I am very surprised that this happens only on 9.0.2 and neither on older nor newer versions of GHC.
> Question: why are we not having the check be `if v*y != 0` etc?
We are. It used to be `double x = pTape->val[idx*2];` and then `buffer[i] += v*x;`. Now we have `double x = v * pTape->val[idx*2];` and then `buffer[i] += x;`, so the multiplication is already part of `x` and the check `if (x != 0)` is equivalent to what would have been `if (v*x != 0)`. If we are confident that this common subexpression elimination is done by the compiler anyways, we can also go back to the `v*x` way of doing things.
> The zero handling affects the space complexity of reverse mode. Shouldn't we keep that and instead have the code do both the product `v*y != 0` check and the `-1` trick?
I am not sure what you mean by this. The space complexity should be unchanged, and I have not seen any indication of it being different. The tape is still the same (except for containing `-1` instead of `0` in some entries) and the `buffer` array of partial derivatives is also still the same.
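For concreteness, a minimal sketch of what such a tape entry could look like; the field names and layout here are assumptions for illustration, not the actual `+ffi` tape format:

```haskell
-- Assumed layout for illustration only: every entry has two operand slots
-- regardless of whether the operation is unary or binary, so replacing the
-- unused slot's index 0 with -1 changes neither the size of the tape nor
-- the size of the buffer of partial derivatives.
data Entry = Entry
  { idx1, idx2 :: Int      -- operand indices; -1 marks an unused slot
  , val1, val2 :: Double   -- partial derivatives of the operation w.r.t. each operand
  }

-- sin x, a unary operation: one real operand plus one unused slot.
sinEntry :: Int -> Double -> Entry
sinEntry xIdx x = Entry xIdx (-1) (cos x) 0.0
```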
It looks like the failure on ghc-9.0.2 is related to the simplified subsumption change and can be worked around via eta-expansion. Then ghc-9.2.4 introduces the `DeepSubsumption` language extension that reestablishes backwards compatibility, so it works again in subsequent versions of GHC. I will add the eta-expansion since that seems to be the intended way now.
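For reference, the textbook shape of the problem looks like this (a generic example, not the actual `ad` code that failed to compile): a function with all of its quantifiers up front is passed where a type with a nested `forall` is expected.

```haskell
{-# LANGUAGE RankNTypes #-}

-- `expects` wants an argument whose second forall sits to the right of an
-- arrow; `poly` quantifies everything up front.
expects :: (forall a. a -> forall b. b -> b) -> Int
expects _ = 0

poly :: forall p q. p -> q -> q
poly _ y = y

-- On ghc-9.0.2 (simplified subsumption, no DeepSubsumption yet) the direct
-- application `expects poly` is rejected; eta-expanding the argument makes
-- it typecheck again on all versions.
works :: Int
works = expects (\x y -> poly x y)

main :: IO ()
main = print works
```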
I was just not reading the code correctly, thanks for clarifying.
Alright, it has been a few weeks, what is the verdict on this? Is more feedback needed? Is more work needed on my part?
It would be nice if some people using the `+ffi` version of `ReverseDouble` could test the performance impact on their applications. As far as I am concerned, the performance impact is of course not great, but an acceptable price to pay for actual correctness in the presence of `NaN` and `Infinity` values.
Sorry for forgetting about this—this dropped off my radar.
A handful of observations:

1. I don't know of anyone else using the `+ffi` version of `ReverseDouble` right now. @sofusmortensen (the creator of `+ffi`) is an obvious candidate, but I've had trouble reaching @sofusmortensen for comment. I'm not sure if there are others who use it.
2. Do you have a sense as to why you notice a performance regression in your own applications but not in the `BlackScholes` benchmark in the `ad` repo? Would it be worth adding another benchmark that is more sensitive to the changes in this PR as a better way to measure this in the future?
3. I wonder if it is worth adding a "fast-math" version of `ReverseDouble` that achieves better performance at the expense of not being correct with respect to special IEEE floating-point values. This would be more work, but it would give a better migration story for anyone whose applications' performance might be negatively impacted by these changes.

I think this change will not severely impact performance, since branches shouldn't be the bottleneck. OTOH, I could be missing something.
Sorry for the delay, the last few weeks have been very busy for me.
> Do you have a sense as to why you notice a performance regression in your own applications but not in the `BlackScholes` benchmark in the `ad` repo? Would it be worth adding another benchmark that is more sensitive to the changes in this PR as a better way to measure this in the future?
I am not 100% sure, but I believe that this is due to the function in my application having more independent parts. Although the Jacobian in my application does not contain any zeroes (meaning that each parameter can influence every entry in the output vector), there are still subexpressions that arise as part of the calculation that only influence a subset of the entries in the output vector. Thus, the partial derivatives of some output entries with respect to these subexpressions could be zero. Since these zero entries are no longer skipped entirely, the performance is impacted.
My guess is that this occurs to a lesser degree in the `BlackScholes` benchmark, so the performance impact is smaller.
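A hypothetical example of this situation (not the actual function from my application): the Jacobian is dense, but some intermediate values only feed into a subset of the outputs, so some increments computed during the reverse pass are exactly zero.

```haskell
import Numeric.AD.Mode.Reverse.Double (jacobian)

-- The Jacobian of `f` has no zero entries, yet the intermediates `a` and `b`
-- each influence only one of the two outputs, so during the reverse pass some
-- of the increments to their partial derivatives are exactly zero.
f :: Floating a => [a] -> [a]
f [x, y] = [a + s, b + s]
  where
    a = sin x   -- feeds only the first output directly
    b = cos y   -- feeds only the second output directly
    s = x * y   -- feeds both outputs
f _ = error "f expects exactly two inputs"

main :: IO ()
main = print (jacobian f [1, 2])   -- a dense 2x2 Jacobian, no zeros
```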
It might be a good idea to add a more decomposable benchmark, but I currently do not have the time to do this.
> I wonder if it is worth adding a "fast-math" version of `ReverseDouble` that achieves better performance at the expense of not being correct with respect to special IEEE floating-point values. This would be more work, but it would give a better migration story for anyone whose applications' performance might be negatively impacted by these changes.
It could be a good idea, but I will leave that up to you, since it is mainly a matter of library design and maintenance burden considerations.
Thanks, that information is helpful to know.
Since we don't really have a good way to gauge if there are any other users of this `+ffi` functionality besides yourself, I'm inclined to prioritize correctness here. And since time is limited, I support landing this without first implementing a fast-math version of `ReverseDouble`, since it's unclear how much work that would involve.
That being said, it would be nice to track this fast-math idea in an issue so that we can revisit it if someone comes along later and asks for a more performant `ReverseDouble`. Can you:

- Open an issue about idea (3) from Handle IEEE Floating Point Special Values #105 (comment), and
- Cite that issue in the relevant parts of your PR that check for special floating-point values?

I'd be happy to merge this afterwards. Thanks!
Done and done. Once this is merged, I can also reference the relevant code location from #106 so that it is easy to find the starting point for working on this issue.
Thanks, @julmb! Let me know if you'd like a new `ad` release with these changes soon. If not, I'll wait until my next round of Hackage releases before getting around to doing so.
Thank you, and I'm not in any rush, no worries.
> Thanks, @julmb! Let me know if you'd like a new `ad` release with these changes soon. If not, I'll wait until my next round of Hackage releases before getting around to doing so.
Alright, it's been a while, and I could actually use a new release with these changes as my test suite fails without them. Would it be possible to do a release on Hackage?
Sorry for the delay on this! I've uploaded `ad-4.5.5` to Hackage with these changes.
No worries, and thank you very much!
This PR adds several changes primarily aimed at addressing issue #104.
## Expand Test Suite
I added tests for the combinators `diff`, `grad`, `jacobian`, and `hessian`. These include both basic correctness tests as well as specific regression tests for issue #104. Some of the tests were taken directly from the package documentation. For instance, the module `Numeric.AD.Mode.Reverse` lists several examples for its combinators.
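As a rough sketch of what such documentation-based tests look like (not the actual test code added in this PR), two of the documented examples translate into checks almost verbatim:

```haskell
-- A minimal sketch, not the actual test suite from this PR: it checks two of
-- the combinator examples given in the Numeric.AD.Mode.Reverse documentation.
import Numeric.AD.Mode.Reverse (diff, grad)

check :: String -> Bool -> IO ()
check name ok = putStrLn (name ++ (if ok then ": ok" else ": FAILED"))

main :: IO ()
main = do
  -- d/dx sin x at 0 is cos 0 = 1
  check "diff sin 0" (diff sin 0 == (1.0 :: Double))
  -- gradient of x*y + z at (1,2,3) is (y, x, 1) = (2, 1, 1)
  check "grad (x*y+z) [1,2,3]" (grad (\[x, y, z] -> x * y + z) [1, 2, 3] == ([2, 1, 1] :: [Double]))
```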
The test suite is currently instantiated for the modes `Numeric.AD.Mode.Reverse` and `Numeric.AD.Mode.Reverse.Double` but could easily be extended to cover the other modes. However, there are failing tests in some of the other modes and I am not sure if adjusting those should be part of this pull request. I am also not sure if/when I will have time to look at these, so the test suite is restricted to `Numeric.AD.Mode.Reverse` and `Numeric.AD.Mode.Reverse.Double` for now.

It is worth noting that with the current setup, it is not possible to test both the pure and the `+ffi` version of `Numeric.AD.Mode.Reverse.Double` at the same time, since they are implemented within the same module and activated via preprocessor switch. This is especially troublesome since the trickier `+ffi` version is not active by default. It would be nice if the `+ffi` version could be its own module, but I do not know the rationale behind the original preprocessor switch design.

## Adjust `+ffi` Version of `Numeric.AD.Mode.Reverse.Double`
There are three changes.

1. The original code used index `0` and value `0.0` for the unused second operand in tape entries corresponding to unary operations. I changed this to use index `-1` so that these unused operands can be distinguished from actual operands with index `0` and value `0.0`. This is a pure refactoring and should not change any behavior.
2. Instead of checking whether the operand's stored partial derivative is `0.0` in order to decide whether to skip this operand for backpropagation, it now checks if the index is `>= 0`. This allows processing tape entries properly in the presence of `NaN` and `Infinity` values and makes the tests `tests.reverse-double.issue-104.inside` pass. However, it no longer skips over entries where the backpropagation really would not have an effect on the partial derivative.
3. Instead of checking whether the derivative value being propagated is `0.0`, it now checks if the increments to each of the partial derivatives are `0.0`. This allows correct processing of `NaN` and `Infinity` values and makes the tests under `tests.reverse-double.issue-104.outside` pass. It also restores some of the skipping lost in change 2. However, there is still a performance impact compared to the original behavior of skipping the entire update if the corresponding partial derivative is `0.0`.
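As a rough sketch of how these three changes fit together, here is a small Haskell model written purely for illustration; the actual implementation is the C code behind the `+ffi` flag, and its data layout and names differ:

```haskell
import qualified Data.Map.Strict as M

type Buffer = M.Map Int Double   -- accumulated partial derivatives, keyed by variable index

-- Propagate one operand of one tape entry.
propagateOperand
  :: Double   -- ^ derivative accumulated so far for this tape entry
  -> Int      -- ^ operand index; -1 marks the unused slot (change 1)
  -> Double   -- ^ partial derivative stored on the tape for this operand
  -> Buffer
  -> Buffer
propagateOperand v idx partial buffer
  | idx < 0   = buffer    -- change 2: skip unused operands by index, not by stored value
  | x == 0    = buffer    -- change 3: skip only when the increment itself is zero
  | otherwise = M.insertWith (+) idx x buffer
  where
    x = v * partial       -- the increment; NaN and Infinity are propagated, not dropped

main :: IO ()
main = print (propagateOperand 0 3 (1 / 0) M.empty)
-- 0 * Infinity is NaN, which now reaches the buffer instead of being skipped
```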
For the `bench/BlackScholes.hs` benchmark (adjusted for `Numeric.AD.Mode.Reverse.Double`), I was not able to measure any performance impact of these changes. For two of my own applications, I measured an increase in runtime by a factor of 1.07 and 1.21 respectively.