SciML / StochasticDiffEq.jl

Solvers for stochastic differential equations which connect with the scientific machine learning (SciML) ecosystem
Other
245 stars 66 forks source link

deepcopy prob.noise #496

Closed rmsrosa closed 1 year ago

rmsrosa commented 2 years ago

This fixes #412 .

I noticed deepcopy(prob.noise) is also used at https://github.com/SciML/DiffEqNoiseProcess.jl/blob/master/src/solve.jl#L7

ChrisRackauckas commented 2 years ago

If you do this, then you cannot solve with the same noise. That would be a bug. It's documented behavior that if you choose a noise and you solve with it, you solve with that noise. If you choose a noise and solve with it twice, you get the solution with that noise twice.

rmsrosa commented 2 years ago

1) Currently, solve mutates a noise process Z given in prob = SDEProblem(..., noise = Z). I wouldn't expect that (solve does not have an exclamation point at the end).

2) Even so, it does not generate the same path. Internally, the noise is set to W = prob.noise, which is set to Z, and then W gets emptied before being filled with a new run. Only if the same seed and the same time step are used is that the same noise path is (re)generated. Otherwise, we will get a new noise path W and hence a different Z. Even if we use the same seed but with a different time step, the path won't be the same, because bridge is not used.

3) As I see it currently, the right way to get the same noise path is to use NoiseWrapper, as documented. If we don't use that, I assume we use the same noise process, but not necessarily the same sample path.

Either way, I see a bug to fix. But my opinion is that it is reasonable not to mutate Z given in noise = Z (hence use W = deepcopy(prob.noise) internally) and assume that in this case we use the same noise process but a different sample path. If one wants to use the same sample path, then one should use NoiseWrapper. If, however, you really want the same noise path from noise = Z, then internally one should not empty W and check whether W has more than the starting value and, in that case, use bridge to generate the noise at any new time instant. In that case, I don't know what is the point of NoiseWrapper.

rmsrosa commented 2 years ago

I don't see any error in CI / test (Interface, 1). Maybe it timed out?

As for Interface2 failing in 1 and 1.6, yeah, with deepcopy(prob.noise), reset=false has no effect and https://github.com/SciML/StochasticDiffEq.jl/blob/master/test/scalar_noise.jl#L20 fails, with sol being equal to solve(prob,SRI()).

rmsrosa commented 2 years ago

Ok, this is the best I could think of. I hope deepcopying just integrator.sol.W, instead of the whole integrator or integrator.sol is a reasonable solution. I will leave it like that.

ChrisRackauckas commented 1 year ago

We should add a keyword argument to the interface for alias_noise to turn this off and return to the previous behavior. Also, the noise tutorials which rely on this need to get updated.

It would be good to instead define copy overloads for the NoiseProcess types though, and directly use those. Deepcopy is usually very heavyhanded.

rmsrosa commented 1 year ago

Good idea about the keyword argument alias_noise. I just implemented it, but I still plan to add some tests.

As for overloading copy for the noise types, it is not clear to me yet how to do it properly, because the types may be nested structs.

ChrisRackauckas commented 1 year ago

Then let's go with deepcopy for now, and open an issue in DiffEqNoiseProcess about implementing a copy method.

rmsrosa commented 1 year ago

Oh, I need to add alias_noise to the list of allowedkeywords over at DiffEqBase.jl, right? I'll do that.

ChrisRackauckas commented 1 year ago

Yes

ChrisRackauckas commented 1 year ago

Not sure about that test failure.

rmsrosa commented 1 year ago

The error in the Interface 2 test might have been "bad luck". The test file tau_leaping.jl doesn't reset the seed so the error Evaluated: 729.43435 ≈ 721.9787 (rtol=0.01) was probably just a matter of chance.

As for the errors in Interface 1 (1.0 and 1.6), I believe they have been happening since before I first started to contribute. The error report doesn't say much. In my machine, those tests just hang.

ChrisRackauckas commented 1 year ago

Interface 2 was just bad luck: there are a few random seeds in there that can cause that.

Interface 1, I don't know of that issue from before? Spawn null PR and see if master fails with that. If it does, we need to track that down.

rmsrosa commented 1 year ago

Indeed, it was not there. I probably got confused. Looking at the recent commit history, the only other place I saw the same error was on an innocent Project update commit on Aug 24 and the error was only on Interface 1, not 1.6. Could have been just bad luck too?

ChrisRackauckas commented 1 year ago

This kind of kill seems scary. Retry? One way it could happen is if the CI machine goes OOM. Since the CI machines are shared with many other projects (Github actions just gives you a few cores), another project can cause the machine to OOM, so it can randomly happen (with a higher likelihood the more memory you take).

rmsrosa commented 1 year ago

Yeah, I didn't know the details, but that is what I imagined, that the system would hang for some reason. Retrying is a good idea. And I imagined the codecov decline is due to some tests failing and not covering what it used to?

rmsrosa commented 1 year ago

Do I need to make a silly commit to retrigger it or do you have other means?

ChrisRackauckas commented 1 year ago

Fix something in the README 😅

rmsrosa commented 1 year ago

No broken link in README so I edited some random things 😅

That same error always takes forever on my machine and I just usually kill it. Probably the CI choke at it this time as well. Hopefully it goes thru this time.

rmsrosa commented 1 year ago

Hmm, that is suspicious. I ran master locally and it all went fine. But it is hanging on this branch. Maybe GC is not collecting all the copies in that sequence of solves.

rmsrosa commented 1 year ago

Yup, I used alias_noise=false in the reversal_test.jl solves and the tests were successful. Edit: local tests on my computer, I mean.

ChrisRackauckas commented 1 year ago

Force a GC call somewhere in the tests?

rmsrosa commented 1 year ago

GC did it! Tests passed locally. Let's wait for the CI.

rmsrosa commented 1 year ago

Ok, I had to move the GC inside the inner loop, but that worked, at least.