torchsde vs DiffEqFlux.jl

AdrienCorenflos commented 4 years ago

Hi,

First of all, thanks a lot for the work, this is excellent research!

I am aware of a Julia library that does roughly the same thing (https://github.com/SciML/DiffEqFlux.jl). Could you please describe what the differences are in your opinion (either algorithmic differences or interface ones)? So far the only one I can see relates to the fact that it seems easier to request a batch of trajectories in torchsde than it is in DiffEqFlux (might be wrong though).

Thanks a lot

Adrien

duvenaud commented 4 years ago

Thanks for asking. There are a few algorithmic advances introduced in our recent paper, "Scalable Gradients for Stochastic Differential Equations", that this repo is meant to demonstrate:

1. Strongly reconstructing SDE sample paths by running them backwards from the state and using identical Brownian motion.
2. Storing an entire Brownian motion sample in O(1) memory using a virtual Brownian tree.
3. Combining the adjoint method with variational inference in a way that is efficient for diagonal noise.
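
To make the virtual Brownian tree idea concrete, here is a minimal pure-Python sketch. It is not the torchsde implementation: the function names, the hash-based key splitting, and the tolerance parameter are all assumptions made for illustration. The point it shows is that a whole Brownian path can be queried at any time from a single seed, by repeatedly sampling Brownian-bridge midpoints with deterministically derived randomness, so nothing about the path needs to be stored between queries.

```python
import hashlib
import math
import random


def _normal(key: bytes) -> float:
    """Standard normal draw that is fully determined by `key`."""
    rng = random.Random(int.from_bytes(key[:8], "big"))
    return rng.gauss(0.0, 1.0)


def _child(key: bytes, tag: bytes) -> bytes:
    """Derive a fixed-size child key (hypothetical stand-in for a splittable PRNG)."""
    return hashlib.sha256(key + tag).digest()


def brownian_query(t, t0=0.0, t1=1.0, seed=0, tol=1e-6):
    """Approximate W(t) for one fixed Brownian path on [t0, t1] with W(t0) = 0.

    Only the current interval, its endpoint values, and one key are kept,
    so memory does not grow with the number of queries.
    """
    if not t0 <= t <= t1:
        raise ValueError("t must lie in [t0, t1]")
    node = hashlib.sha256(str(seed).encode()).digest()
    lo, hi = t0, t1
    w_lo = 0.0
    w_hi = math.sqrt(t1 - t0) * _normal(_child(node, b"T"))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # Brownian bridge at the midpoint: the mean is the average of the
        # endpoint values and the variance is (hi - lo) / 4.
        w_mid = 0.5 * (w_lo + w_hi) + math.sqrt((hi - lo) / 4.0) * _normal(node)
        if t < mid:
            hi, w_hi, node = mid, w_mid, _child(node, b"L")
        else:
            lo, w_lo, node = mid, w_mid, _child(node, b"R")
    # Below the tolerance, interpolate linearly between the interval endpoints.
    frac = (t - lo) / (hi - lo)
    return w_lo + frac * (w_hi - w_lo)
```

Because every draw depends only on the seed and the position in the tree, calling `brownian_query(0.3, seed=1)` twice returns the same value. That repeatability is what allows a backward pass to revisit the same noise that the forward pass used. Each query costs a number of steps logarithmic in `1 / tol`.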

The third point I think is under-rated - I don't think that simply fitting an SDE directly to minimize squared error or match finite moments is consistent in most settings. Specifically, if you optimize squared error and want to learn the noise, I'm pretty sure that the noise will always shrink as far as you allow it, which partly defeats the purpose of fitting SDEs to data.
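
A toy calculation shows why the noise shrinks. This is a simplified one-step model chosen for illustration, not the setting of the paper: the prediction is a mean $\mu$ plus Gaussian noise with learnable scale $\sigma$, and $y$ is a single observation.

```latex
\begin{align*}
\hat{x} &= \mu + \sigma\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, 1) \\
\mathbb{E}_{\epsilon}\big[(\hat{x} - y)^2\big]
  &= (\mu - y)^2 + 2\sigma(\mu - y)\,\mathbb{E}[\epsilon] + \sigma^2\,\mathbb{E}[\epsilon^2] \\
  &= (\mu - y)^2 + \sigma^2
\end{align*}
```

Under these assumptions the expected squared error is increasing in $\sigma^2$ whatever the data are, so it is minimized at $\sigma = 0$.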

In contrast, fitting all SDE parameters using maximum marginal likelihood should give you the benefits of Bayesian Occam's razor, i.e. it will choose the longest prior lengthscales possible that still fit the data.

I don't know the details of the state of DiffEqFlux.jl. I think they've added 1, but I don't think they have 2 or 3 implemented yet.
As for solvers, we certainly have fewer than DiffEqFlux, and they're not as full-featured. In particular, ours don't have any special handling for stiff systems.

lxuechen commented 4 years ago

Thanks for asking this question! I think David's comments address the main points.

Adding to this conversation: the last time I checked DiffEqFlux.jl, it seemed there had been some progress on the adjoints in simplified settings, see e.g. here. On the other hand, I am aware that there are still multiple open issues on DiffEqFlux.jl about the adjoint implementation, see e.g. issue1, issue2, issue3. So I'm guessing that there is still work to be done.

We're happy to discuss the details of the algorithms and implementation, but for now I'm closing this issue. Feel free to reopen it in the future if anything comes up.
