CNCLgithub / GalileoEvents

Galileo + Events
MIT License

exp1_pf super slow #7

Closed by belledon 4 years ago

belledon commented 4 years ago

I'm currently experimenting with PackageCompiler to see if we can get the 23 ms performance: https://julialang.github.io/PackageCompiler.jl/dev/examples/plots/

Originally posted by @belledon in https://github.com/CNCLgithub/galileo-ramp/issues/3#issuecomment-601731436

belledon commented 4 years ago

Using PackageCompiler, I was able to reduce the pf runtime to ~30 s, which is still way too slow.

Unfortunately, ProfileView did not give me enough resolution to figure out much, other than that the majority of time is spent during rejuvenation.

After poking around, I found that the issue is not the forward function but rejuvenation itself. The forward function performs as fast as it normally should (about 4 to 10 ms). However, the runtime of each rejuvenation step increases linearly, from 500 ms to 1500 ms.

belledon commented 4 years ago

@iyildirim

OK, so after more poking, it seems to be the gm as a whole. A full execution of 120 frames takes 1600 ms, which partly explains the delay.

I think we finally (or at least I did) realized what Marco was so confused about when we first explained the inference procedure to him in Galileo CCN.

We need to make a few changes to the model. Mainly, every time the belief about density is updated, the entire physical trace is revised (which makes sense). This means that:

  1. inference will be super expensive (the cost grows quasi-quadratically)
  2. there may be an epistemic discrepancy between what our gm describes and what we would like to model
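A back-of-the-envelope cost model for point 1, sketched under an assumption not checked against the actual code: the forward step simulates one new frame, while a rejuvenation move at frame t re-simulates all t frames so far, because the single density value touches the whole trajectory. Under that assumption each rejuvenation step grows linearly with t (consistent with the 500 ms to 1500 ms drift reported above), and the total work per particle grows quadratically in the number of frames. The function name is hypothetical.

```python
def simulated_frames(num_frames: int, rejuvenate: bool) -> int:
    """Count physics frames simulated for ONE particle over a video.

    Hypothetical cost model: the forward update simulates one new frame;
    a rejuvenation move at frame t re-simulates frames 1..t, because the
    single global density value affects the entire trajectory.
    """
    total = 0
    for t in range(1, num_frames + 1):
        total += 1        # forward (extend) step: one new frame
        if rejuvenate:
            total += t    # re-run the whole trajectory observed so far
    return total


if __name__ == "__main__":
    frames = 120  # video length mentioned in the thread
    print(simulated_frames(frames, rejuvenate=False))  # 120
    print(simulated_frames(frames, rejuvenate=True))   # 120 + 120*121/2 = 7380
```

For 120 frames with one move per frame, this model gives 7380 simulated frames per particle versus 120 without rejuvenation, roughly a 61× blow-up.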

I propose a new posterior that explicitly tracks how the belief about the physical properties changes over time.

To estimate this, we have to make a subtle change to the gm. Rather than sampling a single scalar density value from the prior, we instead sample the base distribution of density (i.e., a Gaussian with mean mu and variance theta). This density belief random variable is then sampled at each time step, and the sampled scalar is used in the simulation for that time step.
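A minimal sketch of the proposed generative change, written in plain Python rather than the project's modeling code. The hyperpriors on `mu` and `theta` and the function name are assumptions made for illustration; `theta` is treated as a variance, so each per-frame draw uses `sqrt(theta)` as its standard deviation.

```python
import math
import random


def sample_density_beliefs(num_frames: int, rng: random.Random):
    """Sample the base density distribution once, then one density per frame.

    Hypothetical hyperpriors: mu ~ Uniform(0.5, 5.0), theta ~ Uniform(0.01, 0.5).
    theta is a variance, so the per-frame standard deviation is sqrt(theta).
    """
    mu = rng.uniform(0.5, 5.0)       # mean of the density belief
    theta = rng.uniform(0.01, 0.5)   # variance of the density belief
    sd = math.sqrt(theta)
    # each frame gets its own scalar density, used only for that frame's simulation
    densities = [rng.gauss(mu, sd) for _ in range(num_frames)]
    return mu, theta, densities


if __name__ == "__main__":
    mu, theta, densities = sample_density_beliefs(120, random.Random(0))
    print(mu, theta, len(densities))
```

Because the draws are Gaussian, negative densities are possible in principle; a real version would likely need truncation or a log-scale parameterization.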

Rejuvenation then updates these time-stamped density values with a Markov chain update scheme.
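One way the Markov chain update could look, sketched as a single Gaussian random-walk Metropolis-Hastings move on one time-stamped density. It assumes, hypothetically, that the move targets the most recent frame, so only that frame's observation depends on the proposed value and nothing earlier has to be re-simulated. `rejuvenate_frame`, `toy_loglik`, and the step size are placeholders rather than project code, and `sd` stands for the square root of the variance theta.

```python
import math
import random


def normal_logpdf(x: float, mean: float, sd: float) -> float:
    """Log density of Normal(mean, sd) evaluated at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def rejuvenate_frame(densities, t, mu, sd, loglik, rng, step=0.1):
    """One Gaussian random-walk Metropolis-Hastings move on densities[t] only.

    Target: the base-distribution prior Normal(mu, sd) on densities[t] times
    the (hypothetical) likelihood loglik(t, d) of frame t's observation.
    All other frames' densities are left untouched.
    """
    current = densities[t]
    proposal = current + rng.gauss(0.0, step)  # symmetric: no Hastings correction
    log_alpha = (normal_logpdf(proposal, mu, sd) + loglik(t, proposal)) - (
        normal_logpdf(current, mu, sd) + loglik(t, current)
    )
    if rng.random() < math.exp(min(0.0, log_alpha)):
        densities[t] = proposal
        return True
    return False


def toy_loglik(t, density):
    """Hypothetical likelihood: frame t's observation is 3.0 plus Gaussian noise."""
    return normal_logpdf(3.0, density, 0.5)


if __name__ == "__main__":
    rng = random.Random(1)
    densities = [2.0] * 10         # one density per frame observed so far
    latest = len(densities) - 1    # rejuvenate only the newest frame
    moves = [rejuvenate_frame(densities, latest, 2.0, 1.0, toy_loglik, rng)
             for _ in range(200)]
    print(sum(moves), densities[latest])
```

The property that matters here is that the cost of a move no longer depends on how many frames have been observed.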

belledon commented 4 years ago

This is something we can discuss later. We can actually run the IO without rejuvenation; the posterior here should be simple enough.

iyildirim commented 4 years ago

Feel free to implement what you are proposing...

Is it slow without rejuvenation as well? Or, at least, is rejuvenation contributing to the slowness in the expected manner? Each rejuvenation step should take about as long as one particle filter update step without rejuvenation.

How are you determining whether to do rejuvination? Is this just one step of Gaussian random walk MH after each update?

If so, why wasn't it slow before?

And yes, let's run the IO, either as MCMC taking the entire video as a batch or as SMC with thousands of particles. We would like to see whether the model-estimated mass ratios correlate with the ground-truth mass ratios.
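For the comparison with ground truth, a plain Pearson correlation over trials is one straightforward choice. This sketch assumes one model-estimated and one ground-truth mass ratio per trial; whether to correlate raw or log ratios is left open, and the function name is hypothetical.

```python
import math


def pearson_r(estimates, ground_truth):
    """Pearson correlation between per-trial model estimates and ground truth."""
    if len(estimates) != len(ground_truth) or len(estimates) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(estimates)
    mx = sum(estimates) / n
    my = sum(ground_truth) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(estimates, ground_truth))
    vx = sum((x - mx) ** 2 for x in estimates)
    vy = sum((y - my) ** 2 for y in ground_truth)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)
```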

belledon commented 4 years ago

It was always this slow (i.e., 20 particles would take 20 min); I was trying to figure out why. I'm mainly trying to optimize speed for BO, although a distributed scheduler might be able to absorb the cost.

And yes, rejuvenation is the culprit, since it increases the number of simulation steps polynomially. Running without rejuvenation takes roughly 1.3 s × N, where N is the number of particles.

belledon commented 4 years ago

So, in theory, the new gm's runtime should scale linearly with N, like that number.
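The figures quoted in this thread make the expected gain easy to estimate. This sketch only multiplies the ~1.3 s per-particle figure by the particle count (the linear scaling expected for the new gm) and compares the result with the ~20 min reported for 20 particles with rejuvenation; the constant and function names are illustrative.

```python
SECONDS_PER_PARTICLE = 1.3  # per-particle runtime without rejuvenation, as quoted above


def projected_runtime_s(num_particles: int,
                        seconds_per_particle: float = SECONDS_PER_PARTICLE) -> float:
    """Linear runtime projection: per-particle cost times number of particles."""
    return seconds_per_particle * num_particles


if __name__ == "__main__":
    with_rejuv_s = 20 * 60                 # ~20 min for 20 particles, as reported
    linear_s = projected_runtime_s(20)     # 20 * 1.3 s
    print(linear_s, with_rejuv_s / linear_s)
```

At 20 particles that is about 26 s versus about 1200 s, a factor of roughly 46.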

belledon commented 4 years ago

I might try to optimize the way we call pybullet, since it should be operating on the sub-millisecond scale anyway.

belledon commented 4 years ago

@eivinasbutkus, I could use help with these last few things.

iyildirim commented 4 years ago

Sounds good, thanks.

eivinasbutkus commented 4 years ago

@belledon I'll work on distributed BO - I think that's where I'll be of most use?

iyildirim commented 4 years ago

In the meantime, can we get model predictions for a low-particle-count implementation (say, n = 4) and a high-particle-count implementation (IO), using the sensory noise parameter we had from before?