CNCLgithub / GalileoEvents


progress report 04/05 #14

Open belledon opened 4 years ago

belledon commented 4 years ago

Better late than never: we finally have BO (Bayesian optimization) up and running on milgram.

Here is an early snapshot of a stress test I'm running, where the particle count is fixed at 10 and -1*RMSE is maximized over observation noise. This will run for 15 steps.

http://172.29.189.10:8787/status
|   iter    |  target   | obs_noise |
-------------------------------------
|  1        | -0.6225   |  0.4176   |
|  2        | -0.6747   |  0.7206   |
|  3        | -1.103    |  0.001    |
|  4        | -0.6936   |  1.0      |
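For reference, here is a minimal sketch of this setup, assuming the Python bayesian-optimization package (`bayes_opt`), whose console output matches the tables in this thread; `run_inference` is a hypothetical stand-in for the actual model fit:

```python
from bayes_opt import BayesianOptimization

def run_inference(particles, obs_noise):
    # Hypothetical placeholder: the real call would run the particle
    # filter and return RMSE against the behavioral means.
    return abs(obs_noise - 0.3) + 0.6

def neg_rmse(obs_noise):
    # Particle count fixed at 10; return -1*RMSE so that maximizing
    # the target minimizes the error.
    return -run_inference(particles=10, obs_noise=obs_noise)

optimizer = BayesianOptimization(
    f=neg_rmse,
    pbounds={"obs_noise": (0.001, 1.0)},  # bounds consistent with the table
    random_state=1,
)
optimizer.maximize(init_points=4, n_iter=15)
```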
belledon commented 4 years ago

Here's the full table:

|   iter    |  target   | obs_noise |
-------------------------------------
|  1        | -0.6225   |  0.4176   |
|  2        | -0.6747   |  0.7206   |
|  3        | -1.103    |  0.001    |
|  4        | -0.6936   |  1.0      |
|  5        | -0.6111   |  0.5365   |
|  6        | -0.6219   |  0.4858   |
|  7        | -0.616    |  0.2578   |
|  8        | -0.6449   |  0.3201   |
|  9        | -0.6722   |  0.8801   |
|  10       | -0.6128   |  0.1716   |
|  11       | -0.6546   |  0.6142   |
|  12       | -0.5898   |  0.2093   |
|  13       | -0.6229   |  0.1716   |
|  14       | -0.6421   |  0.1608   |
|  15       | -0.67     |  0.8038   |
|  16       | -0.6205   |  0.2272   |
|  17       | -0.6391   |  0.5173   |
|  18       | -0.6215   |  0.5559   |
|  19       | -0.6075   |  0.2776   |
|  20       | -0.6742   |  0.945    |
|  21       | -0.5904   |  0.297    |
|  22       | -0.6536   |  0.6683   |
=====================================
iyildirim commented 4 years ago

It's hard to interpret those RMSE numbers. Can you pick one of these configurations and show the model predictions overlaid on the behavioral data?

belledon commented 4 years ago

Here's the raw output for 300 particles:

{"target": -0.6100788150168097, "params": {"obs_noise": 0.41760498269787144}, "datetime": {"datetime": "2020-04-05 15:34:46", "elapsed": 0.0, "delta": 0.0}}
{"target": -0.614434678346776, "params": {"obs_noise": 0.7206041689487159}, "datetime": {"datetime": "2020-04-05 15:54:35", "elapsed": 1189.797266, "delta": 1189.797266}}
{"target": -1.3203670628201614, "params": {"obs_noise": 0.0010000289813242302}, "datetime": {"datetime": "2020-04-05 16:16:43", "elapsed": 2517.526143, "delta": 1327.728877}}
{"target": -0.6365653215933121, "params": {"obs_noise": 1.0}, "datetime": {"datetime": "2020-04-05 16:36:43", "elapsed": 3717.792648, "delta": 1200.266505}}
{"target": -0.607930284300135, "params": {"obs_noise": 0.5554543863374942}, "datetime": {"datetime": "2020-04-05 16:56:34", "elapsed": 4908.463468, "delta": 1190.67082}}
{"target": -0.6245783712392968, "params": {"obs_noise": 0.8717684073541878}, "datetime": {"datetime": "2020-04-05 17:15:28", "elapsed": 6042.31138, "delta": 1133.847912}}
{"target": -0.605194980571396, "params": {"obs_noise": 0.47952428034162115}, "datetime": {"datetime": "2020-04-05 17:35:19", "elapsed": 7233.180005, "delta": 1190.868625}}
{"target": -0.6057789723459392, "params": {"obs_noise": 0.6333615341822991}, "datetime": {"datetime": "2020-04-05 17:54:52", "elapsed": 8406.250552, "delta": 1173.070547}}
{"target": -0.6070542939238719, "params": {"obs_noise": 0.4606221663615384}, "datetime": {"datetime": "2020-04-05 18:14:15", "elapsed": 9569.891565, "delta": 1163.641013}}
{"target": -0.6083493289080839, "params": {"obs_noise": 0.6082832319787164}, "datetime": {"datetime": "2020-04-05 18:33:19", "elapsed": 10713.320539, "delta": 1143.428974}}
{"target": -0.6110714448261707, "params": {"obs_noise": 0.5033792976395024}, "datetime": {"datetime": "2020-04-05 18:52:44", "elapsed": 11878.222969, "delta": 1164.90243}}
{"target": -0.6075299268288896, "params": {"obs_noise": 0.657849079560393}, "datetime": {"datetime": "2020-04-05 19:11:06", "elapsed": 12980.202836, "delta": 1101.979867}}
iyildirim commented 4 years ago

Looks plausible! And we'd definitely like to see 4 particles (or a joint search over both particle count, say with 3 or 4 discrete options, and noise).

iyildirim commented 4 years ago

In the 10-particle simulations, are we averaging 10 or 20 chains per trial? The 300-particle simulations look more converged.

belledon commented 4 years ago

Here is the run optimizing both. I think I need to seed it better (noise around 0.5, particles around 20):

{"target": -0.6169372479239013, "params": {"obs_noise": 0.41760498269787144, "particles": 72.31212485077366}, "datetime": {"datetime": "2020-04-05 21:10:29", "elapsed": 0.0, "delta": 0.0}}
{"target": -1.3162669957684645, "params": {"obs_noise": 0.0011142604425275417, "particles": 30.930924690552136}, "datetime": {"datetime": "2020-04-05 21:12:55", "elapsed": 145.654915, "delta": 145.654915}}
{"target": -0.9908462935961524, "params": {"obs_noise": 0.016101987537883444, "particles": 70.33227546615922}, "datetime": {"datetime": "2020-04-05 21:18:21", "elapsed": 471.342621, "delta": 325.687706}}
{"target": -0.6377155090320301, "params": {"obs_noise": 1.0, "particles": 82.96245235283767}, "datetime": {"datetime": "2020-04-05 21:24:23", "elapsed": 833.302425, "delta": 361.959804}}
{"target": -0.9144579925820026, "params": {"obs_noise": 1.0, "particles": 1.0}, "datetime": {"datetime": "2020-04-05 21:24:54", "elapsed": 864.935219, "delta": 31.632794}}
{"target": -1.338467912166593, "params": {"obs_noise": 0.001, "particles": 100.0}, "datetime": {"datetime": "2020-04-05 21:32:36", "elapsed": 1326.683523, "delta": 461.748304}}
{"target": -0.6440363048925146, "params": {"obs_noise": 1.0, "particles": 48.67484654478152}, "datetime": {"datetime": "2020-04-05 21:36:21", "elapsed": 1551.542431, "delta": 224.858908}}
{"target": -1.1266144667190328, "params": {"obs_noise": 0.001, "particles": 14.714457195982508}, "datetime": {"datetime": "2020-04-05 21:37:43", "elapsed": 1633.36214, "delta": 81.819709}}
{"target": -0.637011559661109, "params": {"obs_noise": 1.0, "particles": 57.214600885251464}, "datetime": {"datetime": "2020-04-05 21:42:01", "elapsed": 1891.392657, "delta": 258.030517}}
{"target": -1.244198701298148, "params": {"obs_noise": 0.001, "particles": 40.593578077274046}, "datetime": {"datetime": "2020-04-05 21:45:03", "elapsed": 2074.260245, "delta": 182.867588}}
{"target": -1.298698235965499, "params": {"obs_noise": 0.001, "particles": 77.76720733966715}, "datetime": {"datetime": "2020-04-05 21:50:35", "elapsed": 2405.418573, "delta": 331.158328}}
{"target": -0.6379326712081878, "params": {"obs_noise": 0.9920512728773127, "particles": 90.12590362081329}, "datetime": {"datetime": "2020-04-05 21:57:01", "elapsed": 2792.118834, "delta": 386.700261}}
belledon commented 4 years ago

> In the 10-particle simulations, are we averaging 10 or 20 chains per trial? The 300-particle simulations look more converged.

Ah, I knew I forgot something. No, there are no chains here... I'm not sure I can get this set up tonight, but it shouldn't make that much of a difference since we are averaging for each trial and running several iterations.

iyildirim commented 4 years ago

Sounds good. I'm not sure if we should treat particles as continuous. It may be better to have a few preset values, say 4 of them: 1, 4, 10, 50.

In BO, you can have a continuous parameter (0 to 1) but then quantize it to these 4 values; e.g., 0 to 0.25 is 1 particle, 0.25 to 0.5 is 4 particles, 0.5 to 0.75 is 10 particles, and 0.75 to 1 is 50 particles.
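A minimal sketch of that quantization (the level set and mapping are straight from the suggestion above):

```python
# BO searches a continuous value in [0, 1]; snap it to a preset count.
PARTICLE_LEVELS = [1, 4, 10, 50]

def to_particles(x: float) -> int:
    # 0-0.25 -> 1, 0.25-0.5 -> 4, 0.5-0.75 -> 10, 0.75-1.0 -> 50
    idx = min(int(x * len(PARTICLE_LEVELS)), len(PARTICLE_LEVELS) - 1)
    return PARTICLE_LEVELS[idx]
```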

iyildirim commented 4 years ago

Actually, at this point I wonder if a grid search would do better, with 4 (+1 for IO) particle levels and say 10 or 20 noise levels. I worry that in refining BO we will end up running this many, if not more, configurations of the model anyhow.

I think you'd like to run multiple chains per stimulus (10 or 20). Otherwise, the initial linear regression will fit to a lot of noise, and the predictions on the congruent/incongruent pairs will be quite noisy as well. The effect will be asymmetrically more pronounced for smaller particle counts.

But before debugging BO any further or switching to grid search, I'd just like to see how the model predictions look in this RMSE regime for a pair of plausible parameter choices (300 particles and 0.5 obs noise for IO, and 4 particles and 0.2 or 0.1 obs noise for efficient inference). Did you check whether RMSE is telling enough between these two (or similar) versions of the model? It'd be helpful to see summary RMSE plots (normal, congruent, incongruent) and per-trial overlaid model/behavior plots.
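A rough sketch of what that grid search could look like, reusing the hypothetical `run_inference` from above; the exact levels (including 300 particles for IO) and chain count are assumptions:

```python
import itertools
import numpy as np

particle_levels = [1, 4, 10, 50, 300]     # 4 levels + 1 for IO (assumed)
noise_levels = np.linspace(0.1, 1.0, 10)  # 10 noise levels (assumed)
n_chains = 20                             # chains averaged per configuration

grid_rmse = {}
for p, s in itertools.product(particle_levels, noise_levels):
    chains = [run_inference(particles=p, obs_noise=s) for _ in range(n_chains)]
    grid_rmse[(p, float(s))] = float(np.mean(chains))

best_config = min(grid_rmse, key=grid_rmse.get)
```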

belledon commented 4 years ago

> I worry that in refining BO we will end up running this many, if not more, configurations of the model anyhow.

My understanding was that BO was chosen for some objective notion of optimality in model fitting, not for efficiency.

> I think you'd like to run multiple chains per stimulus (10 or 20). Otherwise, the initial linear regression will fit to a lot of noise, and the predictions on the congruent/incongruent pairs will be quite noisy as well. The effect will be asymmetrically more pronounced for smaller particle counts.

This is a fair statistical argument, although there is an epistemic counterpoint: how can some models account for average human performance with a single chain while others require multiple chains, when the only difference is the number of particles? This opens up an alternative series of analyses that I'd like to discuss at some point.

iyildirim commented 4 years ago

You have 20 subjects (or something like that) responding per trial... You are looking at the average of those responses. We would like to model each subject as a chain, which is simply to simulate the model that many times on each trial and average its responses.

The only reason we'd like BO is efficiency, nothing else. Often grid search is infeasible. Where grid search is feasible, it can be treated as the gold standard.
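A hypothetical sketch of the chain-per-subject scheme being described, where `simulate_chain` is a stand-in for one run of the model:

```python
import numpy as np

def simulate_chain(trial, obs_noise=0.5, particles=10):
    # Placeholder for a single chain's response on one trial.
    return np.random.normal(loc=trial, scale=obs_noise)

def model_prediction(trial, n_subjects=20, **params):
    # One chain per subject: simulate n_subjects times and average,
    # mirroring how the behavioral means are computed.
    responses = [simulate_chain(trial, **params) for _ in range(n_subjects)]
    return float(np.mean(responses))
```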

belledon commented 4 years ago

> We would like to model each subject as a chain, which is simply to simulate the model that many times on each trial and average its responses.

I understand all of that, but that is not an argument for why 1 chain = 1 subject. One argument is that the true posterior we have designed describes a wide range of possible mass ratios at various points in time, but we assume that subjects have relatively "tight" beliefs about the mass ratio at any given moment. Thus a single chain, which would also presumably be "tight", serves as a model for individual responses that we can average over.

However, that seems like a case of the tool dictating the theory. We could model this behavior explicitly in the posterior by having each chain explicitly sample a "tight" belief.

> Where grid search is feasible, it can be treated as the gold standard.

This is confusing to me. The main problem with grid search is the scale of the metric for interpolation. While I'm not saying we should not do grid search at all, it is an empirically verifiable concern that whatever grids we choose may span too large or too small a range to capture the main interaction.

iyildirim commented 4 years ago

Let's talk on Zoom. Either you have to adapt the posterior in the way you are describing (which I don't understand), or you have to run multiple chains, with each chain simulating a subject. Just imagine basing all of your analysis on a single subject... You are assuming each subject has very little variance in their belief distribution; depending on how you quantify variance, that's something you'll get with fewer particles.

That's why grid search can become infeasible. Do we care about significance beyond the first digit in our noise estimate? Like, is 0.12 different from 0.18? Maybe. In that case, you'd need to divide the 0-1 range into 100 intervals.