ciemss / pyciemss

Causal and probabilistic reasoning with continuous time dynamical systems

Output ELBO loss during calibrate #166

Closed djinnome closed 8 months ago

djinnome commented 1 year ago

In PR https://github.com/ciemss/pyciemss/pull/163 we discussed that it would be helpful for the user to view the convergence of the SVI algorithm.

Currently, calibrate prints out the loss every 25 iterations. https://github.com/ciemss/pyciemss/blob/5da3f8347d47a7871b7d0b18d766108bb6565fb1/src/pyciemss/PetriNetODE/interfaces.py#L107-L111

Joshua would like to capture this information either as an actual (optional) output, or as logging output so that it could be visualized as it is happening.
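A minimal sketch of what "optional output or logging output" could look like, not the actual pyciemss code. The names `calibrate_sketch`, `toy_svi_step`, and `return_losses` are hypothetical, and the toy step function only stands in for a real SVI step so the example runs on its own.

```python
import logging
import math

logger = logging.getLogger("calibrate_sketch")


def toy_svi_step(i):
    """Hypothetical stand-in for one SVI step: returns a smoothly decaying loss."""
    return 100.0 * math.exp(-0.05 * i) + 1.0


def calibrate_sketch(num_iterations, step=toy_svi_step, log_every=25, return_losses=False):
    """Run `num_iterations` steps, logging the loss periodically.

    The full loss history is included in the result only when requested,
    so existing callers see no change in the return value.
    """
    losses = []
    for i in range(num_iterations):
        loss = step(i)
        losses.append(loss)
        if i % log_every == 0:
            # Goes through the logging module instead of print, so a caller
            # can route it to a file or a live display.
            logger.info("iteration %d loss = %.4f", i, loss)

    result = {"final_loss": losses[-1] if losses else None}
    if return_losses:
        result["losses"] = losses
    return result
```

With `logging.basicConfig(level=logging.INFO)` set by the caller, the periodic messages appear as they happen; with `return_losses=True`, the same values can be plotted afterwards.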

good convergence

image

bad convergence

image

WTF convergence

image

djinnome commented 1 year ago

Hi Joseph, I assigned this to you because you may have opinions on whether it makes more sense to send the ELBO output to a logfile (for potential early stopping) or to return it as an optional output of the calibrate function.

JosephCottam commented 1 year ago

This is a case where I think we should do both.

From an implementation standpoint, if we are doing both, I would go about it like this:

  1. Create a file dedicated to the convergence data.
  2. Make that file name available somehow so it can be monitored. (A RESTful way to do this would be to return a work ID as part of the response and have another query for "what is the log file associated with this work ID". It's an extra round-trip, but it makes it easy to get at the file from more than one client.)
  3. Periodically write to the file.
  4. When work is done, return the file ID as part of the result.

This setup does require some persistent storage so that files outlive jobs, but it is usually simple to set up: add a file handle to existing print statements, thread that file handle through the calls.
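The four steps above could be sketched roughly as follows. Everything here is an assumption for illustration: `LOG_DIR`, `log_path_for`, and `run_job` are hypothetical names, the temp directory stands in for real persistent storage, and the work-ID lookup is a plain function rather than an actual REST endpoint.

```python
import csv
import tempfile
import uuid
from pathlib import Path

# Hypothetical storage location; a real deployment would need a directory
# that outlives the job.
LOG_DIR = Path(tempfile.gettempdir()) / "convergence_logs"


def log_path_for(work_id):
    """Step 2: answer "what is the log file associated with this work ID"."""
    return LOG_DIR / f"{work_id}.csv"


def run_job(losses_iter, write_every=25, work_id=None):
    """Write convergence data to a dedicated file while the job runs."""
    work_id = work_id or uuid.uuid4().hex
    LOG_DIR.mkdir(parents=True, exist_ok=True)
    path = log_path_for(work_id)  # Step 1: file dedicated to convergence data

    with path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["iteration", "loss"])
        for i, loss in enumerate(losses_iter):
            if i % write_every == 0:
                # Step 3: periodic write, flushed so a monitor sees it promptly
                writer.writerow([i, loss])
                fh.flush()

    # Step 4: hand the file identity back with the result
    return {"work_id": work_id, "log_file": str(path)}
```

Passing in a `work_id` chosen up front lets a client start polling `log_path_for(work_id)` before the job finishes, which is the extra round-trip described in step 2.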

(The convergence plots don't show for me...broken images. I think I remember the images from notebooks, though.)

SamWitty commented 8 months ago

Addressed by progress_hook in calibrate.
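For readers unfamiliar with the pattern, here is a generic sketch of a progress-hook callback. The exact arguments pyciemss passes to `progress_hook` are not stated in this thread, so the `(iteration, loss)` signature and the `run_with_hook` function below are assumptions, not the library's API.

```python
def run_with_hook(num_iterations, step, progress_hook=lambda i, loss: None):
    """Call `progress_hook(i, loss)` after every step (assumed signature).

    The default hook does nothing, so callers who do not care about
    convergence pay no extra cost.
    """
    for i in range(num_iterations):
        loss = step(i)
        progress_hook(i, loss)


# Usage: collect the loss trace so it can be plotted or written to a log.
history = []
run_with_hook(
    5,
    step=lambda i: 10.0 / (i + 1),  # toy loss, stands in for a real SVI step
    progress_hook=lambda i, loss: history.append((i, loss)),
)
```

A hook like this covers both requests in the issue: the callable can append to a list for later output, write to a log, or update a live plot.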