abhisheknaik96 / differential-value-iteration

Experiments in creating the ultimate average-reward planning algorithm
Apache License 2.0

added a flag to save final estimates #35

Closed · abhisheknaik96 closed this 3 years ago

abhisheknaik96 commented 3 years ago

To get started with #34, I have added a flag that saves the learned estimates to an `.npy` file. This will let us compare the learned estimates against golden values stored elsewhere.

This is a sample implementation; several things could be done differently (e.g., how the name of the saved file is exposed, whether the save location is a command-line argument, whether the experiment results are stored in the same file). Hence, I've only implemented the get_estimates function for MDVI right now (which is why the tests for DVI and RVI are failing). We can discuss and converge on a better implementation.
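For concreteness, here is a minimal sketch of the save-and-compare flow under stated assumptions: the flag names, file paths, use of argparse, and the MockAgent stand-in are all hypothetical, not the code in this PR.

```python
# A minimal sketch, not the PR's actual code: the flag names, file
# paths, and MockAgent class below are hypothetical.
import argparse

import numpy as np


class MockAgent:
    """Stand-in for an algorithm that exposes get_estimates()."""

    def get_estimates(self) -> dict:
        # e.g., per-state value estimates and the average-reward estimate
        return {"v": np.zeros(5), "r_bar": 0.0}


def save_estimates(agent, path: str) -> None:
    """Writes the learned estimates to an .npy file."""
    # np.save pickles the dict into a 0-d object array.
    np.save(path, agent.get_estimates())


def compare_to_golden(path: str, golden_path: str) -> None:
    """Checks saved estimates against golden values stored elsewhere."""
    learned = np.load(path, allow_pickle=True).item()
    golden = np.load(golden_path, allow_pickle=True).item()
    for key, value in golden.items():
        np.testing.assert_allclose(learned[key], value, rtol=1e-6)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--save_estimates", action="store_true",
                        help="If set, save the final estimates to disk.")
    parser.add_argument("--estimates_path", default="estimates.npy")
    args = parser.parse_args()
    if args.save_estimates:
        save_estimates(MockAgent(), args.estimates_path)
```

Note that loading the dict back requires `allow_pickle=True`, since `np.save` stores a dict as a pickled object array.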

btanner commented 3 years ago

I think this breaks our tests because of all the algorithms that don't have get_estimates() implemented. Could you either:

  1. add it to all of the algorithms, or
  2. remove the @abc.abstractmethod from the parent class and have that base version return an empty dict or something (a sketch of this option follows this comment)

Otherwise, I think this looks good. I have a few thoughts for extending this:

  1. saving estimates at intervals, so we can see how they change over time in post-hoc analysis
  2. having an easy way to match hyperparameters to results, so we can easily pick "best runs", etc.

But that's all for future.
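As a sketch of option 2 above (the class and attribute names are assumptions; the repo's actual base class may look different), removing the abstract marker and returning a benign default keeps algorithms without estimate-saving from breaking the shared tests:

```python
# A sketch of option 2 (class and attribute names are assumptions,
# not the repo's actual code).
import abc

import numpy as np


class Control(abc.ABC):
    """Hypothetical base class for the planning algorithms."""

    @abc.abstractmethod
    def update(self) -> None:
        ...

    def get_estimates(self) -> dict:
        # Non-abstract default: algorithms that haven't implemented
        # estimate-saving yet return an empty dict, so the shared
        # tests keep passing.
        return {}


class MDVI(Control):
    """Hypothetical subclass that does expose its estimates."""

    def __init__(self, num_states: int):
        self.values = np.zeros(num_states)
        self.r_bar = 0.0

    def update(self) -> None:
        pass  # planning update omitted in this sketch

    def get_estimates(self) -> dict:
        return {"v": self.values, "r_bar": self.r_bar}
```

The trade-off is that a silently empty dict can mask algorithms that were supposed to implement get_estimates(), which is why option 1 (implementing it everywhere) is the stricter choice.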

abhisheknaik96 commented 3 years ago

I've added a sample implementation of get_estimates() to dvi and rvi so that the tests pass.

And yes, we should do both of the things you mentioned in the near future!