Changes to RockSample to add more options and match original problem

JuliaPOMDP / RockSample.jl

Implement the rock sample problem using POMDPs.jl

Other

5 stars 5 forks source link

Changes to RockSample to add more options and match original problem #19

Closed dylan-asmar closed 2 years ago

dylan-asmar commented 2 years ago

Modified the problem to enable more options. Added a step penalty and a sensor usage penalty. The default values are set to 0.
Changed the default sensor_efficienty to 20.0 (was at 10.0). This change aligns with the original (5,7), (7,8), and (10,10) problems documented here.
Modified the calculation of sensor efficiency. Similar reference as the previous change. Matched the original (5,7), (7,8), and (10,10) implementations.

zsunberg commented 2 years ago

@dylan-asmar thanks. should the new reward options go in the README? (also feel free to suggest a change in documentation)

Also, it would probably be good practice to add some small tests for the new reward functionality, but I don't think this is a strict requirement.

@MaximeBouton any comments?

dylan-asmar commented 2 years ago

Good call. I will make changes to the README and update this request. I'll also work to add a few tests.

dylan-asmar commented 2 years ago

Updated the README and added tests for the reward.

Also added the action selection to the bottom of the visualization and the ability to see the belief that each rock is "good" (depicted by a full green bar). The unique nature of RockSample allows us to visualize the belief state of the POMDP by observing the belief over each rock. This option can be turned off, but is defaulted to on as I thought it provides insight how the belief can change after actions.

zsunberg commented 2 years ago

@dylan-asmar Thanks! this looks good! Belief rendering is especially useful!

There is one question to consider: Is it better to render b (i.e. belief before the action and observation) or b' (belief after the action and observation)? In LaserTag, for instance, it is more intuitive to render b' because you can see the result of the lasers eliminating possibilities of where the target might be (see https://github.com/JuliaPOMDP/POMDPGallery.jl#lasertag). Can you quickly confirm that you have thought through this? I will merge once you confirm.

In the future, it would be helpful to put the reward updates and visualization updates in separate pull requests to make it easier to review.

Thanks so much for your work on this!

dylan-asmar commented 2 years ago

@zsunberg I think there could be arguments both ways on rendering b vs b'. That being said, with the action description at the bottom, I think it makes more sense to show the current belief and the action selected based on that belief. This line of thinking might be biased based on what I have been working on, but seemed to make the most sense in my mind. I also included a prefix option in the action name section that appears at the bottom for this reason as well. When I was rendering visualizations, I did it before the action/observation to show what the selected action was based on the belief and then after the action/observation with the transition (s', b').

Sorry about grouping them together. I was thinking the same thing after, but the damage was done at that point.

zsunberg commented 2 years ago

OK, it's merged.

Now we need to decide what new version we should release, 0.1.5 or 0.2.0. The key question to ask is "will these changes break anyone's code that uses this". If you have only added features with no breaking changes, then we should release 0.1.5. If there are any breaking changes, however small, we should release 0.2.0. If the changes to the reward function change the average return for a policy when a default rock sample pomdp is constructed, I think we should consider that breaking.

What do you think?