Closed thebot002 closed 3 months ago
Additional tasks created: #39 #38
The problem now runs end to end on an n-by-n grid of cells. The FSVI agent learns a sensible policy, but unfortunately it doesn't perform well yet.
The convergence rate observed so far is 10 to 20 percent on the artificial data. Unsurprisingly, it performs no better on the actual simulation data.
The QMDP agent has also been tested on this problem, with no greater success.
The Infotaxis agent is currently being investigated, but some things still need fixing.
I will keep investigating how to improve the results, but I suspect the main issue is that the agent is unaware of its own position, and that we will need the Position Aware implementation to improve things. (#38 #39)
I fixed the Infotaxis agent problem, but it didn't help: the performance is still bad on the artificial data.
Something to investigate is training the FSVI agent from a realistic starting belief, one that is uniform only across the states of a single agent position, instead of from a belief that is uniform over all states.
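A minimal sketch of the two starting beliefs, for comparison. It assumes the state space factors into an agent position and some remaining hidden component, flattened position-major; the function names and the `n_positions` / `n_hidden` parameters are hypothetical and not taken from the project code.

```python
import numpy as np

def uniform_belief(n_positions, n_hidden):
    """Belief that is uniform over every state (the original training start)."""
    n_states = n_positions * n_hidden
    return np.full(n_states, 1.0 / n_states)

def position_conditioned_belief(n_positions, n_hidden, position_index):
    """Belief that is uniform only across the states of one agent position.

    Assumed layout (hypothetical): state = position_index * n_hidden + hidden_index.
    """
    belief = np.zeros((n_positions, n_hidden))
    belief[position_index, :] = 1.0 / n_hidden  # all mass on the known position
    return belief.ravel()

# Example: 9 agent positions (3-by-3 grid), 9 hidden states per position.
b_uniform = uniform_belief(9, 9)
b_start = position_conditioned_belief(9, 9, position_index=5)
```

Sampling `position_index` at random for each training belief would cover every position while keeping each individual belief realistic.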
This last version of the training, with the more accurate starting belief, also led to a mere 10% convergence:
Simulations reached goal: 573/5561 (4988 failures (reached horizon: 4988)) (10.30% success)
Grid of success/failures:
Sample successful trajectory:
Sample failed trajectory:
Also, note that at the end of the failed trajectory the agent only moves back and forth along the x axis (observed in the detailed log).
This leads me to believe the agent is too confused about its own position in space to be able to find the source accurately.
After a talk with Agnese, we concluded that the fastest way to get results is to add the source position as a layer of positions (within the Minimal model). The transitions stay fuzzy, since the transition probabilities at the agent-position level need to be stochastic due to the "tile" concept.
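One way to picture this is the rough sketch below. It assumes that a move changes the agent's cell only with some probability `p_move` (otherwise the agent stays in its cell), that moves are clipped at the grid border, and that the source does not move. All names and the value 0.25 are illustrative assumptions, not the project's actual model.

```python
import numpy as np

def agent_transition(n, move, p_move):
    """Row-stochastic matrix over the n*n agent cells for one action.

    move is (d_row, d_col). With probability p_move the agent changes cell
    (clipped at the border); otherwise it stays where it is.
    """
    cells = n * n
    T = np.zeros((cells, cells))
    for r in range(n):
        for c in range(n):
            src = r * n + c
            nr = min(max(r + move[0], 0), n - 1)
            nc = min(max(c + move[1], 0), n - 1)
            dst = nr * n + nc
            T[src, dst] += p_move
            T[src, src] += 1.0 - p_move
    return T

def joint_transition(n, move, p_move):
    """Transition over (agent cell, source cell) pairs, agent-major.

    The source layer is assumed static, hence the identity factor.
    """
    return np.kron(agent_transition(n, move, p_move), np.eye(n * n))

# Example: 3-by-3 grid, action "one cell to the right", hypothetical p_move.
T = joint_transition(3, (0, 1), 0.25)
```

With this layout the source layer multiplies the state count by n*n but leaves the agent-level stochasticity untouched.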
A separate task item will be created to define a "Position Aware" version of the problem, where the physical position of the agent is sent along with the observation in the update_state function of the agent.
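As a sketch of what using the reported position could look like inside a belief update: the helper below discards belief mass on states that are inconsistent with the reported position and renormalises. The function name, its parameters, and the position-major state layout are assumptions for illustration; the actual update_state signature is not shown here.

```python
import numpy as np

def condition_on_position(belief, n_positions, n_hidden, position_index):
    """Keep only the belief mass consistent with the reported agent position.

    Assumed layout (hypothetical): state = position_index * n_hidden + hidden_index.
    """
    grid = belief.reshape(n_positions, n_hidden)
    masked = np.zeros_like(grid)
    masked[position_index] = grid[position_index]
    total = masked.sum()
    if total == 0.0:
        # The belief gave no mass to the reported position:
        # fall back to uniform over that position's states.
        masked[position_index] = 1.0 / n_hidden
    else:
        masked /= total
    return masked.ravel()

# Example: start fully uncertain over 4 positions x 3 hidden states,
# then learn that the agent is at position 2.
prior = np.full(12, 1.0 / 12)
posterior = condition_on_position(prior, 4, 3, position_index=2)
```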