Closed: qiaoting159753 closed this 7 months ago
Personally, I think creating your own memory buffer for the MBRL stuff is better. I do something similar for Hoda's RDTD3.
It'd solve Henry's concern about the naming. You'd just replace the normal sample with your new sample, because you don't need the normal sample in your training loop (just from a quick skim of the gymnasium side).
And so you'd import it like so:
from cares_reinforcement_learning.memory.mbrl import MemoryBuffer
You'd also have to update memory_factory for the new buffer.
What do you think?
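(A rough sketch of what that separate MBRL buffer and the factory hook-up could look like. The module location, class internals, and factory signature below are assumptions for illustration, not the actual cares_reinforcement_learning API.)

```python
# Sketch only: a separate MBRL buffer module, e.g. living at
# cares_reinforcement_learning/memory/mbrl.py.  Names, storage layout and the
# factory signature are assumptions, not the real library API.
import random


class MemoryBuffer:
    """MBRL-specific buffer whose sample() replaces the normal one."""

    def __init__(self, max_capacity=1_000_000):
        self.max_capacity = max_capacity
        self.buffer = []  # (state, action, reward, next_state, done) tuples

    def add(self, state, action, reward, next_state, done):
        if len(self.buffer) >= self.max_capacity:
            self.buffer.pop(0)
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # The MBRL-flavoured sampling would go here instead of the normal one.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))


# memory_factory would then map a config string to this class, e.g.:
# if memory_type == "mbrl":
#     from cares_reinforcement_learning.memory.mbrl import MemoryBuffer
#     return MemoryBuffer(**kwargs)
```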
Yea, nah... I could do that. It is actually a good idea: for example, we could sample whole episodes, or sample transitions of a certain length, to train the network. But I don't think it's truly worth doing now, because MBRL doesn't need a fundamental change to the memory, so maybe keep it simple for now? If a new MemoryBuffer needs to inherit from this MemoryBuffer, it can be left as "not implemented".
Why not just extend the existing one and override the sample method?
The current pull request just duplicates the code entirely, which also isn't ideal.
Right. Extending is a good idea. However, I think the original sample() method is still useful: data from the original sample() is used to train the agent, as in normal training. The newly implemented sample_consecutive() is much slower, and it would slow the program down if it were used for normal training.
I mean extend the MemoryBuffer class - not modify the existing sample function.
Yes. I was trying to explain that there is no need to override the sample() function.
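(To illustrate the "extend, don't modify" point: a minimal, self-contained sketch. The base class below is only a stand-in; the real MemoryBuffer's internals and method signatures are assumptions here.)

```python
import random


class MemoryBuffer:
    """Stand-in for the existing buffer; sample() stays the fast path."""

    def __init__(self):
        self.buffer = []  # (state, action, reward, next_state, done) tuples

    def add(self, *transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Fast uniform sampling, used for normal agent updates.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))


class MBRLMemoryBuffer(MemoryBuffer):
    """Adds sample_consecutive() without overriding sample()."""

    def sample_consecutive(self, batch_size, horizon):
        # Slower: draws contiguous windows of `horizon` transitions, the kind
        # of batch a world model needs; only used for MBRL training.
        windows = []
        for _ in range(batch_size):
            start = random.randint(0, len(self.buffer) - horizon)
            windows.append(self.buffer[start : start + horizon])
        return windows
```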
We will need to readdress this with pull request #134 - don't merge this until that is done - I will assist with this as my next step
Update to match #140
Add a sample_next function to memory_buffer.py to train an integrated world model. An integrated world model trains the reward prediction and the dynamics (next state) prediction together.
Instead of the sample function returning (s, a, r, n_s, done), the sample_next function returns (s, a, r, n_s, done, n_a, n_r).
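(A minimal sketch of what sample_next could look like over a flat list of transitions. The storage layout is an assumption rather than the actual memory_buffer.py internals, and a full implementation would also need to avoid sampling across episode boundaries.)

```python
import random


def sample_next(buffer, batch_size):
    """Sample (s, a, r, n_s, done, n_a, n_r) tuples from a transition list.

    `buffer` is assumed to be a list of (s, a, r, n_s, done) tuples stored in
    insertion order; the next action/reward come from the following entry.
    """
    batch = []
    for _ in range(batch_size):
        i = random.randint(0, len(buffer) - 2)  # leave room for entry i + 1
        s, a, r, n_s, done = buffer[i]
        _, n_a, n_r, _, _ = buffer[i + 1]
        # NOTE: a real implementation should skip (or mask) indices where
        # `done` is True so the pair does not straddle an episode boundary.
        batch.append((s, a, r, n_s, done, n_a, n_r))
    return batch
```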