UoA-CARES / cares_reinforcement_learning

CARES Reinforcement Learning Package

Adapt DYNA to the new memory buffer, which has new sample and sample_consecutive functions. #130

Closed qiaoting159753 closed 5 months ago

qiaoting159753 commented 6 months ago

Add a sample_next function to memory_buffer.py to train an integrated world model. An integrated world model trains the reward prediction and the dynamics (next-state) prediction together.

Instead of the sample function returning (s, a, r, n_s, done), the sample_next function returns (s, a, r, n_s, done, n_a, n_r).
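As a rough sketch of the difference (this is illustrative only; the package's actual MemoryBuffer API may differ), sample draws independent 5-tuples, while sample_next also fetches the following action and reward so reward and next-state prediction can be trained together:

```python
import random

# transitions: list of (s, a, r, n_s, done) tuples
def sample(transitions, batch_size):
    """Standard sampling: independent 5-tuples."""
    return random.sample(transitions, batch_size)

def sample_next(transitions, batch_size):
    """Also return the following action and reward (7-tuples) so an
    integrated world model can learn reward and dynamics together."""
    batch = []
    while len(batch) < batch_size:
        i = random.randrange(len(transitions) - 1)
        s, a, r, n_s, done = transitions[i]
        if done:
            continue  # transition i + 1 starts a new episode; skip it
        _, n_a, n_r, _, _ = transitions[i + 1]
        batch.append((s, a, r, n_s, done, n_a, n_r))
    return batch
```

Note the rejection step on the done flag: a consecutive pair is only valid when both transitions fall inside the same episode.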

qiaoting159753 commented 6 months ago

Personally, I think creating your own memory buffer for the MBRL stuff is better. I did something similar for Hoda's RDTD3.

[image attachment]

It would solve Henry's concern about the naming. You'd just replace the normal sample with your new sample, since you don't need the normal sample in your training loop (based on a quick skim of the gymnasium side).

And so you'd import it like so:

from cares_reinforcement_learning.memory.mbrl import MemoryBuffer

You'd have to update memory_factory for the new buffer also

What do you think?

Yea nah... I could do that. That is actually a good idea; for example, we could sample episodes, or sample transitions of a certain length, to train the network. But I don't think it's truly worth doing now, because MBRL does not need a fundamental change to the memory, so maybe keep it simple for now? If a new MemoryBuffer needs to inherit from this memory buffer, unused methods can be left as "not implemented".

beardyFace commented 6 months ago

why not just extend the existing one? and override the sample method?

The current request just duplicates the code entirely - which also isn't ideal
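A sketch of what extending might look like (the base class here is a simplified stand-in, not the package's actual implementation): the subclass inherits sample() untouched and only adds the MBRL-specific method, so no code is duplicated.

```python
import random

class MemoryBuffer:
    """Simplified stand-in for the package's base buffer."""

    def __init__(self, max_capacity=100_000):
        self.max_capacity = max_capacity
        self.buffer = []

    def add(self, *experience):
        self.buffer.append(experience)
        if len(self.buffer) > self.max_capacity:
            self.buffer.pop(0)  # drop the oldest transition

    def sample(self, batch_size):
        # Cheap, independent draws: unchanged for normal training.
        return random.sample(self.buffer, batch_size)


class MBRLMemoryBuffer(MemoryBuffer):
    """Inherits sample() as-is; only adds consecutive-pair sampling."""

    def sample_consecutive(self, batch_size):
        batch = []
        while len(batch) < batch_size:
            i = random.randrange(len(self.buffer) - 1)
            s, a, r, n_s, done = self.buffer[i]
            if done:
                continue  # index i + 1 belongs to the next episode
            _, n_a, n_r, _, _ = self.buffer[i + 1]
            batch.append((s, a, r, n_s, done, n_a, n_r))
        return batch
```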

qiaoting159753 commented 6 months ago

> why not just extend the existing one? and override the sample method?
>
> The current request just duplicates the code entirely - which also isn't ideal

Right. Extending is a good idea. However, I think the original sample() method is still useful. Data from the original sample() is used to train the agent, like normal training. The newly implemented sample_consecutive() is much slower. It will slow down the program if sample_consecutive() is used for normal training.

beardyFace commented 6 months ago

I mean extend the MemoryBuffer class - not modify the existing sample function.

qiaoting159753 commented 6 months ago

Yes. I was trying to explain that there is no need to override the sample() function.

beardyFace commented 6 months ago

We will need to readdress this with pull request #134 - don't merge this until that is done - I will assist with this as my next step

beardyFace commented 6 months ago

Update to match #140