StepNeverStop / RLs

Reinforcement Learning Algorithms Based on PyTorch
https://stepneverstop.github.io
Apache License 2.0

error when checking the length of shape tf 2.0 #10

Closed kmakeev closed 4 years ago

kmakeev commented 5 years ago

tf.version: '2.0.0'
tfp.version: '0.8.0'

params --gym -a sac_no_v -n train_using_gym -g --gym-env CarRacing-v0 --render-episode 10 --gym-agents 4

Error in converted code, relative to C:\Python34\RLs\Nn:

tf2nn.py:144 call  *
    features = self.share(super().call(vector_input, visual_input))
tf2nn.py:86 call  *
    features = self.conv1(visual_input)

AttributeError: 'actor_continuous' object has no attribute 'conv1'

In tf2nn.py, class ImageNet(tf.keras.Model): in __init__(), len(visual_dim) is 4, so conv1 and the other conv layers are never added to the model, because of the check 'if len(visual_dim) == 5:'. But in 'def call(self, vector_input, visual_input):' the input shape is (None, 1, 96, 96, 3), whose length is 5, so we reach this branch and get the error:

    if visual_input is None or len(visual_input.shape) != 5:
        pass
    else:
        features = self.conv1(visual_input)
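To make the mismatch concrete, here is a minimal, hypothetical sketch (not the repo's actual ImageNet code) of how a constructor that checks len(visual_dim) can disagree with a call() that checks the rank of the incoming tensor:

    import tensorflow as tf

    # Hypothetical reproduction of the reported mismatch, not the repo's code:
    # __init__ checks the length of visual_dim, call() checks the rank of the tensor.
    class ImageNetSketch(tf.keras.Model):
        def __init__(self, visual_dim):
            super().__init__()
            if len(visual_dim) == 5:                      # False for visual_dim like (1, 96, 96, 3)
                self.conv1 = tf.keras.layers.Conv3D(32, 3, activation='relu')

        def call(self, visual_input):
            if visual_input is not None and len(visual_input.shape) == 5:
                # the batched input (None, 1, 96, 96, 3) has rank 5,
                # so this branch runs even though conv1 was never created
                return self.conv1(visual_input)           # AttributeError: no attribute 'conv1'
            return visual_input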

kmakeev commented 5 years ago

Please also check this in gym_loop.py:

    def init_variables(env, action_type):
        """
        inputs:
            env: Environment
            action_type: discrete or continuous
        outputs:
            i: specify which item of state should be modified
            mu: action bias
            sigma: action scale
            state: [vector_obs, visual_obs]
            newstate: [vector_obs, visual_obs]
        """
        i = 1 if len(env.observation_space.shape) == 3 else 0
        mu, sigma = get_action_normalize_factor(env.action_space, action_type)
        return i, mu, sigma, [np.empty(env.n), np.array([[]] * env.n)], [np.empty(env.n), np.array([[]] * env.n)]

It returns 'state' with shape <class 'tuple'>: (4, 1, 210, 160, 3). I think it should be (4, 210, 160, 3).

StepNeverStop commented 5 years ago

Hi, the problem you found is really a serious bug. I have basically fixed it in the latest commit. Now it works for Adventure-v0, Berzerk-v0, etc., which have visual observations, but it still doesn't work for CarRacing-v0. I don't know why; maybe this environment doesn't support multi-threading very well.

You can try other visual-based envs for training; I will continue to test CarRacing-v0 to see whether it can work or not.

Also, the code needs to be compatible with the Unity environment (which may have multiple image input sources), so I have to use Conv3D instead of Conv2D; that's why I add an extra dimension to the visual observation for gym envs (see the sketch below). I am planning to write another pure-gym training project, so it will look less redundant.
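A minimal sketch of that extra dimension, assuming a single CarRacing-v0 frame (the variable names and axis choice are illustrative, not the repo's exact code):

    import numpy as np

    # One raw CarRacing-v0 frame: (H, W, C)
    obs = np.zeros((96, 96, 3), dtype=np.uint8)
    # Add a batch axis -> (1, 96, 96, 3), then a camera axis for Conv3D -> (1, 1, 96, 96, 3)
    batched = obs[np.newaxis, ...]
    with_camera_axis = batched[:, np.newaxis, ...]
    print(with_camera_axis.shape)   # (1, 1, 96, 96, 3), the rank-5 shape seen in call()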

thx.

StepNeverStop commented 5 years ago

You could try training Atari games.

kmakeev commented 5 years ago

Thanks! I will try.

kmakeev commented 5 years ago

Sorry! Breakout-v0 still doesn't work; it's even worse. After launch there is a lot of memory consumption and the application crashes. I see that the change in 'atary_loop.py', def init_variables(env, action_type): ..., did not affect the shape of the returned values.

Let me draw your attention to the following things:

  1. It is better to store the state/frames for models working with pictures as int8 (these are color values up to 255); this greatly saves memory. As a sample, you can look at this project: https://github.com/fg91/Deep-Q-Learning

  2. For the model class ImageNet(tf.keras.Model): I have not seen any normalization of the input data (/ 255). See the sketch after this list.

  3. Many Atari games (including Breakout-v0) do not start without receiving the FIRE action, and their training will not be successful without it.
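A rough sketch of points 1 and 2, assuming an 84x84x3 frame size and illustrative names (the buffer layout and preprocess function are not the repo's actual API):

    import numpy as np
    import tensorflow as tf

    # Point 1: keep raw frames as uint8 (0-255, 1 byte per value) in the replay buffer,
    # roughly 4x smaller than storing them as float32.
    buffer = np.zeros((10000, 84, 84, 3), dtype=np.uint8)

    # Point 2: cast and normalize to [0, 1] only when the batch enters the network.
    def preprocess(batch_uint8):
        return tf.cast(batch_uint8, tf.float32) / 255.0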

StepNeverStop commented 5 years ago

@kmakeev Hi,

python run.py --gym --gym-env Breakout-v0 -a dqn --render-episode 0 works for me.

There is a lot of memory consumption because of:

  1. float32, not int8. This will be optimized later, and normalization will also be implemented later, for both vector input and image input.

  2. Conv3D, not Conv2D. Conv3D has more variables, so its optimization takes more time. It's really hard to stay compatible with Unity ML-Agents, so I will write another function to deal with Gym visual input later.


You said the application crashed; I don't know whether it actually broke and shut down or just got stuck. If it got stuck, that's normal, because off-policy algorithms like DQN, Dueling DQN, etc. train many times per episode: if the length of an episode is 100, I call the train function 100 times after the episode ends (see the sketch below). Or maybe you should decrease the batch_size for visual-input training.
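A hedged sketch of that update schedule (env, agent, and replay_buffer are hypothetical placeholders, not the repo's actual API): after an episode of length T, the learn step is called T times.

    def run_episode(env, agent, replay_buffer, batch_size=32):
        obs, done, steps = env.reset(), False, 0
        while not done:
            action = agent.choose_action(obs)
            next_obs, reward, done, info = env.step(action)
            replay_buffer.add(obs, action, reward, next_obs, done)
            obs, steps = next_obs, steps + 1
        for _ in range(steps):                  # one gradient update per environment step
            agent.learn(replay_buffer.sample(batch_size))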

Because my computer hardware is not very good, the optimization of image input is not very good either, and I haven't paid much attention to that part; very sorry about that. I will keep optimizing those parts.

And welcome your PR.

Thx.

kmakeev commented 5 years ago

OK. I'm not in a hurry with the result. Ready to join the project, but it takes time to understand the code.

My problem is clearly out of memory:

no op step 10000
WARNING:tensorflow:Layer actor_net is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2. The layer has dtype float32 because its dtype defaults to floatx.

If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.

To change all layers to have dtype float64 by default, call tf.keras.backend.set_floatx('float64'). To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

Process finished with exit code -1073740791 (0xC0000409)

StepNeverStop commented 5 years ago

Maybe 10,000 ("1w") experiences are too large for your memory... Don't worry, I will optimize those issues later.
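For a rough sense of scale (illustrative frame size; the real buffer also stores next observations, actions, rewards, etc.), 10,000 float32 frames already run into gigabytes:

    # Back-of-the-envelope memory for 10,000 stored 96x96x3 frames
    h, w, c, n = 96, 96, 3, 10_000
    float32_bytes = h * w * c * 4 * n   # ~1.1 GB
    uint8_bytes   = h * w * c * 1 * n   # ~0.28 GB
    print(float32_bytes / 1e9, uint8_bytes / 1e9)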

kmakeev commented 5 years ago

OK! Thanks. One more question: does the current code allow tuning the exploration-exploitation trade-off? (I have not found it.) If not, is it planned? This is the first thing I could work on...

StepNeverStop commented 5 years ago

Yes, for a lot of the algorithms you can adjust epsilon. For SAC, you can change the target entropy, alpha's initial value, and log_std_bound.

For all of the algorithms implemented on TF 2.0, you can change the layers from Dense to Noisy in tf2nn.py to implement a noisy net for more exploration.
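For reference, a minimal epsilon-greedy sketch of the kind of exploration knob mentioned above (the function names and decay schedule are illustrative, not the repo's API):

    import numpy as np

    # Pick a random action with probability epsilon, otherwise the greedy one.
    def epsilon_greedy(q_values, epsilon):
        if np.random.rand() < epsilon:
            return np.random.randint(len(q_values))   # explore
        return int(np.argmax(q_values))               # exploit

    # A common schedule: linearly anneal epsilon from 1.0 to 0.05 over 100k steps.
    def epsilon_at(step, start=1.0, end=0.05, decay_steps=100_000):
        frac = min(step / decay_steps, 1.0)
        return start + frac * (end - start)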