Using TD3 as an example: if noise_type is not specified for a custom environment in td3.yml, the following unexpected behavior occurs:
The logic that determines n_actions is skipped, so n_actions remains None (in exp_manager.py). That None is then passed down to the noise constructor, e.g.: NormalActionNoise(mean=np.zeros(trial.n_actions), sigma=noise_std * np.ones(trial.n_actions))
Depending on the value of n_envs, the program either raises an error or silently produces an unintended result.
When n_envs > 1, an error is raised for a matrix shape mismatch.
When n_envs = 1 (with n_actions=None), no runtime error is raised. Instead, the action noise is one-dimensional and gets broadcast to the environment's actual action dimension, which will likely degrade model performance silently (one of the most frustrating kinds of issue in ML).
The exact unintended behaviour also depends on the environment's action space, but you get the idea.
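The silent n_envs = 1 case comes down to NumPy broadcasting. Below is a minimal sketch in plain NumPy, not the zoo's own code. The 3-dimensional action and the noise scale of 0.1 are made-up values, and the length-1 noise array is built explicitly as a stand-in for the mis-sized noise, so the sketch does not rely on what np.zeros(None) returns:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical deterministic action for an env with 3 action dimensions.
action = np.array([0.2, -0.5, 0.9])

# Mis-sized noise: a single component instead of one per action dimension.
bad_noise = rng.normal(loc=np.zeros(1), scale=0.1 * np.ones(1))

# Correctly sized noise: one independent component per action dimension.
good_noise = rng.normal(loc=np.zeros(3), scale=0.1 * np.ones(3))

# Broadcasting stretches the single noise value across all 3 dimensions,
# so no error is raised, but every dimension gets the same perturbation.
noisy_bad = action + bad_noise
noisy_good = action + good_noise

print(bad_noise.shape, noisy_bad.shape)  # (1,) (3,)
print(np.broadcast_to(bad_noise, action.shape))  # same value repeated 3 times
print(good_noise)  # 3 independently drawn values
```

Nothing fails here, which is exactly the problem: the result has the right shape, but exploration is perfectly correlated across action dimensions.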
==========================
I think people expect that when a default param is not specified in td3.yml but is present in the params sampler (e.g. sample_td3_params() in hyperparams_opt.py), the program will just use the sampled value and work as intended.
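One way to meet that expectation might be to resolve n_actions from the environment's action space whenever it was not set from the config. A minimal sketch follows. resolve_n_actions is a hypothetical helper, not a function that exists in exp_manager.py, and it takes the action-space shape as a plain tuple so that it runs without an environment:

```python
from typing import Optional, Tuple


def resolve_n_actions(n_actions: Optional[int], action_shape: Tuple[int, ...]) -> int:
    """Return a usable action dimension, falling back to the action space shape.

    Hypothetical helper for illustration only.
    """
    if n_actions is not None:
        # Already determined from the config: keep it as is.
        return n_actions
    if len(action_shape) != 1:
        # Fail loudly instead of letting a mis-sized noise array broadcast silently.
        raise ValueError(
            f"Expected a 1-D continuous action space, got shape {action_shape}"
        )
    return action_shape[0]
```

With this, resolve_n_actions(None, (3,)) gives 3, so the noise sampled by the params sampler would be sized to the action space rather than built from None.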
🐛 Bug
To Reproduce
Relevant log output / Error message
No response
System Info
No response
Checklist