Closed Zero1366166516 closed 2 years ago
As explained in the issue template and 3 times in #982, we can't help you if you don't provide us with a well formatted minimal code to reproduce the error you encounter. Are you having trouble understanding what this means?
Sorry, I only registered recently and didn't understand the rules of GitHub.
I'll try my best to provide minimal code that meets the requirements. Thank you in advance for your selfless help! I used the example to create a class (CustomCNN) as the feature extractor and defined the policy kwargs:
policy_kwargs = dict(
    features_extractor_class=CustomCNN,
    net_arch=dict(qf=[256, 256], pi=[256, 256])
)
The following is a class I modified from the example, because I want a feature extractor that extracts features from time series. I want to use a CNN.
class CustomCNN(BaseFeaturesExtractor):
    def __init__(self, observation_space: gym.spaces.Box, features_dim: int = 1):
        super(CustomCNN, self).__init__(observation_space, features_dim)
        # We assume CxHxW images (channels first)
        # Re-ordering will be done by pre-preprocessing or wrapper
        n_input_channels = observation_space.shape[0]
        self.cnn = nn.Sequential(
            nn.Conv1d(self.features_dim, n_input_channels, kernel_size=1, stride=1, padding=0),
            nn.ReLU(),
            nn.Conv1d(n_input_channels, self.features_dim, kernel_size=1, stride=1, padding=0),
            nn.ReLU(),
            nn.Flatten(),
        )
        with th.no_grad():
            n_flatten = self.cnn(
                th.as_tensor(observation_space.sample()[None]).float()
            ).shape[1]
        self.linear = nn.Sequential(nn.Linear(n_flatten, self.features_dim), nn.Tanh())
Now the problem is in the forward function. The first sample the program passes in has shape [1,1,13], and the second has shape [1,128,13]. So I added a check: if the shape changes, redefine nn.Sequential.
    def forward(self, observations: th.Tensor) -> th.Tensor:
        n_flatten = np.array(observations).shape[1]
        features_dim = np.array(observations).shape[0]
        print(features_dim, n_flatten)
        if features_dim != 1:
            self.cnn = nn.Sequential(
                nn.Conv1d(features_dim, n_flatten, kernel_size=1, stride=1, padding=0),
                nn.ReLU(),
                nn.Conv1d(n_flatten, features_dim, kernel_size=1, stride=1, padding=0),
                nn.ReLU(),
                nn.Flatten(),
            )
            self.linear = nn.Sequential(nn.Linear(n_flatten, features_dim), nn.Tanh())
        return self.linear(self.cnn(observations))
The following error occurred:
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (128x128 and 1x256)
I know there is a fully connected layer after the feature extractor, but why does [1,1,13] work while [1,128,13] raises an error? I'm puzzled. Also, can you add a more detailed introduction to the documentation about the feature extractor and the fully connected layer? I don't know whether what I'm writing now meets the requirements, which has caused you trouble. Sorry.
I want to use the MlpPolicy policy network with a CNN as the feature extractor. I don't know whether this approach is feasible, or whether I must write a custom policy network to achieve it.
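The shapes in the error message can be read directly. `1x256` is the transposed weight of a linear layer that takes 1 input feature and produces 256 outputs, which matches `features_dim=1` followed by `pi=[256, 256]`. `128x128` is the tensor that reached that layer: 128 rows with 128 features each instead of 1. The following is a minimal sketch in plain PyTorch that reproduces the same kind of mismatch. The `latent_pi` layer here is a stand-in built for illustration, not the actual SB3 object.

```python
import torch as th
from torch import nn

# Stand-in for the first layer of a policy head built for features_dim=1.
# Its weight has shape (256, 1); the matmul uses the transpose, (1, 256).
latent_pi = nn.Linear(in_features=1, out_features=256)

# What this layer expects: a batch of 128 samples with 1 feature each.
ok_features = th.zeros(128, 1)
ok_output = latent_pi(ok_features)
print(ok_output.shape)

# What it would receive if the extractor emitted 128 features per sample.
bad_features = th.zeros(128, 128)
try:
    latent_pi(bad_features)
    raised = False
except RuntimeError as err:
    raised = True
    print(err)
```

Whatever the extractor does internally, its output has to be `(batch_size, features_dim)` for every batch size, with the same `features_dim` that was passed to the base class.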
> I want to use the MlpPolicy policy network with a CNN as the feature extractor.
I think you don't have a clear understanding of policies in SB3: the feature extractor is the first stage of any policy. You should read the documentation: SB3 Policy
> I don't know whether what I'm writing now meets the requirements, which has caused you trouble. Sorry.
It doesn't. You need to provide code that I can just copy-paste and run to reproduce the error. WARNING: this code has to be MINIMAL: if one line can be removed without removing the error, your code is not minimal.
I also understand that you want to implement a feature extractor with Conv1D. If that's the case, you have to check the other issues that discuss this topic.
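For reference, here is a minimal sketch of a Conv1d extractor whose layers are defined once in `__init__` and whose output is always `(batch_size, features_dim)`. It is written as a plain `nn.Module` so that it runs without SB3; in SB3 the same pattern would subclass `BaseFeaturesExtractor` and pass `features_dim` to `super().__init__`, as in the class posted in this thread. The class name, the 16 convolution channels, the kernel size and `features_dim=64` are assumptions chosen for illustration.

```python
import torch as th
from torch import nn


class Conv1dExtractor(nn.Module):
    """Hypothetical extractor for observations of shape (channels, length)."""

    def __init__(self, n_channels: int, length: int, features_dim: int = 64):
        super().__init__()
        self.features_dim = features_dim
        # Layers are created once; the channel count comes from the
        # observation shape, never from the batch size.
        self.cnn = nn.Sequential(
            nn.Conv1d(n_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
        )
        # Compute the flattened size with one dummy forward pass.
        with th.no_grad():
            n_flatten = self.cnn(th.zeros(1, n_channels, length)).shape[1]
        self.linear = nn.Sequential(nn.Linear(n_flatten, features_dim), nn.ReLU())

    def forward(self, observations: th.Tensor) -> th.Tensor:
        # Input: (batch, channels, length). Output: (batch, features_dim).
        return self.linear(self.cnn(observations))


extractor = Conv1dExtractor(n_channels=1, length=13, features_dim=64)
single = extractor(th.zeros(1, 1, 13))    # one observation, as when acting
batch = extractor(th.zeros(128, 1, 13))   # a replay batch, as when training
print(single.shape, batch.shape)
```

Because no layer depends on the batch dimension, the same module handles a single observation and a batch of 128 without any branching in `forward`.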
Thank you for your help. I'm writing a DRL algorithm for stock portfolio returns. The idea is to try MLP, CNN and LSTM feature extractors and compare which works best for financial time series.
I have found the reason for the problem of the above customized CNN feature extractor, and it has been solved. Thank you again for your help.
Excuse me, can you add another example of LSTM feature extractor to the document?
Please provide the fix so that other people can benefit from it.
> Excuse me, can you add another example of LSTM feature extractor to the document?
If you think that the documentation can be improved, for example by adding more examples, feel free to open a PR.
OK, I'll send the modified code. This is my rewrite based on the example program: a custom feature extractor that uses a CNN, for use with the MlpPolicy policy network.
class CustomCNN(BaseFeaturesExtractor):
    """
    :param observation_space: (gym.Space)
    :param features_dim: (int) Number of features extracted.
        This corresponds to the number of unit for the last layer.
    """
    def __init__(self, observation_space: gym.spaces.Box, features_dim: int = 1):
        super(CustomCNN, self).__init__(observation_space, features_dim)
        # We assume CxHxW images (channels first)
        # Re-ordering will be done by pre-preprocessing or wrapper
        n_input_channels = observation_space.shape[0]
        self.cnn = nn.Sequential(
            nn.Conv1d(self.features_dim, n_input_channels, kernel_size=1, stride=1, padding=0),
            nn.ReLU(),
            nn.Conv1d(n_input_channels, self.features_dim, kernel_size=1, stride=1, padding=0),
            nn.ReLU(),
            nn.Flatten(),
        )
        # Compute shape by doing one forward pass
        with th.no_grad():
            n_flatten = self.cnn(
                th.as_tensor(observation_space.sample()[None]).float()
            ).shape[1]
        #print("n_flatten", n_flatten)
        ##print("cnn", self.cnn)
        self.linear = nn.Sequential(nn.Linear(n_flatten, features_dim), nn.Tanh())

    def forward(self, observations: th.Tensor) -> th.Tensor:
        with th.no_grad():
            n_flatten = np.array(observations).shape[-1]
            features_dim = np.array(observations).shape[-2]
        #print(features_dim, n_flatten, np.array(observations).shape)
        i = 0
        j = 0
        if features_dim != 1:
            self.cnn = nn.Sequential(
                nn.Conv1d(features_dim, n_flatten, kernel_size=1, stride=1, padding=0),
                nn.ReLU(),
                nn.Conv1d(n_flatten, features_dim, kernel_size=1, stride=1, padding=0),
                nn.ReLU(),
                nn.Flatten(),
            )
            self.linear = nn.Sequential(nn.Linear(n_flatten, 1), nn.Tanh())
            i += 1
        else:
            j += 1
            self.cnn = nn.Sequential(
                nn.Conv1d(features_dim, n_flatten, kernel_size=1, stride=1, padding=0),
                nn.ReLU(),
                nn.Conv1d(n_flatten, features_dim, kernel_size=1, stride=1, padding=0),
                nn.ReLU(),
                nn.Flatten(),
            )
            self.linear = nn.Sequential(nn.Linear(n_flatten, 1), nn.Tanh())
        return self.linear(self.cnn(observations))
The following is the code in the main program:
policy_kwargs = dict(
    features_extractor_class=CustomCNN,
    net_arch=dict(qf=[128, 128], pi=[256, 256])
)

def get_model(
    self,
    model_name: str,
    policy: str = "MlpPolicy",
    #policy: str = "MultiInputPolicy",
    policy_kwargs: dict = policy_kwargs,
    model_kwargs: dict = None,
    verbose: int = 1
) -> Any:
    # print("set Debug!")
    if model_name not in MODELS:
        raise NotImplementedError("NotImplementedError")
    if model_kwargs is None:
        model_kwargs = MODEL_KWARGS[model_name]
    if "action_noise" in model_kwargs:
        n_actions = self.env.action_space.shape[-1]
        model_kwargs["action_noise"] = NOISE[model_kwargs["action_noise"]](
            mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions)
        )
    print(model_kwargs)
    model = MODELS[model_name](
        policy=policy,
        env=self.env,
        tensorboard_log="{}/{}".format(config.TENSORBOARD_LOG_DIR, model_name),
        verbose=verbose,
        policy_kwargs=policy_kwargs,
        **model_kwargs
    )
    return model
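One caveat about the forward method posted above: it creates new layers on every call. In PyTorch, an optimizer only updates the parameters it was handed when it was constructed, so layers created later start from fresh random weights and are never trained. The sketch below illustrates this with a hypothetical `RebuildingNet` module and a plain SGD optimizer; both are assumptions made for the example and are not taken from SB3.

```python
import torch as th
from torch import nn


class RebuildingNet(nn.Module):
    """Hypothetical module that rebuilds its layer inside forward()."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(13, 1)

    def forward(self, x: th.Tensor) -> th.Tensor:
        # A brand-new layer with new random weights on every call.
        self.linear = nn.Linear(13, 1)
        return self.linear(x)


net = RebuildingNet()
optimizer = th.optim.SGD(net.parameters(), lr=0.1)

# Parameters the optimizer knows about (captured at construction time).
tracked = {id(p) for group in optimizer.param_groups for p in group["params"]}

net(th.zeros(4, 13))  # forward() replaces self.linear

# Parameters the module holds after the forward pass.
current = {id(p) for p in net.parameters()}
overlap = tracked & current
print(len(overlap))  # 0: the optimizer no longer sees the live parameters
```

Defining the layers once in `__init__` and keeping `forward` free of layer construction avoids this problem.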
> Excuse me, can you add another example of LSTM feature extractor to the document?
As stated in the documentation, only RecurrentPPO (from SB3 contrib) has LSTM support.
Closing as the original question was answered.
The following is an automated answer:
As you seem to be trying to apply RL to stock trading, I must also warn you about it. Here is a recommendation from a former professional trader:
Retail trading, retail trading with ML, and retail trading with RL are bad ideas for almost everyone to get involved with.
Important Note: We do not do technical support, nor consulting and don't answer personal questions per email. Please post your question on the RL Discord, Reddit or Stack Overflow in that case.
📚 Documentation
A problem with an off-policy network has been bothering me for several days. I used the example to create a class (CustomCNN) as the feature extractor and defined the policy kwargs:
policy_kwargs = dict(
    features_extractor_class=CustomCNN,
    net_arch=dict(qf=[256, 256], pi=[256, 256])
)
A CNN is used as the feature extractor, and the code is as follows:
The following is a class I modified according to the example, because I want to make a feature extractor to extract the features of time series. I want to use CNN network.
Now the problem is in the forward function. The first sample the program passes in has shape [1,1,13], and the second has shape [1,128,13]. So I added a check: if the shape changes, redefine nn.Sequential.
The error traceback is:
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2022.1.3\plugins\python-ce\helpers\pydev\pydevd.py", line 1491, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2022.1.3\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/Administrator/PycharmProjects/demo/utils/models.py", line 487, in
    trained_sac = agent.train_model(
  File "C:/Users/Administrator/PycharmProjects/demo/utils/models.py", line 409, in train_model
    model = model.learn(total_timesteps=total_timesteps, tb_log_name=tb_log_name)
  File "C:\ProgramData\Anaconda3\lib\site-packages\stable_baselines3\sac\sac.py", line 292, in learn
    return super(SAC, self).learn(
  File "C:\ProgramData\Anaconda3\lib\site-packages\stable_baselines3\common\off_policy_algorithm.py", line 366, in learn
    self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
  File "C:\ProgramData\Anaconda3\lib\site-packages\stable_baselines3\sac\sac.py", line 206, in train
    actions_pi, log_prob = self.actor.action_log_prob(replay_data.observations)
  File "C:\ProgramData\Anaconda3\lib\site-packages\stable_baselines3\sac\policies.py", line 180, in action_log_prob
    mean_actions, log_std, kwargs = self.get_action_dist_params(obs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\stable_baselines3\sac\policies.py", line 163, in get_action_dist_params
    latent_pi = self.latent_pi(features)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\container.py", line 139, in forward
    input = module(input)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (128x128 and 1x256)
I know there is a fully connected layer after the feature extractor, but why does [1,1,13] work while [1,128,13] raises an error? I'm puzzled. Also, can I add a more detailed introduction to the documentation about the feature extractor and the fully connected layer? I don't know if my analysis is correct. Please help me have a look. Thank you very much! I'll send you some more code of the model:
policy_kwargs = dict(
    features_extractor_class=CustomCNN,
    net_arch=dict(qf=[256, 256], pi=[256, 256])
)

def get_model(
    self,
    model_name: str,
    policy: str = "MlpPolicy",
    #policy: str = "MultiInputPolicy",

if __name__ == "__main__":
    from pull_data import Pull_data
    from preprocessors import FeatureEngineer, split_data
    from utils import config
    import time
    # pull data