facebookresearch / Pearl

A Production-ready Reinforcement Learning AI Agent Library brought by the Applied Reinforcement Learning team at Meta.

cuda out of memory #72

Closed lz1159435992 closed 4 months ago

lz1159435992 commented 6 months ago

[screenshot: CUDA out-of-memory error] While using Pearl, GPU memory (VRAM) consumption keeps increasing. Is there any way to delete tensors that are no longer needed, or is running out of memory inevitable because my action space is large?

yiwan-rl commented 6 months ago

"the consumption of VRAM keeps increasing continuously" This might be a result of putting more and more data in the replay buffer. You may try a small replay buffer and see if the memory increases after the replay buffer is full to verify this point.

Also, could you provide more details regarding your experiment?

lz1159435992 commented 6 months ago

In my experiment, the state is a file encoded in SMT-LIB format, and an action consists of a variable and a constant. The action space is the Cartesian product of the variable list and the constant list. The size of the variable list depends on the number of variables in the SMT file, while the constant list has 30,000 entries. My code is based on the tutorial code, and there are probably some clumsy parts in it; if I am misusing anything, I hope you can point it out. Below is the code that constructs my agent. When my action space is small, memory usage isn't high. Now my action space is 30000*n (n = 1, 2, 3, ...), and a run lasts only about half an hour before it is forced to stop. In my code, all tensors I create are deleted after use. I'm not sure whether the problem is that my action space is too large or that I'm using the wrong class for the action space.

env = ConstraintSimplificationEnv_v3(embedder, assertions, len(variables), len(variables), smtlib_str)
observation, action_space = env.reset()
action_representation_module = IdentityActionRepresentationModule(
    max_number_actions=action_space.n,
    representation_dim=action_space.action_dim,
)

agent = PearlAgent(
    policy_learner=SoftActorCritic(
            state_dim=768,
            action_space=action_space,
            actor_hidden_dims=[768, 512, 128],
            critic_hidden_dims=[768, 512, 128],
            action_representation_module=action_representation_module,
    ),
    history_summarization_module=LSTMHistorySummarizationModule(
        observation_dim=768,
        action_dim=len(env.variables)+1,
        hidden_dim=768,
        history_length=len(env.variables),  
    ),
    replay_buffer=BootstrapReplayBuffer(10_000, 1.0, 5),
    device_id=-1,
)
info = online_learning(
    agent=agent,
    env=env,
    number_of_steps=number_of_steps,
    print_every_x_steps=100,
    record_period=record_period,
    learn_after_episode=True,
)
torch.save(info["return"], "BootstrappedDQN-LSTM-return.pt")
plt.plot(record_period * np.arange(len(info["return"])), info["return"], label="BootstrappedDQN-LSTM")
plt.legend()
plt.show()

Here is the code for the env:

class ConstraintSimplificationEnv_v3(Environment):
    def __init__(self, embedder, z3ast, num_variables, num_constants, smtlib_str):
        self.actions_v = None
        self.embedder = embedder
        self.z3ast = z3ast
        self.z3ast_original = z3ast
        self.num_variables = num_variables
        self.num_constants = num_constants
        self.smtlib_str = smtlib_str
        self.state = None
        self.variables = set()
        self.actions = []
        self.concrete_finish = False
        self.concrete_count = 0
        self.counterexamples_list = []
        self.finish = False
        self.used_variables = []
        self.state_count = 0
        self.predictor = Predictor('KNN')

    def reset(self, seed=None):
        self.concrete_finish = False
        self.concrete_count = 0
        self.finish = False
        self.used_variables = []
        self.state = self.embedder.get_max_pooling_embedding(self.smtlib_str)
        self.z3ast = self.z3ast_original
        self.variables = extract_variables_from_smt2_content(self.smtlib_str)
        self.actions_v = self.strings_to_onehot(self.variables)

        self.actions = get_actions(self.actions_v, torch.arange(0, len(dict_value)-1))
        self.actions = self.actions.to(device)
        self.action_space = DiscreteActionSpace(self.actions)
        del self.actions
        torch.cuda.empty_cache()
        return self.state, self.action_space
    def step(self, action):
        reward = 0
        # variable_pred = self.variables[action]
        # action = self.action_space.
        action = self.action_space.actions_batch[action]
        action_v = action[:-1]
        action_n = action[-1]
        variable_pred = self.variables[self.onehot_to_indices(action_v)[0]]
        if self.concrete_count == 0:
            self.counterexamples_list.append([])
        if variable_pred not in self.used_variables:
            self.used_variables.append(variable_pred)
            self.concrete_count += 1
            selected_int = int(dict_value[str(int(action_n.item()))])
            self.counterexamples_list[-1].append([variable_pred, selected_int])

            solver = Solver()
            for a in self.z3ast:
                solver.add(a)
            # Constrain the chosen variable (a Z3 Int) to the selected constant.
            solver.add(Int(variable_pred) == selected_int)
            reward += self.calculate_reward(solver)
            self.z3ast = solver.assertions()
            self.state = self.embedder.get_max_pooling_embedding(solver.to_smt2())

            if self.concrete_count == len(self.variables):
                self.concrete_finish = True
                self.reset()
        else:
            reward += -10
            print(action)
            cpu = torch.device("cpu")
            # Move the remaining variable one-hots and the chosen one to the CPU for comparison.
            self.actions_v = [act.to(cpu) for act in self.actions_v]
            action_v = action_v.to(cpu)
            # Drop the one-hot vector of the variable that was just used and rebuild the action space.
            self.actions_v = [t for t in self.actions_v if not torch.equal(t, action_v)]
            self.action_space = DiscreteActionSpace(get_actions(self.actions_v, torch.arange(0, len(dict_value)-1)))
        del action
        del action_n
        del action_v
        torch.cuda.empty_cache()
        return ActionResult(
            observation=self.state,
            reward=float(reward),
            terminated=self.finish,
            truncated=False,
            info={},
            available_action_space=self.action_space, )

    @staticmethod
    def strings_to_onehot(string_list):
        str_to_index = {string: index for index, string in enumerate(string_list)}
        one_hot_tensors = []
        for string in string_list:
            one_hot_vector = torch.zeros(len(string_list), dtype=torch.float32)
            one_hot_vector[str_to_index[string]] = 1.0
            one_hot_vector = one_hot_vector.to(device)
            one_hot_tensors.append(one_hot_vector)
        one_hot_matrix = torch.stack(one_hot_tensors)
        del one_hot_vector
        del one_hot_tensors
        torch.cuda.empty_cache()
        return one_hot_matrix
        # return one_hot_tensors

    @staticmethod
    def onehot_to_indices(one_hot_tensors):
        return [torch.argmax(tensor).item() for tensor in one_hot_tensors]

    @staticmethod
    def counter_reward_function(total_length, unique_count):
        # Define the base reward values
        R_positive = 1
        R_negative = -1

        # Define the scaling factor for negative reward
        alpha = 1 / math.sqrt(total_length) if total_length > 0 else 1

        # Check if there are any unique strings
        if unique_count > 0:
            # Calculate the positive reward, scaled based on the list length
            reward = R_positive / math.log(1 + total_length)
        else:
            # Apply the negative reward, scaled by alpha
            reward = R_negative * alpha

        return reward

    def calculate_reward(self, solver):
        reward = 0
        count = 0
        solver.set("timeout", 60000)
        if len(self.counterexamples_list) > 1:
            if self.counterexamples_list[-1] in self.counterexamples_list[:len(self.counterexamples_list) - 1]:
                reward += -1
            else:
                last_joined = ' '.join(
                    ' '.join(str(item) for item in inner_list) for inner_list in self.counterexamples_list[-1])
                for i in range(len(self.counterexamples_list) - 1):
                    current_joined = ' '.join(
                        ' '.join(str(item) for item in inner_list) for inner_list in self.counterexamples_list[i])
                    if last_joined in current_joined:
                        count += 1
                reward += self.counter_reward_function(len(self.counterexamples_list) - 1,
                                                       len(self.counterexamples_list) - 1 - count)
            print(self.counterexamples_list)
        query_smt2 = solver.to_smt2()
        # print(query_smt2)
        predicted_solvability = self.predictor.predict(query_smt2)
        if predicted_solvability == 0:
            reward += 2
            r = solver.check()
            stats = solver.statistics()
            if z3.sat == r:

                self.finish = True

                print("求解时间:", stats.get_key_value('time'))
                update_txt_with_current_time('time.txt',self.smtlib_str,stats.get_key_value('time'))
            else:
                # reward += 1 / stats.get_key_value('time') * 100
                reward += -5

        return reward

    def are_lists_equal(self, list1, list2):
        if len(list1) != len(list2):
            return False

        for item1, item2 in zip(list1, list2):
            if item1 != item2:
                return False
        return True
rodrigodesalvobraz commented 6 months ago

It's hard to find the problem because there is too much code. When posting code with a bug, it is recommended to first reduce it to the smallest version that still reproduces the problem, by progressively simplifying it. In fact, the process of reducing the code this way often reveals the bug before one even needs to ask.

You say the action space grows, but I see it depends on dict_value, whose definition I cannot find anywhere. In any case, the fact that the number of actions keeps growing sounds like a likely cause for running out of memory.

Note that it might also be helpful to use an action representation module that produces action embeddings rather than a large discrete one-hot space. To do that, you could try an action representation module that computes the embedding with a neural network.
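As an illustration of that idea (a rough sketch only: the import path and the required properties of Pearl's ActionRepresentationModule base class are assumptions here and should be checked against the library), one could map integer action indices to a small learned embedding:

import torch
import torch.nn as nn

# Assumed import path; verify against the installed Pearl version.
from pearl.action_representation_modules.action_representation_module import (
    ActionRepresentationModule,
)


class LearnedActionEmbedding(ActionRepresentationModule):
    """Maps integer action indices to a small learned embedding vector."""

    def __init__(self, num_actions: int, embedding_dim: int = 32) -> None:
        super().__init__()
        self._num_actions = num_actions
        self._embedding_dim = embedding_dim
        self.embedding = nn.Embedding(num_actions, embedding_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x is assumed to hold integer action indices of shape (batch, 1).
        return self.embedding(x.long().squeeze(-1))

    @property
    def max_number_actions(self) -> int:
        return self._num_actions

    @property
    def representation_dim(self) -> int:
        return self._embedding_dim

Combined with an integer action space, each stored action is then just an index rather than a long one-hot vector.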

lz1159435992 commented 6 months ago

I'm sorry for posting so much code, even though I've already cut part of it. The reason I posted all the related code is that I haven't pinpointed where the bug actually is. Also, I was unclear in my explanation: when I said the "action space grows," I didn't mean that it increases during runtime, but rather that it is a parameter of my experiment. Currently, dict_value has 27,000 values, and multiplied by the number of variables, that gives the size of the action space; this size remains constant throughout each run. You mentioned that an "action representation module" might alleviate this problem, but I'm still unsure how to implement it. Are there any documents that could teach me how to do this? Thank you very much again for your response.

rodrigodesalvobraz commented 6 months ago

Do you think you can create a GitHub project of your code so I can clone it and run it myself to see if I understand it better?

yiwan-rl commented 6 months ago

You said that your action space size is 30000*n. Do you know how large n is?

It is possible that the model is simply too large. In your code, you use a history summarization module, which involves a history of length len(env.variables). Do you know how long this history is?

I would recommend starting from the simplest model (for example, removing the history summarization module), running it and checking its memory usage via "nvidia-smi", and then increasing the model size gradually to obtain better performance.

Btw, given that you are using SAC as your policy learner, you should use FIFOOffPolicyReplayBuffer instead of the bootstrapped replay buffer.
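A minimal sketch of that swap, reusing names from the snippet above, where policy_learner stands for the SoftActorCritic instance defined earlier (the FIFOOffPolicyReplayBuffer import path and capacity-only constructor are written from memory and should be checked against the installed Pearl version):

# Assumed import path; check it against the installed Pearl version.
from pearl.replay_buffers.sequential_decision_making.fifo_off_policy_replay_buffer import (
    FIFOOffPolicyReplayBuffer,
)

agent = PearlAgent(
    policy_learner=policy_learner,  # the SoftActorCritic learner from the snippet above
    # History summarization module omitted to start from the simplest model.
    replay_buffer=FIFOOffPolicyReplayBuffer(10_000),  # assumed capacity-only constructor
    device_id=-1,
)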

rodrigodesalvobraz commented 6 months ago

I just realized that one problem with your code is that you are creating an action space of one-hot-encoded actions, which, given the large number of actions, is going to be very memory-expensive. The actions are stored in the replay buffer batches, and each one is a 27,000-element tensor.

The best thing to do is to use an action space of 27,000 integers (each action is an integer) together with a one-hot action representation module (see how to use it in the Frozen Lake tutorial). Each action is then represented by a single integer, which is of course much more memory-efficient. These integers are only expanded into a one-hot representation when the neural network is evaluated.

This should help with memory usage.
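As rough arithmetic, a 27,000-dimensional float32 one-hot action is about 108 KB, so 10,000 stored transitions already cost on the order of 1 GB for the actions alone. A minimal sketch of the integer-action setup (import paths and constructor arguments follow the Frozen Lake tutorial as remembered, so double-check them; n is a placeholder for the number of variables in the SMT file):

import torch

# Assumed import paths; verify against the installed Pearl version.
from pearl.utils.instantiations.spaces.discrete_action import DiscreteActionSpace
from pearl.action_representation_modules.one_hot_action_representation_module import (
    OneHotActionTensorRepresentationModule,
)

num_actions = 27_000 * n  # n: number of variables in the SMT file (placeholder)

# Each action is stored as a single integer index...
action_space = DiscreteActionSpace(
    actions=[torch.tensor([i]) for i in range(num_actions)]
)

# ...and expanded to a one-hot vector only when the networks are evaluated.
action_representation_module = OneHotActionTensorRepresentationModule(
    max_number_actions=num_actions,
)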

Still willing to look at your code if you create a repo I can clone.

lz1159435992 commented 6 months ago

You said that your action space size is 30000*n. Do you know how large n is?

It is possible that the model is simply too large. In your code, you use a history summarization module, which involves a history of length len(env.variables). Do you know how long this history is?

I would recommend starting from the simplest model (for example, removing the history summarization module), running it and checking its memory usage via "nvidia-smi", and then increasing the model size gradually to obtain better performance.

Btw, given that you are using SAC as your policy learner, you should use FIFOOffPolicyReplayBuffer instead of the bootstrapped replay buffer.

n and len(env.variables) are the same; they depend on the input file and, in my experience, generally range from 1 to 30. I will try simplifying the model (for example, removing the history summarization module) and see the effect. Thank you very much for your suggestion.

lz1159435992 commented 6 months ago

I just realized that one problem with your code is that you are creating an action space of one-hot-encoded actions, which, given the large number of actions, is going to be very memory-expensive. The actions are stored in the replay buffer batches, and each one is a 27,000-element tensor.

The best thing to do is to use an action space of 27,000 integers (each action is an integer) together with a one-hot action representation module (see how to use it in the Frozen Lake tutorial). Each action is then represented by a single integer, which is of course much more memory-efficient. These integers are only expanded into a one-hot representation when the neural network is evaluated.

This should help with memory usage.

Still willing to look at your code if you create a repo I can clone.

This is my repo; I forked a version of pearl and made modifications based on it, but my code is somewhat messy. The main files are under the test_rl folder, with embedding_test2_v3 and env being the primary files. Also, I will follow your advice and use a one-hot representation module. Thank you very much for your help.

lz1159435992 commented 6 months ago

You said that your action space size is 30000*n. Do you know how large n is? It is possible that the model is simply too large. In your code, you use a history summarization module, which involves a history of length len(env.variables). Do you know how long this history is? I would recommend starting from the simplest model (for example, removing the history summarization module), running it and checking its memory usage via "nvidia-smi", and then increasing the model size gradually to obtain better performance. Btw, given that you are using SAC as your policy learner, you should use FIFOOffPolicyReplayBuffer instead of the bootstrapped replay buffer.

n and len(env.variables) are the same; they depend on the input file and, in my experience, generally range from 1 to 30. I will try simplifying the model (for example, removing the history summarization module) and see the effect. Thank you very much for your suggestion.

After I switched from the bootstrapped replay buffer to FIFOOffPolicyReplayBuffer, the exploding-VRAM issue indeed no longer occurs. However, a new problem has arisen: about half an hour into a run, I hit this issue every time, without any error messages. I've been monitoring both RAM and VRAM, and they stay within normal ranges. I'm at a loss as to where the problem lies. [screenshot: the process exits with signal 11 (SIGSEGV)]

rodrigodesalvobraz commented 6 months ago

Do you see a stack trace showing where the error occurs? Is there any other part of the error message?

Googling "11:SIGSEGV pytorch" reveals several pages discussing this error. Some of them mention the need to update Nvidia drivers. It seems to be a fairly low-level issue. Can you take a look at those pages to see if they help?

lz1159435992 commented 6 months ago

No, I haven't seen any error messages, and when I step through the code, I can't pinpoint where it fails. I'm trying my best to find the source of the problem, but so far I've been unable to.
In fact, I'm not very keen on upgrading my NVIDIA driver, because a previous upgrade once caused my system to crash. I'll first try to reproduce the error on another machine, and then consider upgrading the driver.

rodrigodesalvobraz commented 4 months ago

We have just committed changes that keep the replay buffer on the CPU and move only the sampled batches to the GPU. Before, the entire replay buffer was kept on the GPU. Perhaps this solves the problem. Please let us know if not.
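For readers following along, the general pattern looks like the following (a generic PyTorch illustration only, not Pearl's actual implementation; CpuReplayBuffer and its methods are hypothetical):

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class CpuReplayBuffer:
    """Hypothetical minimal buffer: transitions live on the CPU, sampled batches move to the GPU."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.storage: list[dict[str, torch.Tensor]] = []

    def push(self, transition: dict[str, torch.Tensor]) -> None:
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)
        # Keep every stored tensor on the CPU while it sits in the buffer.
        self.storage.append({k: v.detach().cpu() for k, v in transition.items()})

    def sample(self, batch_size: int) -> dict[str, torch.Tensor]:
        idx = torch.randint(len(self.storage), (batch_size,)).tolist()
        batch = {
            k: torch.stack([self.storage[i][k] for i in idx])
            for k in self.storage[0]
        }
        # Only the sampled mini-batch is copied to the GPU for the learning step.
        return {k: v.to(device) for k, v in batch.items()}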

lz1159435992 commented 3 months ago

I am still running into out-of-memory issues, but mainly because my action space is too large. In previous versions the problem went away once I reduced the action space. I have tried the updated version, and the out-of-memory issue still occurs when my action space is large, so I am not sure whether there is still a problem.

rodrigodesalvobraz commented 3 months ago

Do you still see the "11:SIGSEGV pytorch" error?

lz1159435992 commented 3 months ago

No, I haven't encountered this error after the update.