Altina-oz closed this issue 1 year ago
Let me have a look.
Hi,
The NHP, THP, SAHP, and RMTPP models are now fixed to run on CUDA. Please use the latest master branch.
There may still be problems with other models. Once I have fixed everything, I will publish a new version of the package.
I have checked the other models and they seem to work fine.
Thank you very much. Hope everything goes smoothly 🙏 Many thanks for your work.
Mou
(Email reply to the message of Thu, Sep 21, 2023, 11:21 AM; subject: Re: [ant-research/EasyTemporalPointProcess] cannot run on gpu with configs and data in the example and docs (but work well on cpu backend) (Issue #5))
You are welcome. Let us know if there are any other problems. We will continue to improve the code.
my main.py:
import argparse

from easy_tpp.config_factory import Config
from easy_tpp.runner import Runner


def main():
    parser = argparse.ArgumentParser()


if __name__ == '__main__':
    main()
Error message:
2023-09-05 22:06:34,176 - config.py[pid:552;line:33:build_from_yaml_file] - CRITICAL: Load pipeline config class RunnerConfig
2023-09-05 22:06:34,197 - runner_config.py[pid:552;line:163:update_config] - CRITICAL: train model NHP using GPU with torch backend
2023-09-05 22:06:34,200 - runner_config.py[pid:552;line:36:__init__] - INFO: Save the config to ./checkpoints/552_19516_230905-220634\NHP_train_output.yaml
2023-09-05 22:06:34,201 - base_runner.py[pid:552;line:170:save_log] - INFO: Save the log to ./checkpoints/552_19516_230905-220634\log
2023-09-05 22:06:34,483 - tpp_runner.py[pid:552;line:60:_init_model] - INFO: Num of model parameters 60672
2023-09-05 22:06:35,077 - base_runner.py[pid:552;line:92:train] - INFO: Data 'taxi' loaded...
2023-09-05 22:06:35,078 - base_runner.py[pid:552;line:97:train] - INFO: Start NHP training...
Traceback (most recent call last):
File "E:\isomorphism\ntpp\main.py", line 25, in <module>
main()
File "E:\isomorphism\ntpp\main.py", line 21, in main
model_runner.run()
File "C:\ProgramData\anaconda3\envs\graphL\lib\site-packages\easy_tpp\runner\base_runner.py", line 191, in run
return self.train(**kwargs)
File "C:\ProgramData\anaconda3\envs\graphL\lib\site-packages\easy_tpp\runner\base_runner.py", line 98, in train
model = self._train_model(
File "C:\ProgramData\anaconda3\envs\graphL\lib\site-packages\easy_tpp\runner\tpp_runner.py", line 93, in _train_model
train_metrics = self.run_one_epoch(train_loader, RunnerPhase.TRAIN)
File "C:\ProgramData\anaconda3\envs\graphL\lib\site-packages\easy_tpp\runner\tpp_runner.py", line 188, in run_one_epoch
self.model_wrapper.run_batch(batch, phase=phase)
File "C:\ProgramData\anaconda3\envs\graphL\lib\site-packages\easy_tpp\torch_wrapper.py", line 116, in run_batch
loss, num_event = self.model.loglike_loss(batch)
File "C:\ProgramData\anaconda3\envs\graphL\lib\site-packages\easy_tpp\model\torch_model\torch_nhp.py", line 237, in loglike_loss
hiddens_ti, decay_states = self.forward(batch)
File "C:\ProgramData\anaconda3\envs\graphL\lib\site-packages\easy_tpp\model\torch_model\torch_nhp.py", line 192, in forward
self.rnn_cell(x_t, h_t, c_t, c_bar_i)
File "C:\ProgramData\anaconda3\envs\graphL\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ProgramData\anaconda3\envs\graphL\lib\site-packages\easy_tpp\model\torch_model\torch_nhp.py", line 54, in forward
xi = torch.cat((x_i, hidden_i_minus), dim=1)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument tensors in method wrapper_CUDA_cat)
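The traceback above ends in a `torch.cat` call that received one tensor on `cuda:0` and one on `cpu`. The thread does not say which tensor was left on the CPU, so the following is only a minimal sketch of one common cause and its fix: a state tensor created without a `device` argument lands on the CPU by default. Here `TinyCell` and `init_hidden` are hypothetical names invented for illustration and are not EasyTPP's actual code.

```python
import torch
from torch import nn


class TinyCell(nn.Module):
    """Hypothetical stand-in for a recurrent cell (not EasyTPP's implementation)."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.linear = nn.Linear(input_dim + hidden_dim, hidden_dim)

    def forward(self, x_i: torch.Tensor, hidden_i_minus: torch.Tensor) -> torch.Tensor:
        # torch.cat requires every input tensor to live on the same device.
        xi = torch.cat((x_i, hidden_i_minus), dim=1)
        return torch.tanh(self.linear(xi))


def init_hidden(x: torch.Tensor, hidden_dim: int) -> torch.Tensor:
    # Create the state on the input's device and dtype, rather than relying
    # on the default device (CPU), which would trigger the mismatch on GPU.
    return torch.zeros(x.size(0), hidden_dim, device=x.device, dtype=x.dtype)


# Falls back to CPU when no GPU is available, so the sketch runs anywhere.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

cell = TinyCell(input_dim=4, hidden_dim=8).to(device)  # moves parameters
x = torch.randn(2, 4, device=device)                   # batch of 2 inputs
h = init_hidden(x, hidden_dim=8)                       # state follows x's device

out = cell(x, h)
print(out.shape, out.device)
```

Deriving the device from an existing tensor (`x.device`) keeps the same code working on both the CPU and GPU backends without passing a device flag through every call.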