Cannot run on GPU with the configs and data from the example and docs (but works well on the CPU backend)

Altina-oz commented 1 year ago

My main.py:

    import argparse

    from easy_tpp.config_factory import Config
    from easy_tpp.runner import Runner


    def main():
        parser = argparse.ArgumentParser()

        parser.add_argument('--config_dir', type=str, required=False, default='configs/experiment_config.yaml',
                            help='Dir of configuration yaml to train and evaluate the model.')

        parser.add_argument('--experiment_id', type=str, required=False, default='NHP_train',
                            help='Experiment id in the config file.')

        args = parser.parse_args()

        # Build the experiment config from the YAML file, then run it.
        config = Config.build_from_yaml_file(args.config_dir, experiment_id=args.experiment_id)

        model_runner = Runner.build_from_config(config)

        model_runner.run()


    if __name__ == '__main__':
        main()

Error message:

    2023-09-05 22:06:34,176 - config.py[pid:552;line:33:build_from_yaml_file] - CRITICAL: Load pipeline config class RunnerConfig
    2023-09-05 22:06:34,197 - runner_config.py[pid:552;line:163:update_config] - CRITICAL: train model NHP using GPU with torch backend
    2023-09-05 22:06:34,200 - runner_config.py[pid:552;line:36:__init__] - INFO: Save the config to ./checkpoints/552_19516_230905-220634\NHP_train_output.yaml
    2023-09-05 22:06:34,201 - base_runner.py[pid:552;line:170:save_log] - INFO: Save the log to ./checkpoints/552_19516_230905-220634\log
    2023-09-05 22:06:34,483 - tpp_runner.py[pid:552;line:60:_init_model] - INFO: Num of model parameters 60672
    2023-09-05 22:06:35,077 - base_runner.py[pid:552;line:92:train] - INFO: Data 'taxi' loaded...
    2023-09-05 22:06:35,078 - base_runner.py[pid:552;line:97:train] - INFO: Start NHP training...
    Traceback (most recent call last):
      File "E:\isomorphism\ntpp\main.py", line 25, in <module>
        main()
      File "E:\isomorphism\ntpp\main.py", line 21, in main
        model_runner.run()
      File "C:\ProgramData\anaconda3\envs\graphL\lib\site-packages\easy_tpp\runner\base_runner.py", line 191, in run
        return self.train(**kwargs)
      File "C:\ProgramData\anaconda3\envs\graphL\lib\site-packages\easy_tpp\runner\base_runner.py", line 98, in train
        model = self._train_model(
      File "C:\ProgramData\anaconda3\envs\graphL\lib\site-packages\easy_tpp\runner\tpp_runner.py", line 93, in _train_model
        train_metrics = self.run_one_epoch(train_loader, RunnerPhase.TRAIN)
      File "C:\ProgramData\anaconda3\envs\graphL\lib\site-packages\easy_tpp\runner\tpp_runner.py", line 188, in run_one_epoch
        self.model_wrapper.run_batch(batch, phase=phase)
      File "C:\ProgramData\anaconda3\envs\graphL\lib\site-packages\easy_tpp\torch_wrapper.py", line 116, in run_batch
        loss, num_event = self.model.loglike_loss(batch)
      File "C:\ProgramData\anaconda3\envs\graphL\lib\site-packages\easy_tpp\model\torch_model\torch_nhp.py", line 237, in loglike_loss
        hiddens_ti, decay_states = self.forward(batch)
      File "C:\ProgramData\anaconda3\envs\graphL\lib\site-packages\easy_tpp\model\torch_model\torch_nhp.py", line 192, in forward
        self.rnn_cell(x_t, h_t, c_t, c_bar_i)
      File "C:\ProgramData\anaconda3\envs\graphL\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\ProgramData\anaconda3\envs\graphL\lib\site-packages\easy_tpp\model\torch_model\torch_nhp.py", line 54, in forward
        xi = torch.cat((x_i, hidden_i_minus), dim=1)
    RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument tensors in method wrapper_CUDA_cat)
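For context, PyTorch raises this RuntimeError whenever `torch.cat` receives tensors that live on different devices. One common cause (an assumption here, not a confirmed diagnosis of `torch_nhp.py`) is a state tensor allocated inside the model without a `device` argument, so it stays on the CPU while the inputs are on `cuda:0`. The sketch below uses a hypothetical `ToyCell` class, which is not the EasyTPP implementation, to show the usual remedy of allocating the state on the same device as the input.

```python
import torch
from torch import nn


class ToyCell(nn.Module):
    """Hypothetical stand-in for a recurrent cell; not the EasyTPP code."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.linear = nn.Linear(input_dim + hidden_dim, hidden_dim)

    def init_state(self, batch_size: int, device: torch.device) -> torch.Tensor:
        # Allocate the state on the caller's device. Calling
        # torch.zeros(batch_size, self.hidden_dim) without `device`
        # would place the tensor on the default device (usually the CPU).
        return torch.zeros(batch_size, self.hidden_dim, device=device)

    def forward(self, x_i: torch.Tensor, hidden_i_minus: torch.Tensor) -> torch.Tensor:
        # torch.cat requires every input tensor to be on the same device.
        x_cat = torch.cat((x_i, hidden_i_minus), dim=1)
        return torch.tanh(self.linear(x_cat))


# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
cell = ToyCell(input_dim=4, hidden_dim=8).to(device)
x = torch.randn(2, 4, device=device)
h = cell.init_state(batch_size=x.size(0), device=x.device)
out = cell(x, h)
```

Taking the device from the input tensor (`x.device`) rather than from a global setting keeps the cell correct on both the CPU and GPU backends.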

iLampard commented 1 year ago

Let me have a look.

iLampard commented 1 year ago

Hi,

The NHP, THP, SAHP, and RMTPP models are now fixed to run on CUDA. Please use the latest master branch.

There may still be problems with other models. Once I have fixed everything, I will publish a new version of the package.

iLampard commented 1 year ago

I have checked the other models and they seem to work fine.
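For anyone who wants to run a similar check locally, a rough sketch follows. The helper `devices_in_use` is a hypothetical name, not part of EasyTPP, and it only inspects registered parameters and buffers, so it will not catch tensors created on the fly inside `forward`.

```python
import torch
from torch import nn


def devices_in_use(module: nn.Module) -> set:
    """Return the set of device names holding the module's parameters and buffers."""
    tensors = list(module.parameters()) + list(module.buffers())
    return {str(t.device) for t in tensors}


# A freshly built module lives on the default device (normally the CPU).
model = nn.Linear(3, 2)
found = devices_in_use(model)
print(found)
```

After calling `.to(device)` on a model, a result with more than one entry would suggest that some registered tensor was left behind on another device.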

Altina-oz commented 1 year ago

Thank you very much. I hope everything goes smoothly 🙏 Many thanks for your work.

Mou


iLampard commented 1 year ago

You are welcome. Let us know if there are any other problems. We will continue to improve the code.

ant-research / EasyTemporalPointProcess

cannot run on gpu with configs and data in the example and docs (but work well on cpu backend) #5