ant-research / EasyTemporalPointProcess

EasyTPP: Towards Open Benchmarking Temporal Point Processes
https://ant-research.github.io/EasyTemporalPointProcess/

loglike is nan #9

Closed: rvandewater closed this issue 10 months ago

rvandewater commented 11 months ago

When running `python benchmark_script.py` with a `config.yml` containing only the retweet dataset, I get the following:

```
2023-11-08 21:07:36,542 - config.py[pid:1068266;line:33:build_from_yaml_file] - CRITICAL: Load pipeline config class RunnerConfig
2023-11-08 21:07:36,543 - runner_config.py[pid:1068266;line:164:update_config] - CRITICAL: train model RMTPP using CPU with torch backend
2023-11-08 21:07:36,550 - runner_config.py[pid:1068266;line:36:__init__] - INFO: Save the config to ./checkpoints/1068266_140147847926144_231108-210736/RMTPP_train_output.yaml
2023-11-08 21:07:36,551 - base_runner.py[pid:1068266;line:170:save_log] - INFO: Save the log to ./checkpoints/1068266_140147847926144_231108-210736/log
2023-11-08 21:07:37,740 - tpp_runner.py[pid:1068266;line:60:_init_model] - INFO: Num of model parameters 2403
2023-11-08 21:07:46,185 - base_runner.py[pid:1068266;line:92:train] - INFO: Data 'retweet' loaded...
2023-11-08 21:07:46,186 - base_runner.py[pid:1068266;line:97:train] - INFO: Start RMTPP training...
2023-11-08 21:07:56,977 - tpp_runner.py[pid:1068266;line:96:_train_model] - INFO: [ Epoch 0 (train) ]: train loglike is nan, num_events is 2156116
2023-11-08 21:08:07,580 - tpp_runner.py[pid:1068266;line:107:_train_model] - INFO: [ Epoch 0 (valid) ]:  valid loglike is nan, num_events is 213521, acc is 0.4964336060621672, rmse is 8759.48165286368
2023-11-08 21:08:18,261 - tpp_runner.py[pid:1068266;line:122:_train_model] - INFO: [ Epoch 0 (test) ]: test loglike is nan, num_events is 216465, acc is 0.49082761647379486, rmse is 8906.668534393926
2023-11-08 21:08:18,261 - tpp_runner.py[pid:1068266;line:124:_train_model] - CRITICAL: current best loglike on valid set is -179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.0000 (updated at epoch-NeverUpdated)
2023-11-08 21:08:28,457 - tpp_runner.py[pid:1068266;line:96:_train_model] - INFO: [ Epoch 1 (train) ]: train loglike is nan, num_events is 2156116
```
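
(Side note: the enormous negative "best loglike" above is just the initialization sentinel, minus the largest IEEE-754 double printed in fixed-point notation; it means no finite validation loglike has been recorded yet. A quick illustration:)

```python
import sys

# The sentinel in the log equals -(largest representable double),
# shown in full decimal expansion before any finite value replaces it.
print(-sys.float_info.max)  # -1.7976931348623157e+308
```
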
rvandewater commented 11 months ago

Addendum: training uses the CPU and not the GPU, even though `torch.cuda.is_available()` returns True.
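
(For reference, a quick way to confirm what PyTorch sees; an illustrative snippet, not part of the original report:)

```python
import torch

# Confirm CUDA is visible to this interpreter and list the devices.
print(torch.cuda.is_available())   # True on this machine, per the report
print(torch.cuda.device_count())   # number of visible GPUs
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```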

iLampard commented 11 months ago

Let me have a look at this.

Maple728 commented 11 months ago

@rvandewater Can you provide the config.yaml?

iLampard commented 11 months ago

> Addendum: training uses the CPU and not the GPU, even though `torch.cuda.is_available()` returns True.

Hi, by default we use the CPU. Please show us the config you are using to run the model.
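
(For reference, device selection happens through the `gpu` field under `trainer_config` in the YAML; `-1` appears to select the CPU and a non-negative index a CUDA device. Below is a minimal launcher following the project quick start; treat the exact import paths as assumptions rather than confirmed API.)

```python
# Hypothetical minimal launcher, mirroring the EasyTPP quick start.
# `Config.build_from_yaml_file` appears in the logs above; the Runner
# import path is assumed from the project README.
from easy_tpp.config_factory import Config
from easy_tpp.runner import Runner

config = Config.build_from_yaml_file('config.yml', experiment_id='RMTPP_train')
runner = Runner.build_from_config(config)
runner.run()
```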

rvandewater commented 11 months ago

Here you go, @Maple728 and @iLampard. It contains a lot of commented-out lines because I did not have access to anything but the retweet dataset at the time.

```yaml
pipeline_config_id: runner_config

data:
#  taxi:
#    data_format: pkl
#    train_dir: ./data/taxi/train.pkl
#    valid_dir: ./data/taxi/dev.pkl
#    test_dir: ./data/taxi/test.pkl
#    data_specs:
#      num_event_types: 10
#      pad_token_id: 10
#      padding_side: right
##      padding_strategy: max_length
##      truncation_strategy: longest_first # or Truncate to a maximum length specified with the argument `max_length`
##      max_len: 20
#  conttime:
#    data_format: pkl
#    train_dir: ../data/conttime/train.pkl
#    valid_dir: ../data/conttime/dev.pkl
#    test_dir: ../data/conttime/test.pkl
#    data_specs:
#      num_event_types: 5
#      pad_token_id: 5
#      padding_side: right
#      truncation_side: right
##      padding_strategy: max_length  # for ode tpp we have to set this to max_length
##      max_len: 20
#  hawkes_1d:
#    data_format: pkl
#    train_dir: ../data/hawkes/train.pkl
#    valid_dir: ../data/hawkes/dev.pkl
#    test_dir: ../data/hawkes/test.pkl
#    data_specs:
#      num_event_types: 1
#      pad_token_id: 1
#      padding_side: right
#      truncation_side: right
  retweet:
    data_format: pkl
#    train_dir: C:\Users\Robin\Documents\Git\EasyTemporalPointProcess\examples\data\retweet\train.pkl #../data/retweet/train.pkl
#    valid_dir: C:\Users\Robin\Documents\Git\EasyTemporalPointProcess\examples\data\retweet\dev.pkl #../data/retweet/dev.pkl
#    test_dir: C:\Users\Robin\Documents\Git\EasyTemporalPointProcess\examples\data\retweet\test.pkl #../data/retweet/test.pkl
    train_dir: /dhc/home/robin.vandewater/projects/easyTPP/examples/data/retweet/train.pkl
    valid_dir: /dhc/home/robin.vandewater/projects/easyTPP/examples/data/retweet/dev.pkl
    test_dir: /dhc/home/robin.vandewater/projects/easyTPP/examples/data/retweet/test.pkl
    data_specs:
      num_event_types: 3
      pad_token_id: 3
      padding_side: right
      truncation_side: right

#
#RMTPP_train:
#  base_config:
#    stage: train
#    backend: torch
#    dataset_id: retweet #taxi
#    runner_id: std_tpp
#    model_id: RMTPP # model name
#    base_dir: './checkpoints/'
#  trainer_config:
#    batch_size: 256
#    max_epoch: 20
#    shuffle: False
#    optimizer: adam
#    learning_rate: 1.e-3
#    valid_freq: 1
#    use_tfb: False
#    metrics: [ 'acc', 'rmse' ]
#    seed: 2019
#    gpu: -1
#  model_config:
#    hidden_size: 32
#    time_emb_size: 16
#    num_layers: 2
#    num_heads: 2
#    mc_num_sample_per_step: 20
#    sharing_param_layer: False
#    loss_integral_num_sample_per_step: 20
#    dropout: 0.0
#    use_ln: False
#    thinning:
#      num_seq: 10
#      num_sample: 1
#      num_exp: 500 # number of i.i.d. Exp(intensity_bound) draws at one time in thinning algorithm
#      look_ahead_time: 10
#      patience_counter: 5 # the maximum iteration used in adaptive thinning
#      over_sample_rate: 5
#      num_samples_boundary: 5
#      dtime_max: 5
#
#
#
#RMTPP_eval:
#  stage: eval
#  backend: torch
#  dataset_id: conttime
#  runner_id: std_tpp
#  base_config:
#    base_dir: './checkpoints/'
#    batch_size: 256
#    max_epoch: 10
#    shuffle: False
#    valid_freq: 1
#    use_tfb: False
#    metrics: [ 'acc', 'rmse' ]
#  model_config:
#    model_id: RMTPP # model name
#    hidden_size: 32
#    time_emb_size: 16
#    num_layers: 2
#    num_heads: 2
#    mc_num_sample_per_step: 20
#    sharing_param_layer: False
#    loss_integral_num_sample_per_step: 20
#    dropout: 0.0
#    use_ln: False
#    seed: 2019
#    gpu: 0
#    pretrained_model_dir: ./checkpoints/59618_4339156352_221128-142905/models/saved_model
#    thinning:
#      num_seq: 10
#      num_sample: 1
#      num_exp: 500 # number of i.i.d. Exp(intensity_bound) draws at one time in thinning algorithm
#      look_ahead_time: 10
#      patience_counter: 5 # the maximum iteration used in adaptive thinning
#      over_sample_rate: 5
#      num_samples_boundary: 5
#      dtime_max: 5

RMTPP_gen:
  base_config:
    stage: gen
    backend: torch
    dataset_id: retweet
    runner_id: std_tpp
    base_dir: './checkpoints/'
    model_id: RMTPP
  model_config:
    hidden_size: 32
    time_emb_size: 16
    mc_num_sample_per_step: 20
    sharing_param_layer: False
    loss_integral_num_sample_per_step: 20
    dropout: 0.0
    use_ln: False
    seed: 2019
    gpu: 0
    pretrained_model_dir: ./checkpoints/2555_4348724608_230603-155841/models/saved_model
    thinning:
      num_seq: 10
      num_sample: 1
      num_exp: 500 # number of i.i.d. Exp(intensity_bound) draws at one time in thinning algorithm
      look_ahead_time: 10
      patience_counter: 5 # the maximum iteration used in adaptive thinning
      over_sample_rate: 5
      num_samples_boundary: 5
      dtime_max: 5
      num_step_gen: 10

#NHP_gen:
#  base_config:
#    stage: gen
#    backend: tf
#    dataset_id: taxi
#    runner_id: std_tpp
#    base_dir: './checkpoints/'
#    model_id: NHP
#  trainer_config:
#    batch_size: 256
#    max_epoch: 1
#  model_config:
#    hidden_size: 64
#    mc_num_sample_per_step: 20
#    sharing_param_layer: False
#    loss_integral_num_sample_per_step: 20
#    dropout: 0.0
#    use_ln: False
#    seed: 2019
#    gpu: 0
#    pretrained_model_dir: ./checkpoints/6934_4375315840_230603-222826/models/saved_model
#    thinning:
#      num_seq: 10
#      num_sample: 1
#      num_exp: 500 # number of i.i.d. Exp(intensity_bound) draws at one time in thinning algorithm
#      look_ahead_time: 10
#      patience_counter: 5 # the maximum iteration used in adaptive thinning
#      over_sample_rate: 5
#      num_samples_boundary: 5
#      dtime_max: 5
#      num_step_gen: 10

FullyNN_train:
  base_config:
    stage: train
    backend: torch
    dataset_id: retweet #taxi
    runner_id: std_tpp
    model_id: FullyNN # model name
    base_dir: './checkpoints/'
  trainer_config:
    batch_size: 256
    max_epoch: 200
    shuffle: False
    optimizer: adam
    learning_rate: 1.e-3
    valid_freq: 1
    use_tfb: False
    metrics: [ 'acc', 'rmse' ]
    seed: 2019
    gpu: 0
  model_config:
    rnn_type: LSTM
    hidden_size: 32
    time_emb_size: 4
    num_layers: 2
    num_heads: 2
    mc_num_sample_per_step: 20
    sharing_param_layer: False
    loss_integral_num_sample_per_step: 20
    dropout: 0.0
    use_ln: False
    model_specs:
      num_mlp_layers: 3
#    thinning:
#      num_seq: 10
#      num_sample: 1
#      num_exp: 500 # number of i.i.d. Exp(intensity_bound) draws at one time in thinning algorithm
#      look_ahead_time: 10
#      patience_counter: 5 # the maximum iteration used in adaptive thinning
#      over_sample_rate: 5
#      num_samples_boundary: 5
#      dtime_max: 5
#      num_step_gen: 1

#IntensityFree_train:
#  base_config:
#    stage: train
#    backend: torch
#    dataset_id: taxi
#    runner_id: std_tpp
#    model_id: IntensityFree # model name
#    base_dir: './checkpoints/'
#  trainer_config:
#    batch_size: 256
#    max_epoch: 200
#    shuffle: False
#    optimizer: adam
#    learning_rate: 1.e-3
#    valid_freq: 1
#    use_tfb: False
#    metrics: [ 'acc', 'rmse' ]
#    seed: 2019
#    gpu: 0
#  model_config:
#    hidden_size: 32
#    time_emb_size: 16
#    num_layers: 2
#    num_heads: 2
#    mc_num_sample_per_step: 20
#    sharing_param_layer: False
#    loss_integral_num_sample_per_step: 20
#    dropout: 0.0
#    use_ln: False
#    model_specs:
#      num_mix_components: 3

#ODETPP_train:
#  base_config:
#    stage: train
#    backend: torch
#    dataset_id: taxi
#    runner_id: std_tpp
#    model_id: ODETPP # model name
#    base_dir: './checkpoints/'
#  trainer_config:
#    batch_size: 32
#    max_epoch: 200
#    shuffle: False
#    optimizer: adam
#    learning_rate: 1.e-1
#    valid_freq: 1
#    use_tfb: False
#    metrics: [ 'acc', 'rmse' ]
#    seed: 2019
#    gpu: -1
#  model_config:
#    hidden_size: 4
#    time_emb_size: 4
#    num_layers: 1
#    sharing_param_layer: False
#    loss_integral_num_sample_per_step: 20
#    dropout: 0.0
#    use_ln: False
#    model_specs:
#      ode_num_sample_per_step: 2
#      time_factor: 100
#    thinning:
#      num_seq: 10
#      num_sample: 1
#      num_exp: 50 # number of i.i.d. Exp(intensity_bound) draws at one time in thinning algorithm
#      look_ahead_time: 10
#      patience_counter: 5 # the maximum iteration used in adaptive thinning
#      over_sample_rate: 5
#      num_samples_boundary: 5
#      dtime_max: 5
#      num_step_gen: 1

ODETPP_gen:
  base_config:
    stage: gen
    backend: torch
    dataset_id: retweet
    runner_id: std_tpp
    base_dir: './checkpoints/'
    model_id: ODETPP
  trainer_config:
    batch_size: 256
    max_epoch: 1
  model_config:
    hidden_size: 32
    time_emb_size: 16
    num_layers: 1
    sharing_param_layer: False
    loss_integral_num_sample_per_step: 20
    dropout: 0.0
    use_ln: False
    seed: 2019
    gpu: 0
    pretrained_model_dir: ./checkpoints/3538_4310828416_230603-165911/models/saved_model
    model_specs:
      ode_num_sample_per_step: 2
      time_factor: 100
    thinning:
      num_seq: 10
      num_sample: 1
      num_exp: 500 # number of i.i.d. Exp(intensity_bound) draws at one time in thinning algorithm
      look_ahead_time: 10
      patience_counter: 5 # the maximum iteration used in adaptive thinning
      over_sample_rate: 5
      num_samples_boundary: 5
      dtime_max: 5
      num_step_gen: 10

NHP_train:
  base_config:
    stage: train
    backend: torch
    dataset_id: retweet #taxi
    runner_id: std_tpp
    model_id: NHP # model name
    base_dir: './checkpoints/'
  trainer_config:
    batch_size: 256
    max_epoch: 20
    shuffle: False
    optimizer: adam
    learning_rate: 1.e-3
    valid_freq: 1
    use_tfb: False
    metrics: [ 'acc', 'rmse' ]
    seed: 2019
    gpu: -1
  model_config:
    hidden_size: 64
    loss_integral_num_sample_per_step: 20
#    pretrained_model_dir: ./checkpoints/75518_4377527680_230530-132355/models/saved_model
    thinning:
      num_seq: 10
      num_sample: 1
      num_exp: 500 # number of i.i.d. Exp(intensity_bound) draws at one time in thinning algorithm
      look_ahead_time: 10
      patience_counter: 5 # the maximum iteration used in adaptive thinning
      over_sample_rate: 5
      num_samples_boundary: 5
      dtime_max: 5
      num_step_gen: 1

#SAHP_train:
#  base_config:
#    stage: train
#    backend: torch
#    dataset_id: taxi
#    runner_id: std_tpp
#    model_id: SAHP # model name
#    base_dir: './checkpoints/'
#  trainer_config:
#    batch_size: 256
#    max_epoch: 20
#    shuffle: False
#    optimizer: adam
#    learning_rate: 1.e-3
#    valid_freq: 1
#    use_tfb: False
#    metrics: [ 'acc', 'rmse' ]
#    seed: 2019
#    gpu: -1
#  model_config:
#    hidden_size: 32
#    time_emb_size: 16
#    num_layers: 2
#    num_heads: 2
#    loss_integral_num_sample_per_step: 20
#    use_ln: False
#    thinning:
#      num_seq: 10
#      num_sample: 1
#      num_exp: 500 # number of i.i.d. Exp(intensity_bound) draws at one time in thinning algorithm
#      look_ahead_time: 10
#      patience_counter: 5 # the maximum iteration used in adaptive thinning
#      over_sample_rate: 5
#      num_samples_boundary: 5
#      dtime_max: 5
#      num_step_gen: 1

SAHP_gen:
  base_config:
    stage: gen
    backend: torch
    dataset_id: retweet
    runner_id: std_tpp
    model_id: SAHP # model name
    base_dir: './checkpoints/'
  trainer_config:
    batch_size: 256
    max_epoch: 1
  model_config:
    hidden_size: 16
    time_emb_size: 4
    num_layers: 2
    num_heads: 2
    loss_integral_num_sample_per_step: 20
    use_ln: False
    thinning:
      num_seq: 10
      num_sample: 1
      num_exp: 500 # number of i.i.d. Exp(intensity_bound) draws at one time in thinning algorithm
      look_ahead_time: 10
      patience_counter: 5 # the maximum iteration used in adaptive thinning
      over_sample_rate: 5
      num_samples_boundary: 5
      dtime_max: 5
      num_step_gen: 10

#THP_train:
#  base_config:
#    stage: train
#    backend: torch
#    dataset_id: taxi
#    runner_id: std_tpp
#    model_id: THP # model name
#    base_dir: './checkpoints/'
#  trainer_config:
#    batch_size: 256
#    max_epoch: 30
#    shuffle: False
#    optimizer: adam
#    learning_rate: 1.e-3
#    valid_freq: 1
#    use_tfb: False
#    metrics: [ 'acc', 'rmse' ]
#    seed: 2019
#    gpu: -1
#  model_config:
#    hidden_size: 32
#    time_emb_size: 16
#    num_layers: 2
#    num_heads: 2
#    mc_num_sample_per_step: 20
#    loss_integral_num_sample_per_step: 20
#    use_ln: False
#    thinning:
#      num_seq: 10
#      num_sample: 1
#      num_exp: 500 # number of i.i.d. Exp(intensity_bound) draws at one time in thinning algorithm
#      look_ahead_time: 10
#      patience_counter: 5 # the maximum iteration used in adaptive thinning
#      over_sample_rate: 5
#      num_samples_boundary: 5
#      dtime_max: 5
#      num_step_gen: 1

THP_gen:
  base_config:
    stage: gen
    backend: torch
    dataset_id: retweet
    runner_id: std_tpp
    model_id: THP # model name
    base_dir: './checkpoints/'
  trainer_config:
    batch_size: 256
    max_epoch: 1
  model_config:
    hidden_size: 32
    time_emb_size: 16
    num_layers: 2
    num_heads: 2
    mc_num_sample_per_step: 20
    loss_integral_num_sample_per_step: 20
    use_ln: False
#    pretrained_model_dir: ./checkpoints/2694_4384867712_230603-160544/models/saved_model
    thinning:
      num_seq: 10
      num_sample: 1
      num_exp: 500 # number of i.i.d. Exp(intensity_bound) draws at one time in thinning algorithm
      look_ahead_time: 10
      patience_counter: 5 # the maximum iteration used in adaptive thinning
      over_sample_rate: 5
      num_samples_boundary: 5
      dtime_max: 5
      num_step_gen: 10

#AttNHP_train:
#  base_config:
#    stage: train
#    backend: torch
#    dataset_id: taxi
#    runner_id: std_tpp
#    model_id: AttNHP # model name
#    base_dir: './checkpoints/'
#  trainer_config:
#    batch_size: 256
#    max_epoch: 200
#    shuffle: False
#    optimizer: adam
#    learning_rate: 1.e-3
#    valid_freq: 1
#    use_tfb: False
#    metrics: [ 'acc', 'rmse' ]
#    seed: 2019
#    gpu: -1
#  model_config:
#    hidden_size: 16
#    time_emb_size: 4
#    num_layers: 2
#    num_heads: 2
#    loss_integral_num_sample_per_step: 10
#    use_ln: False
#    thinning:
#      num_seq: 2
#      num_sample: 1
#      num_exp: 50 # number of i.i.d. Exp(intensity_bound) draws at one time in thinning algorithm
#      look_ahead_time: 10
#      patience_counter: 5 # the maximum iteration used in adaptive thinning
#      over_sample_rate: 5
#      num_samples_boundary: 5
#      dtime_max: 5
#      num_step_gen: 1

AttNHP_gen:
  base_config:
    stage: gen
    backend: torch
    dataset_id: retweet
    runner_id: std_tpp
    model_id: AttNHP # model name
    base_dir: './checkpoints/'
  trainer_config:
    batch_size: 256
    max_epoch: 1
  model_config:
    hidden_size: 16
    time_emb_size: 4
    num_layers: 2
    num_heads: 2
    mc_num_sample_per_step: 20
    loss_integral_num_sample_per_step: 20
    use_ln: False
#    pretrained_model_dir: ./checkpoints/6934_4375315840_230603-222826/models/saved_model
    thinning:
      num_seq: 10
      num_sample: 1
      num_exp: 50 # number of i.i.d. Exp(intensity_bound) draws at one time in thinning algorithm
      look_ahead_time: 10
      patience_counter: 5 # the maximum iteration used in adaptive thinning
      over_sample_rate: 5
      num_samples_boundary: 5
      dtime_max: 5
      num_step_gen: 10
```
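
(For anyone hitting the same NaN: one quick way to rule out data issues is to scan the pickle for degenerate inter-event times, sketched below. The key names `train` and `time_since_last_event` follow the common Neural-Hawkes-style pkl layout and are assumptions here, not confirmed in this thread.)

```python
import math
import pickle

# Sketch: look for values that can drive the loglike to NaN
# (non-finite or negative inter-event times in the training split).
with open('data/retweet/train.pkl', 'rb') as f:  # adjust to your train.pkl path
    data = pickle.load(f)

suspicious = 0
for seq in data['train']:                    # assumed top-level key
    for event in seq:
        dt = event['time_since_last_event']  # assumed per-event key
        if not math.isfinite(dt) or dt < 0:
            suspicious += 1
print(f'{suspicious} suspicious inter-event times')
```
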
iLampard commented 10 months ago

Hi,

I used the Retweet data (https://drive.google.com/file/d/1w0fJumlBwgBb2qEvSuNt_VViSmo3BwDH/view?usp=drive_link) and ran NHP with this config.

The pipeline works fine. The model indeed runs on the GPU when 'gpu' is set to 0 or 1 in 'trainer_config'. Besides, the learning process runs without any NaN.


```yaml
pipeline_config_id: runner_config

data:
  retweet:
    data_format: pkl
    train_dir: ../data/retweet/train.pkl
    valid_dir: ../data/retweet/dev.pkl
    test_dir: ../data/retweet/test.pkl
    data_specs:
      num_event_types: 3
      pad_token_id: 3
      padding_side: right
      truncation_side: right

NHP_train:
  base_config:
    stage: train
    backend: torch
    dataset_id: retweet
    runner_id: std_tpp
    model_id: NHP # model name
    base_dir: './checkpoints/'
  trainer_config:
    batch_size: 256
    max_epoch: 200
    shuffle: False
    optimizer: adam
    learning_rate: 1.e-3
    valid_freq: 1
    use_tfb: False
    metrics: [ 'acc', 'rmse' ]
    seed: 2019
    gpu: 0  # use a non-negative device index (0, 1, ...) to run on that GPU
  model_config:
    hidden_size: 64
    loss_integral_num_sample_per_step: 20
    thinning:
      num_seq: 10
      num_sample: 1
      num_exp: 500 # number of i.i.d. Exp(intensity_bound) draws at one time in thinning algorithm
      look_ahead_time: 10
      patience_counter: 5 # the maximum iteration used in adaptive thinning
      over_sample_rate: 5
      num_samples_boundary: 5
      dtime_max: 5
      num_step_gen: 1
```

```
2023-11-24 23:47:08,712 - config.py[pid:108665;line:33:build_from_yaml_file] - CRITICAL: Load pipeline config class RunnerConfig
2023-11-24 23:47:09,090 - runner_config.py[pid:108665;line:164:update_config] - CRITICAL: train model NHP using GPU with torch backend
2023-11-24 23:47:09,198 - runner_config.py[pid:108665;line:36:__init__] - INFO: Save the config to ./checkpoints/108665_140595881875200_231124-234708/NHP_train_output.yaml
2023-11-24 23:47:09,300 - base_runner.py[pid:108665;line:170:save_log] - INFO: Save the log to ./checkpoints/108665_140595881875200_231124-234708/log
2023-11-24 23:47:13,798 - tpp_runner.py[pid:108665;line:60:_init_model] - INFO: Num of model parameters 58240
2023-11-24 23:47:27,626 - base_runner.py[pid:108665;line:92:train] - INFO: Data 'retweet' loaded...
2023-11-24 23:47:27,659 - base_runner.py[pid:108665;line:97:train] - INFO: Start NHP training...
2023-11-24 23:48:05,712 - tpp_runner.py[pid:108665;line:96:_train_model] - INFO: [ Epoch 0 (train) ]: train loglike is -937.6508162202891, num_events is 2156116
2023-11-24 23:48:12,770 - tpp_runner.py[pid:108665;line:107:_train_model] - INFO: [ Epoch 0 (valid) ]: valid loglike is -66.56302014790114, num_events is 213521, acc is 0.04531638574191766, rmse is 16566.363046595558
2023-11-24 23:48:19,178 - tpp_runner.py[pid:108665;line:122:_train_model] - INFO: [ Epoch 0 (test) ]: test loglike is -65.66979996766221, num_events is 216465, acc is 0.04522671101563763, rmse is 16645.20860215622
2023-11-24 23:48:19,178 - tpp_runner.py[pid:108665;line:124:_train_model] - CRITICAL: current best loglike on valid set is -66.5630 (updated at epoch-0), best updated at this epoch
2023-11-24 23:48:56,236 - tpp_runner.py[pid:108665;line:96:_train_model] - INFO: [ Epoch 1 (train) ]: train loglike is -45.61948060627304, num_events is 2156116
2023-11-24 23:49:02,676 - tpp_runner.py[pid:108665;line:107:_train_model] - INFO: [ Epoch 1 (valid) ]: valid loglike is -32.257751286758676, num_events is 213521, acc is 0.04534448602245213, rmse is 16566.344646936057
2023-11-24 23:49:08,715 - tpp_runner.py[pid:108665;line:122:_train_model] - INFO: [ Epoch 1 (test) ]: test loglike is -31.8123746910586, num_events is 216465, acc is 0.04530524565172199, rmse is 16645.191081960686
2023-11-24 23:49:08,716 - tpp_runner.py[pid:108665;line:124:_train_model] - CRITICAL: current best loglike on valid set is -32.2578 (updated at epoch-1), best updated at this epoch
2023-11-24 23:49:46,557 - tpp_runner.py[pid:108665;line:96:_train_model] - INFO: [ Epoch 2 (train) ]: train loglike is -26.536256682516733, num_events is 2156116
2023-11-24 23:49:52,817 - tpp_runner.py[pid:108665;line:107:_train_model] - INFO: [ Epoch 2 (valid) ]: valid loglike is -21.990390728546608, num_events is 213521, acc is 0.04555055474637155, rmse is 16566.32264670633
2023-11-24 23:49:59,328 - tpp_runner.py[pid:108665;line:122:_train_model] - INFO: [ Epoch 2 (test) ]: test loglike is -21.693191595639018, num_events is 216465, acc is 0.04543921650151295, rmse is 16645.169494464597
2023-11-24 23:49:59,328 - tpp_runner.py[pid:108665;line:124:_train_model] - CRITICAL: current best loglike on valid set is -21.9904 (updated at epoch-2), best updated at this epoch
```