Closed: tunajaw closed this issue 5 months ago.
Hi, after checking the code, I found that it runs when modified as follows:
- Add the -pos_alpha parser argument.
- In Main.valid_epoch(), ignore the calculations after the CIF decoder when enc_out.shape[1] < 2.
- In Main.main(), change if opt.transfer_learning != '': to if opt.transfer_learning:, since the type of opt.transfer_learning is boolean.
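For concreteness, the second and third changes look roughly like this in my local copy (a sketch; the surrounding code is elided and the placement in Main.py is approximate):

# In Main.valid_epoch() (sketch): skip the CIF-decoder terms for too-short sequences
if enc_out.shape[1] < 2:
    continue  # assumed loop context: move on to the next batch
log_sum, integral_ = model.event_decoder(...)  # original call, arguments elided

# In Main.main() (sketch): opt.transfer_learning is a boolean, so test it directly
if opt.transfer_learning:
    ...  # original transfer-learning branch, unchanged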
I'm not sure whether these modifications break anything; if possible, could you have a quick glance at them? Thanks a lot!
Hi @tunajaw.
Thank you for debugging our code. I have fixed the bugs and updated the repo. Could you explain the second bug in valid_epoch()?
Did you manage to train on the P12 dataset?
python Main.py -batch_size 128 -lr 0.01 -weight_decay 0.1 -w_pos_label 0.5 -w_sample_label 100 -w_time 1 -w_event 1 -data /mlodata1/hokarami/tedam/p12/ -setting raindrop -split 0 -demo -data_label multilabel -epoch 50 -per 100 -ES_pat 100 -wandb -wandb_project TEEDAM_supervised -event_enc 0 -state -mod none -next_mark 1 -mark_detach 1 -sample_label 1 -user_prefix [H70--DA__base-concat] -time_enc concat -wandb_tag RD75
This works for me without any problem. Let me know if it works for you.
Kind Regards,
Hi @hojjatkarami,
I ran into the bug below when I didn't ignore the calculations after the CIF decoder when enc_out.shape[1] < 2:
- (Testing) : 0%| | 0/299 [00:00<?, ?it/s]
- (Testing) : 51%|█████ | 152/299 [00:02<00:01, 75.95it/s]
Traceback (most recent call last):
File "/home/tunajaw/TEE4EHR/Main.py", line 1510, in <module>
main()
File "/home/tunajaw/TEE4EHR/Main.py", line 1484, in main
train(model, opt.trainloader, opt.validloader,
File "/home/tunajaw/TEE4EHR/Main.py", line 725, in train
valid_event, valid_type, valid_time, dict_metrics_valid = valid_epoch(
File "/home/tunajaw/TEE4EHR/Main.py", line 443, in valid_epoch
log_sum, integral_ = model.event_decoder(
File "/home/tunajaw/anaconda3/envs/TEE4EHR/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/tunajaw/anaconda3/envs/TEE4EHR/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/tunajaw/TEE4EHR/transformer/Modules.py", line 248, in forward
if p.max() > 0.999:
RuntimeError: max(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.
wandb: - 0.022 MB of 0.022 MB uploaded
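The crash itself could also be avoided with an emptiness guard directly in transformer/Modules.py instead of skipping in valid_epoch() (a sketch, not the repo's actual fix; the branch body stays unchanged):

# transformer/Modules.py, in forward() (sketch): only inspect p when it is non-empty
if p.numel() > 0 and p.max() > 0.999:
    ...  # original branch body, unchanged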
And here is my bash script to run the code (based on the Quick Start part of the readme file):
data_path=./dataset/p12/
wandb_project=TEEDAM_supervised
wandb_tag=RD70
user_prefix=original
supervised_tag=RD75
python Main.py -data $data_path -setting raindrop -split 0 -demo -data_label multilabel -wandb -wandb_project $wandb_project -event_enc 1 -state -mod ml -next_mark 1 -mark_detach 1 -sample_label 1 -user_prefix $user_prefix -time_enc concat -wandb_tag $wandb_tag > run.log 2>&1
While the training and testing flows of the unsupervised and supervised tasks now run without bugs, I'm still a little confused about the usage of the Quick Start part.
Thank you @tunajaw for being attentive. I have updated the code.
I still get the exact same error as in https://github.com/esl-epfl/TEE4EHR/issues/1#issuecomment-2096241268 even though I'm on eac26d73ba00dc96ab4cae7714e36fadac8d3778 (which I guess should have fixed some errors).
My command:
python Main.py -data ./data/p19/ -setting raindrop -split 0 -demo -data_label multilabel -wandb -wandb_project TEEDAM_supervised -event_enc 1 -state -mod ml -next_mark 1 -mark_detach 1 -sample_label 1 -user_prefix [H70--TEDA__pp_ml-concat] -time_enc concat -wandb_tag RD75
Error:
Traceback (most recent call last):
File "/home/user/projects/tee4ehr/Main.py", line 1493, in <module>
main()
File "/home/user/projects/tee4ehr/Main.py", line 1467, in main
train(model, opt.trainloader, opt.validloader,
File "/home/user/projects/tee4ehr/Main.py", line 691, in train
train_event, train_type, train_time, dict_metrics_train = train_epoch(
File "/home/user/projects/tee4ehr/Main.py", line 297, in train_epoch
log_sum, integral_ = model.event_decoder(
File "/home/user/projects/tee4ehr/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/projects/tee4ehr/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/projects/tee4ehr/transformer/Modules.py", line 250, in forward
if torch.max(p) > 0.999:
RuntimeError: max(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.
And here is the code fragment causing the error: https://github.com/esl-epfl/TEE4EHR/blob/eac26d73ba00dc96ab4cae7714e36fadac8d3778/transformer/Modules.py#L243-L259
To be honest, I don't understand the meaning of these operations, but under certain conditions p gets size torch.Size([4, 0, 25]). This happens when the second dimension of seq_times and seq_types has size 1: dt_seq = (seq_times[:, 1:] - seq_times[:, :-1]) * non_pad_mask[:, 1:] then becomes an empty tensor, which causes the error in torch.max.
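For anyone else hitting this, the failure is easy to reproduce in isolation (the shapes below are illustrative; only the empty second dimension matters):

import torch

# One observed event per sequence: slicing off the first step leaves nothing
seq_times = torch.zeros(4, 1)
non_pad_mask = torch.ones(4, 1)
dt_seq = (seq_times[:, 1:] - seq_times[:, :-1]) * non_pad_mask[:, 1:]
print(dt_seq.shape)  # torch.Size([4, 0])
# torch.max(dt_seq)  # RuntimeError: max(): Expected reduction dim to be specified for input.numel() == 0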
So what does it mean when seq_times (aka event_time in train_epoch) has a second dimension of size 1? Shouldn't such values be filtered out of the training dataset?
Hi, I'm currently trying to train the p12-raindrop dataset by following the instructions in the readme file, with my own wandb account. However, I got an error. After checking, I found a run in my TEE4EHR_supervised project, while TEE4EHR_unsupervised doesn't have one. Another thing I tried was changing -wandb_project to TEEDAM_unspervised; this time there was indeed a run in TEEDAM_unspervised, but I am still receiving the same error. The generated project.csv showed that the dataframe was empty. (Checking project.csv while keeping the original setting, the dataframe remained the same.) Do you know how I can solve this issue? Thanks a lot!
Some modifications so far in Main.py: added the parser argument parser.add_argument('-pos_alpha', type=float, default=1.0), or I'll get an error in wandb.Api.runs().
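For reference, registering the missing argument with the rest of the parser setup is all that's needed (a sketch; the surrounding lines are illustrative, not the repo's actual argument list):

import argparse

parser = argparse.ArgumentParser()
# ... existing arguments ...
parser.add_argument('-pos_alpha', type=float, default=1.0)  # the missing argument
opt = parser.parse_args()  # opt.pos_alpha is then available wherever Main.py reads it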