XZhang97666 / MultimodalMIMIC


scripts for phenotype classification #2

Open anothersin opened 1 year ago

anothersin commented 1 year ago

Hi, thanks for the great package!

We want to reproduce the results of the paper, but we didn't find a script for phenotype classification. Can we reproduce the paper's results by simply modifying the IHM script as follows?

python -W ignore main.py --num_train_epochs 6 --modeltype 'TS_Text' \
                --kernel_size 1 --train_batch_size 2 --eval_batch_size 8 --seed 42 \
                --gradient_accumulation_steps 16 --num_update_bert_epochs 2 --bertcount 3 \
                --ts_learning_rate 0.0004 --txt_learning_rate 0.00002 \
                --notes_order 'Last' --num_of_notes 5 --max_length 1024 --layers 3 \
                --output_dir "run/TS_Text/" --embed_dim 128 \
                --model_name "bioLongformer" \
                --task 'pheno' \
                --file_path 'Data/pheno' \
                --num_labels 25 \
                --num_heads 8 \
                --irregular_learn_emb_text \
                --embed_time 64 \
                --tt_max 48 \
                --TS_mixup \
                --mixup_level $mixup_level \
                --fp16 \
                --irregular_learn_emb_text \
                --irregular_learn_emb_ts \
                --reg_ts

We appreciate any help you may be able to give.

XZhang97666 commented 1 year ago

We set --tt_max to 24, because 'pheno' utilizes only the first 24 hours of data for early-stage prediction.
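
That is, in your command simply replace

                --tt_max 48 \

with

                --tt_max 24 \

and keep everything else as posted (we have not otherwise re-checked the flags).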


anothersin commented 1 year ago


Hi, thanks so much for the quick reply!

I get a data loading error at the first epoch of training.

Traceback (most recent call last):
  File "main.py", line 101, in <module>
    main()
  File "main.py", line 90, in main
    trainer_irg(model=model,args=args,accelerator=accelerator,train_dataloader=train_dataloader,\
  File "/media/qys/Medical/MultimodalMIMIC/train.py", line 74, in trainer_irg
    for step, batch in tqdm(enumerate(train_dataloader)):
  File "/home/qys/anaconda3/envs/MulEHR/lib/python3.8/site-packages/tqdm/std.py", line 1180, in __iter__
    for obj in iterable:
  File "/home/qys/anaconda3/envs/MulEHR/lib/python3.8/site-packages/accelerate/data_loader.py", line 303, in __iter__
    for batch in super().__iter__():
  File "/home/qys/anaconda3/envs/MulEHR/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/home/qys/anaconda3/envs/MulEHR/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 475, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/qys/anaconda3/envs/MulEHR/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/media/qys/Medical/MultimodalMIMIC/data.py", line 180, in TextTSIrgcollate_fn
    ts_input_sequences=pad_sequence([example['ts'] for example in batch],batch_first=True,padding_value=0 )
  File "/home/qys/anaconda3/envs/MulEHR/lib/python3.8/site-packages/torch/nn/utils/rnn.py", line 358, in pad_sequence
    max_size = sequences[0].size()
IndexError: list index out of range
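
From the traceback, the list handed to pad_sequence appears to be empty, i.e. every example in the batch was dropped before that call. As a sanity check I would add something like the following right before the pad_sequence line in TextTSIrgcollate_fn (a rough, hypothetical snippet; the variable names are taken from the traceback, not verified against the repo's actual code):

    # Hypothetical check inside TextTSIrgcollate_fn, right before pad_sequence:
    ts_list = [example['ts'] for example in batch]
    if len(ts_list) == 0:
        # pad_sequence([]) raises IndexError, so fail with a clearer message instead
        raise ValueError("Empty batch after filtering; some stays may be missing 'ts' or 'text_data'.")
    ts_input_sequences = pad_sequence(ts_list, batch_first=True, padding_value=0)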

I confirmed that when I reprocessed the MIMIC data, I set period_length to 24. However, PhenotypingReader in mimic3benchmark.readers doesn't expose a period_length argument, so I simply did not pass it; I'm not sure whether that's the cause. What I would like to write is something like:

        train_reader = PhenotypingReader(dataset_dir=os.path.join(args.data, 'train'),
                                         listfile=os.path.join(args.data, 'train', 'listfile.csv'),
                                         period_length=args.period_length)

If this parameter does need to be passed, how should PhenotypingReader be changed to accept it?
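
One option I've been considering (untested) is a small subclass that accepts period_length and truncates each episode to that window. This assumes the phenotyping reader, like the other benchmark readers, returns a dict whose 'X' rows carry the hour offset in the first column and whose 't' field is the stay length; please correct me if that assumption is wrong:

    from mimic3benchmark.readers import PhenotypingReader

    class TruncatedPhenotypingReader(PhenotypingReader):
        """Hypothetical reader that keeps only the first `period_length` hours of each stay."""

        def __init__(self, dataset_dir, listfile=None, period_length=24.0):
            super().__init__(dataset_dir, listfile)
            self.period_length = period_length

        def read_example(self, index):
            example = super().read_example(index)
            # Keep rows whose time stamp (first column, 'Hours') falls inside the window.
            example['X'] = [row for row in example['X']
                            if float(row[0]) <= self.period_length]
            example['t'] = self.period_length
            return example

Is this roughly the right direction, or did you truncate the data elsewhere during preprocessing?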

flying-lby commented 12 months ago

Hello, sorry to disturb you. May I ask how you addressed missing clinical note data? While reproducing the results of the paper, I've encountered entries with a missing 'text_data' attribute.
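
As a stopgap, would it be acceptable to simply drop admissions whose note text is missing before building the dataset? A rough sketch of what I mean (the file name and the 'text_data' column are placeholders from my own preprocessing, not from this repo):

    import pandas as pd

    # Placeholder preprocessing step: remove entries without any clinical-note text.
    notes = pd.read_csv("notes_merged.csv")  # hypothetical merged-notes file
    has_text = notes["text_data"].notna() & (notes["text_data"].str.strip() != "")
    notes[has_text].to_csv("notes_filtered.csv", index=False)

Or does the paper handle missing notes differently, e.g. keeping the stay and masking the text modality?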