LYJ0327 opened this issue 10 months ago
Hi, it's due to an update to the torchmetrics package. I'll amend it now.
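For context, newer torchmetrics releases removed the `compute_on_step` argument from `torchmetrics.Metric` (deprecated in v0.8, removed in v0.9), so it can no longer be forwarded to the base class. A minimal sketch of the kind of change involved (class and arguments taken from the traceback later in this thread; the stub methods are illustrative only):

```python
import torchmetrics


class COCOCaptionMetrics(torchmetrics.Metric):
    """Sketch only; the real class in tools/metrics/coco.py does much more."""

    def __init__(self, metrics, dist_sync_on_step=False):
        # Previously: super().__init__(dist_sync_on_step=..., compute_on_step=False).
        # 'compute_on_step' no longer exists in newer torchmetrics, so drop it:
        super().__init__(dist_sync_on_step=dist_sync_on_step)
        self.metrics = metrics

    def update(self, predictions, labels):
        pass  # accumulate state here

    def compute(self):
        return {}  # compute the final scores here
```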
I'm very sorry. I solved that problem, but I'm running into a new one when testing:
Traceback (most recent call last):
File "/home/cvt2/bin/dlhpcstarter", line 8, in model_kwargs
are not used by the model: ['special_token_ids', 'mask_token_id'] (note: typos in the generate arguments will also show up in this list)
Testing DataLoader 0: 0%| | 0/148 [00:01<?, ?it/s]
Okay,
Are you attempting:
dlhpcstarter -t mimic_cxr_chen -c config/test_mimic_cxr_chen_cvt2distilgpt2.yaml --stages_module stages --test
?
And what version of transformers are you running?
I'll look into it now.
No, I'm testing on IU X-Ray with the command dlhpcstarter -t iu_x_ray_chen -c config/test_iu_x_ray_chen_cvt2distilgpt2.yaml --stages_module stages --test, and my transformers version is 4.28.1.
Okay. I updated a few things and it now works with the latest transformers (4.35.2). Please pull the repo.
Also, you might be interested in https://huggingface.co/aehrc/cxrmate.
I'm sorry, but I still can't test it:

Traceback (most recent call last):
  File "/home/cvt2/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/home/cvt2/lib/python3.11/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch
    return function(*args, **kwargs)
  File "/home/cvt2/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 794, in _test_impl
    results = self._run(model, ckpt_path=ckpt_path)
  File "/home/cvt2/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run
    results = self._run_stage()
  File "/home/cvt2/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 1028, in _run_stage
    return self._evaluation_loop.run()
  File "/home/cvt2/lib/python3.11/site-packages/lightning/pytorch/loops/utilities.py", line 182, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/home/cvt2/lib/python3.11/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 141, in run
    return self.on_run_end()
  File "/home/cvt2/lib/python3.11/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 253, in on_run_end
    self._on_evaluation_epoch_end()
  File "/home/cvt2/lib/python3.11/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 329, in _on_evaluation_epoch_end
    call._call_lightning_module_hook(trainer, hook_name)
  File "/home/cvt2/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/home/cvt2gpt/cvt2distilgpt2/cvt2distilgpt2_mimic_cxr_chen.py", line 498, in on_test_epoch_end
    output = self.test_chexbert_metrics.compute()
  File "/home/cvt2/lib/python3.11/site-packages/torchmetrics/metric.py", line 607, in wrapped_func
    value = _squeeze_if_scalar(compute(*args, **kwargs))
  File "/home/cvt2gpt/cvt2distilgpt2/tools/metrics/chexbert.py", line 65, in compute
    chexbert = CheXbert(
  File "/home/cvt2gpt/cvt2distilgpt2/tools/chexbert.py", line 13, in __init__
    self.tokenizer = BertTokenizer.from_pretrained(
  File "/home/cvt2/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2008, in from_pretrained
    raise EnvironmentError(
OSError: Can't load tokenizer for 'checkpoints/bert-base-uncased'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'checkpoints/bert-base-uncased' is the correct path to a directory containing all relevant files for a BertTokenizer tokenizer.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/cvt2/bin/dlhpcstarter", line 8, in
And I wonder why, when I test on the IU X-Ray dataset with dlhpcstarter -t iu_x_ray_chen -c config/test_iu_x_ray_chen_cvt2distilgpt2.yaml --stages_module stages --test, the MIMIC file appears in the error report:
  File "/home/lhx/lyj/cvt2gpt/cvt2distilgpt2/cvt2distilgpt2_mimic_cxr_chen.py", line 498, in on_test_epoch_end
    output = self.test_chexbert_metrics.compute()
Hi, this has been fixed; please pull again and let me know if there are any more issues.
The error was there because the IU X-Ray LightningModule inherits from the MIMIC-CXR LightningModule.
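Roughly, the structure is as follows (the IU X-Ray class name here is illustrative, not copied from the repo):

```python
# cvt2distilgpt2_mimic_cxr_chen.py defines the shared LightningModule,
# including hooks such as on_test_epoch_end:
from cvt2distilgpt2_mimic_cxr_chen import CvT2DistilGPT2MIMICXRChen


# The IU X-Ray module subclasses it, so inherited methods still point at the
# MIMIC-CXR file in tracebacks (subclass name illustrative):
class CvT2DistilGPT2IUXRay(CvT2DistilGPT2MIMICXRChen):
    ...
```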
I'm sorry, but this issue still isn't resolved. I have read the error message, and it seems that it needs to load a tokenizer named 'bert-base-uncased' from the bert_path, but this file is not included in your project:

Traceback (most recent call last):
  File "/home/cvt2/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/home/cvt2/lib/python3.11/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch
    return function(*args, **kwargs)
  File "/home/cvt2/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 794, in _test_impl
    results = self._run(model, ckpt_path=ckpt_path)
  File "/home/cvt2/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run
    results = self._run_stage()
  File "/home/cvt2/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 1028, in _run_stage
    return self._evaluation_loop.run()
  File "/home/cvt2/lib/python3.11/site-packages/lightning/pytorch/loops/utilities.py", line 182, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/home/cvt2/lib/python3.11/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 141, in run
    return self.on_run_end()
  File "/home/cvt2/lib/python3.11/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 253, in on_run_end
    self._on_evaluation_epoch_end()
  File "/home/cvt2/lib/python3.11/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 329, in _on_evaluation_epoch_end
    call._call_lightning_module_hook(trainer, hook_name)
  File "/home/cvt2/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/home/cvt2gpt/cvt2distilgpt2/cvt2distilgpt2_mimic_cxr_chen.py", line 498, in on_test_epoch_end
    output = self.test_chexbert_metrics.compute()
  File "/home/cvt2/lib/python3.11/site-packages/torchmetrics/metric.py", line 607, in wrapped_func
    value = _squeeze_if_scalar(compute(*args, **kwargs))
  File "/home/lyj/cvt2gpt/cvt2distilgpt2/tools/metrics/chexbert.py", line 65, in compute
    chexbert = CheXbert(
  File "/home/lyj/cvt2gpt/cvt2distilgpt2/tools/chexbert.py", line 16, in __init__
    self.tokenizer = BertTokenizer.from_pretrained(bert_path)
  File "/home/lyj/cvt2/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2008, in from_pretrained
    raise EnvironmentError(
OSError: Can't load tokenizer for 'bert-base-uncased'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'bert-base-uncased' is the correct path to a directory containing all relevant files for a BertTokenizer tokenizer.
And there is another issue here:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/cvt2/bin/dlhpcstarter", line 8, in
Oh no.
Can you please check if you can run the following in your environment:
from transformers import BertTokenizer
BertTokenizer.from_pretrained('bert-base-uncased')
If you cannot, you likely have issues accessing the Hugging Face Hub, or a firewall is preventing access.
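If the download does work, a possible workaround for the 'checkpoints/bert-base-uncased' variant of the error is to cache a local copy at the path the traceback mentions (a sketch; the target directory is taken from the error message above, not from the repo's docs):

```python
from transformers import BertTokenizer

# Download once while the hub is reachable...
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# ...then save a copy where the error message says the code looks for it,
# so later runs can load it without network access.
tokenizer.save_pretrained('checkpoints/bert-base-uncased')
```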
Hi, I'm so sorry for disturbing you again...
But whether I test or train, it always reports errors. I want to know if there's a problem with dlhpcstarter:
Traceback (most recent call last):
File "/home/cvt2/bin/dlhpcstarter", line 8, in
Hi,
What is your current working directory, and what is the absolute path of stages.py?
Oh, I made a stupid mistake. I understand what you mean and have already solved this problem. Thank you for your patience!
That's okay 😄
I'm very sorry to bother you, but when I experiment with the MIMIC dataset, it reports the following error:
(cvt2) (base) lhx@test:~/lyj/cvt2gpt/cvt2distilgpt2$ dlhpcstarter -t mimic_cxr_chen -c config/test_mimic_cxr_chen_cvt2distilgpt2.yaml --stages_module stages --test
args: {'task': 'mimic_cxr_chen', 'config': 'config/test_mimic_cxr_chen_cvt2distilgpt2', 'exp_dir': 'experiments', 'work_dir': '/home/lhx/lyj/cvt2gpt/cvt2distilgpt2', 'dataset_dir': '/data/lhx/lyj/data', 'ckpt_zoo_dir': 'checkpoints', 'definition': 'CvT2DistilGPT2', 'module': 'cvt2distilgpt2_mimic_cxr_chen', 'stages_definition': 'stages', 'stages_module': 'stages', 'train': None, 'trial': 0, 'resume_last': True, 'resume_epoch': None, 'resume_ckpt_path': None, 'warm_start_ckpt_path': None, 'monitor': 'val_chen_cider', 'monitor_mode': 'max', 'test': True, 'test_epoch': None, 'test_ckpt_path': 'checkpoints/mimic_cxr_jpg_chen/cvt_21_to_distilgpt2/epoch=8-val_chen_cider=0.425092.ckpt', 'fast_dev_run': None, 'num_workers': 5, 'devices': 1, 'num_nodes': 1, 'memory': None, 'time_limit': None, 'submit': None, 'qos': None, 'begin': None, 'slurm_cmd_path': None, 'email': None, 'cuda_visible_devices': None, 'venv_path': None, 'config_file_name': 'config/test_mimic_cxr_chen_cvt2distilgpt2.yaml', 'config_name': 'test_mimic_cxr_chen_cvt2distilgpt2', 'config_dir': '/home/lhx/lyj/cvt2gpt/cvt2distilgpt2/config', 'config_full_path': '/home/lhx/lyj/cvt2gpt/cvt2distilgpt2/config/test_mimic_cxr_chen_cvt2distilgpt2.yaml', 'strategy': 'ddp_find_unused_parameters_true', 'encoder_lr': 5e-05, 'decoder_lr': 0.0005, 'mbatch_size': 4, 'every_n_epochs': 1, 'precision': 16, 'decoder_max_len': 128, 'num_test_beams': 4, 'enable_progress_bar': True, 'weights_summary': 'full', 'early_stopping': True, 'patience': 10, 'min_delta': 0.0001, 'deterministic': False, 'exp_dir_trial': 'experiments/mimic_cxr_chen/test_mimic_cxr_chen_cvt2distilgpt2/trial_0'}
Seed set to 0
Traceback (most recent call last):
File "/home/lhx/lyj/cvt2/bin/dlhpcstarter", line 8, in
Hi,
The class name in 'cvt2distilgpt2_mimic_cxr_chen' is 'CvT2DistilGPT2MIMICXRChen' rather than 'CvT2DistilGPT2'.
I think the error is here: https://github.com/aehrc/cvt2distilgpt2/blob/main/config/test_mimic_cxr_chen_cvt2distilgpt2.yaml. I will update this, sorry about that.
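For reference, a rough sketch of why the mismatch fails (illustrative of how the stages code resolves the model class from the config; not dlhpcstarter's exact source):

```python
import importlib

module_name = 'cvt2distilgpt2_mimic_cxr_chen'  # 'module' in the YAML config
class_name = 'CvT2DistilGPT2MIMICXRChen'       # 'definition' must be this, not 'CvT2DistilGPT2'

# If 'definition' names a class the module does not define, this lookup fails:
TaskModel = getattr(importlib.import_module(module_name), class_name)
```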
Hello, thank you for your previous help, and I am very sorry to bother you again. I used the checkpoint you provided and BLEU-4 reached 0.189, which is great! But when I use your training command dlhpcstarter -t mimic_cxr -c config/train_mimic_cxr_chen_cvt2distilgpt2.yaml --stages_module stages --train --test to train the network from scratch, BLEU-4 is only 0.046 at epoch 30, 0.05467 at epoch 50, and 0.0453 at epoch 100. The difference between them is very big, about 10 percentage points. Is it because the card I am using is a 4090, or the number of epochs I have set, or some other issue? I would be very happy if you could reply to me, and thank you for your patience.
Hi,
That is concerning. I am training with the same config (dlhpcstarter -t mimic_cxr -c config/train_mimic_cxr_chen_cvt2distilgpt2.yaml --stages_module stages --train --test) to see if the issue occurs for me. After two epochs, the validation scores are as follows:
epoch | step | train_loss_step | val_ce_precision_example | val_chen_rouge | val_chen_cider | val_ce_f1_macro | val_ce_num_examples | val_ce_recall_macro | val_chen_bleu_2 | val_chen_bleu_1 | val_ce_recall_example | val_chen_bleu_3 | val_ce_precision_micro | val_ce_f1_micro | val_ce_recall_micro | val_chen_bleu_4 | val_ce_f1_example | val_chen_num_examples | val_ce_precision_macro | train_loss_epoch |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 16924 | 0.40693270735524256 | 0.31695191260859684 | 0.2920809223790337 | 0.20219941561024884 | 2130.0 | 0.20641742878825722 | 0.22258831560611725 | 0.3392559885978699 | 0.3729107981220657 | 0.15715651214122772 | 0.454778469425119 | 0.414069011501917 | 0.38004895960832313 | 0.1176675483584404 | 0.3744735077129443 | 2130.0 | 0.2957008008496724 | ||
1 | 33849 | 0.42745696400625977 | 0.32013704564588763 | 0.3445314092814959 | 0.24549816758435877 | 2130.0 | 0.24682623539299464 | 0.23871149122714996 | 0.3632817566394806 | 0.3982863849765258 | 0.16974632441997528 | 0.46675805346127486 | 0.44034917555771097 | 0.4167686658506732 | 0.12679457664489746 | 0.396732617929801 | 2130.0 | 0.3056663869771598 |
Which seems normal. I will keep training it and update you with the results.
In train_iu_x_ray_chen_cvt2distilgpt2.yaml, I had set early stopping, so there is no need to set a max number of epochs:
early_stopping: True
patience: 10
min_delta: 1e-4
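(In PyTorch Lightning terms, that config corresponds to an EarlyStopping callback along these lines; a sketch, with the monitored metric and mode taken from the args dump earlier in this thread:)

```python
from lightning.pytorch.callbacks import EarlyStopping

early_stopping = EarlyStopping(
    monitor='val_chen_cider',  # 'monitor' from the printed args
    mode='max',                # 'monitor_mode' from the printed args
    patience=10,
    min_delta=1e-4,
)
```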
It did not require that many epochs; for example, if you look here: https://data.csiro.au/collection/csiro%3A53728v5, epoch 8 was the best epoch.
I doubt the card you are using is causing this much of a difference.
Could you please comment with the output of pip list? I am trying to think of what other issues there could be in the meantime.
Here are the validation scores after 9 epochs:
epoch | step | train_loss_step | val_ce_precision_example | val_chen_rouge | val_chen_cider | val_ce_f1_macro | val_ce_num_examples | val_ce_recall_macro | val_chen_bleu_2 | val_chen_bleu_1 | val_ce_recall_example | val_chen_bleu_3 | val_ce_precision_micro | val_ce_f1_micro | val_ce_recall_micro | val_chen_bleu_4 | val_ce_f1_example | val_chen_num_examples | val_ce_precision_macro | train_loss_epoch |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 16924 | 0.40693270735524256 | 0.31695191260859684 | 0.2920809223790337 | 0.20219941561024884 | 2130.0 | 0.20641742878825722 | 0.22258831560611725 | 0.3392559885978699 | 0.3729107981220657 | 0.15715651214122772 | 0.454778469425119 | 0.414069011501917 | 0.38004895960832313 | 0.1176675483584404 | 0.3744735077129443 | 2130.0 | 0.2957008008496724 | ||
1 | 33849 | 0.42745696400625977 | 0.32013704564588763 | 0.3445314092814959 | 0.24549816758435877 | 2130.0 | 0.24682623539299464 | 0.23871149122714996 | 0.3632817566394806 | 0.3982863849765258 | 0.16974632441997528 | 0.46675805346127486 | 0.44034917555771097 | 0.4167686658506732 | 0.12679457664489746 | 0.396732617929801 | 2130.0 | 0.3056663869771598 | ||
2 | 50774 | 0.4493818466353678 | 0.32215001117525616 | 0.3491774795545655 | 0.24494992327325998 | 2130.0 | 0.2451831928520513 | 0.23824720084667206 | 0.360444575548172 | 0.40607198748043816 | 0.1692574918270111 | 0.4796195652173913 | 0.45460399227301995 | 0.4320685434516524 | 0.12665951251983643 | 0.4104650122959982 | 2130.0 | 0.33146890807093127 | ||
3 | 67699 | 0.419679186228482 | 0.31870183849585365 | 0.3268155641993095 | 0.24201441861553305 | 2130.0 | 0.23874382163185287 | 0.23246432840824127 | 0.3520020842552185 | 0.38953834115805946 | 0.16601967811584473 | 0.5027644673792849 | 0.4561110182243772 | 0.4173806609547124 | 0.12430074065923691 | 0.3892860868917207 | 2130.0 | 0.30884301315553314 | ||
4 | 84624 | 0.421471048513302 | 0.320114059908859 | 0.3533501690062039 | 0.26324989177251074 | 2130.0 | 0.25269686966174104 | 0.23336923122406006 | 0.3527657985687256 | 0.3964397496087637 | 0.16647367179393768 | 0.48460176991150444 | 0.44936812735926474 | 0.41891064871481026 | 0.12536273896694183 | 0.3942397482538328 | 2130.0 | 0.3413789053425755 | ||
5 | 101549 | 0.44420523138832996 | 0.32870873072950774 | 0.3749406450696774 | 0.28741378950783564 | 2130.0 | 0.28123585476076024 | 0.25293877720832825 | 0.3823215067386627 | 0.42827856025039124 | 0.18067187070846558 | 0.5026737967914439 | 0.4805111821086262 | 0.4602203182374541 | 0.13547103106975555 | 0.4188216504413687 | 2130.0 | 0.34597945551478443 | ||
6 | 118474 | 0.4521987480438185 | 0.3252632081509904 | 0.3583214345964929 | 0.27354281517517703 | 2130.0 | 0.2692128588238088 | 0.24576494097709656 | 0.3704437017440796 | 0.4243348982785602 | 0.17543305456638336 | 0.4955902306648575 | 0.4700772200772201 | 0.447062423500612 | 0.1316041350364685 | 0.4206494522691706 | 2130.0 | 0.3183531138289372 | ||
7 | 135399 | 0.43258215962441315 | 0.3088185459463964 | 0.3243224346348287 | 0.27673346937544413 | 2130.0 | 0.2682169390229497 | 0.22527024149894714 | 0.34522631764411926 | 0.41045383411580594 | 0.15802665054798126 | 0.4795613160518445 | 0.45977377728214114 | 0.4415544675642595 | 0.1160745918750763 | 0.40617706237424545 | 2130.0 | 0.35257348143859435 | ||
8 | 152324 | 0.43335680751173705 | 0.3199401416458053 | 0.3724532987339018 | 0.2943421836128386 | 2130.0 | 0.2802859326143582 | 0.2422061413526535 | 0.37223392724990845 | 0.4196009389671361 | 0.17064839601516724 | 0.4915364583333333 | 0.47634069400630913 | 0.4620563035495716 | 0.12724152207374573 | 0.40869103509948584 | 2130.0 | 0.36754727023553097 |
@LYJ0327, how do these validation scores compare to yours?
I'm very sorry for my late reply. I did both training and testing of your code on a 4090 GPU, and here are my results, which still differ from yours. P.S. I used the .pth file you provided for my test experiments on both datasets.
Hi LYJ0327,
Could you please share your validation metrics? E.g., experiments/mimic_cxr/train_mimic_cxr_chen_cvt2distilgpt2/trial_0/lightning_logs/version_6/metrics.csv.
I'd love to share the relevant metrics with you, but that file does not seem to have been generated in my folder.
Try version_0? Usually one version directory is created for the .csv files and the following one is created for TensorBoard.
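(A sketch of why two version directories can appear, assuming the trainer uses both a CSV logger and a TensorBoard logger; the save_dir is illustrative:)

```python
from lightning.pytorch import Trainer
from lightning.pytorch.loggers import CSVLogger, TensorBoardLogger

# With two loggers writing under the same directory, each picks its own
# lightning_logs/version_N, so metrics.csv often lands in version_0 and the
# TensorBoard event files in the following version.
save_dir = 'experiments/mimic_cxr/train_mimic_cxr_chen_cvt2distilgpt2/trial_0'
trainer = Trainer(logger=[CSVLogger(save_dir), TensorBoardLogger(save_dir)])
```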
Ok, I downloaded the two csv files as shown below: mimic_metrics.csv and iu_xray_metrics.csv. Secondly, I just tested the .pth file you gave me on an A100, and the result is shown in the picture. However, when I tried to train your code on the A100, it reported the following error. Do you know what the cause of this is? I suspect it is caused by my pycocotools not being installed at the version specified in your requirements: when I try to install pycocotools==2.0.4, the system reports the following error, so I have installed pycocotools version 2.0.7. Do you know if this is the reason why I can't run the training code, or is it something else?
Hi LYJ0327,
Your validation and test scores for MIMIC-CXR seem as expected; the differences are caused by the seed, etc. There is high variability in the scores over many training runs; e.g., see the variance in the BLEU-4 scores in Figure 7 for the training runs of each model: https://doi.org/10.1016/j.artmed.2023.102633
Here are the results from the new model I trained (mentioned a few comments ago; epoch 8 was the best epoch), which are similar to your test results from mimic_metrics.csv:
Test metric | DataLoader 0
---|---
test_ce_f1_example | 0.3509851720193866
test_ce_f1_macro | 0.25575093165688123
test_ce_f1_micro | 0.42572590826202833
test_ce_num_examples | 3858.0
test_ce_precision_example | 0.41046742699153277
test_ce_precision_macro | 0.33392767167746434
test_ce_precision_micro | 0.484669434685404
test_ce_recall_example | 0.3443025006788615
test_ce_recall_macro | 0.24629913305384998
test_ce_recall_micro | 0.3795647823911956
test_chen_bleu_1 | 0.3907630145549774
test_chen_bleu_2 | 0.24406647682189941
test_chen_bleu_3 | 0.1676843911409378
test_chen_bleu_4 | 0.12317755818367004
test_chen_cider | 0.34302342164748734
test_chen_meteor | 0.15075385570526123
test_chen_num_examples | 3858.0
test_chen_rouge | 0.28515253454563666
pycocotools requires gcc for installation.
Ok, thanks, I got the results of your experiment. But my other question is: I already have gcc, so why can I still not install pycocotools version 2.0.4?
Hi LYJ0327,
Unfortunately, pycocotools is not my package; the authors of that package may be better able to help you with this.
Sorry I couldn't help you better with this.
Hi, thank you very, very much for your patience and help. You're so kind! I've been able to run through the whole code, but I'm having some problems understanding it. Could you please give me contact information for WeChat or any other software? I would like to ask you some details about your code implementation. I would be very, very grateful if you could patiently help me!
Hi LYJ0327,
Which part is hard to understand?
Hi, thanks for the reply. I am having trouble understanding how the network encodes tokens as embeddings in the text decoder. And which part of the code should I look at if I want to use embeddings directly for decoding?
Hi,
A Hugging Face tokenizer is used for tokenisation: https://huggingface.co/learn/nlp-course/en/chapter2/4
The Hugging Face decoder converts the tokens into embeddings using the torch.nn.Embedding class: https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html
Embeddings can be given as input to a Hugging Face decoder via: https://huggingface.co/docs/transformers/en/model_doc/bert#transformers.BertLMHeadModel.forward.inputs_embeds
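Putting those together, a small self-contained sketch (using distilgpt2 and the generic Hugging Face API, not this repo's exact decoder wiring):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('distilgpt2')
model = AutoModelForCausalLM.from_pretrained('distilgpt2')

# 1. Tokenise: text -> token ids.
input_ids = tokenizer('final report:', return_tensors='pt').input_ids

# 2. Token ids -> embeddings via the decoder's torch.nn.Embedding table.
inputs_embeds = model.get_input_embeddings()(input_ids)

# 3. Feed the embeddings to the decoder directly instead of token ids.
with torch.no_grad():
    outputs = model(inputs_embeds=inputs_embeds)
print(outputs.logits.shape)  # (1, sequence_length, vocab_size)
```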
Hope that helps
Thank you for the help you provided. I know you used these functions to do the corresponding work, but I would like to know which parts of the source code you provided use these Hugging Face tools. Secondly, may I ask how the CE metrics in your paper are calculated? Could you provide the corresponding code for calculating the CE metrics?
Thanks!
Hi, I'm very sorry to bother you, but I would like to ask why my program reports the following error whether I run test or train:

Traceback (most recent call last):
File "/home/cvt2/bin/dlhpcstarter", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/cvt2/lib/python3.11/site-packages/dlhpcstarter/main.py", line 54, in main
submit(args=args, stages_fnc=stages_fnc)
File "/home/cvt2/lib/python3.11/site-packages/dlhpcstarter/main.py", line 69, in submit
stages_fnc(args)
File "/home/cvt2distilgpt2/stages.py", line 68, in stages
model = TaskModel(**vars(args))
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cvt2distilgpt2/cvt2distilgpt2_iu_x_ray_chen.py", line 66, in init
self.val_coco_metrics = COCOCaptionMetrics(metrics=["bleu", "cider", "rouge"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cvt2distilgpt2/tools/metrics/coco.py", line 33, in init
super().init(dist_sync_on_step=dist_sync_on_step, compute_onstep=False)
File "/home/cvt2/lib/python3.11/site-packages/torchmetrics/metric.py", line 146, in init
raise ValueError(f"Unexpected keyword arguments: {', '.join(kwargs)}")
ValueError: Unexpected keyword arguments: compute_on_step
I would be very grateful if you could reply to me with a solution to my problem!