Closed: Labyrintbs closed this issue 1 year ago.
Hi, congrats on training your model! I think this is the same problem as https://github.com/JonasGeiping/cramming/issues/24? Yeah this should be fixed in the new version.
Oh thanks! It's exactly the same problem, I didn't notice that issue before ;)
Great, closing this for now.
I followed the instructions to replicate the last 1.13 release using the corresponding version's README.md, i.e.
The pretraining worked fine except for a loss explosion with the default lr_scheduler budget-triangle2 in bert-o3.yaml, so I just switched to budget-one-cycle, following the comparison of schedulers in the paper, since the two show similar pretraining loss decay (see the sketch of that change below). In the end the pretraining reached a loss of 1.8282 on an RTX 2080 Ti in a single day, matching the result reported in the paper.
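For reference, a minimal sketch of what that change amounts to, using OmegaConf (which Hydra, and therefore cramming, builds on); the key path `train.scheduler` is an assumption here, not copied from the repo:

```python
from omegaconf import OmegaConf

# Hypothetical, stripped-down stand-in for the train config; the real
# bert-o3.yaml has many more keys and the exact key path may differ.
cfg = OmegaConf.create({"train": {"scheduler": "budget-triangle2"}})

# Swap in the one-cycle budget schedule from the paper's scheduler comparison,
# roughly equivalent to a command-line override like train.scheduler=budget-one-cycle.
cfg.train.scheduler = "budget-one-cycle"

print(OmegaConf.to_yaml(cfg))
```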
For evaluation, however, problems appeared on the downstream tasks that are not plain 2-way classification, i.e. 3-way classification for MNLI and single-target regression (1 output) for STSB. For MNLI, errors like the following appear:

RuntimeError: CUDA error: device-side assert triggered

or, after putting the model on the CPU to get more information,

IndexError: Target 2 is out of bounds

For STSB, the loss evaluation fails instead with: Target size (torch.Size([16])) must be the same as input size (torch.Size([16, 2]))
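Both failures are consistent with a classification head that only has 2 outputs. A minimal sketch (batch size 16 is made up) that reproduces the same messages with plain PyTorch losses; the assumption for STSB is that float targets plus a forced num_labels of 2 route the loss into a multi-label BCE branch:

```python
import torch
import torch.nn as nn

# A classification head built with only 2 outputs (the default num_labels),
# hypothetical batch size of 16.
logits = torch.randn(16, 2)

# MNLI has 3 classes, so label id 2 shows up in the targets.
mnli_labels = torch.tensor([2] * 16)
try:
    nn.CrossEntropyLoss()(logits, mnli_labels)
except IndexError as err:
    print(err)  # Target 2 is out of bounds. (a device-side assert on CUDA)

# STSB targets are float scores of shape [16]; against a [16, 2] head the
# multi-label BCE loss rejects the shape mismatch.
stsb_scores = torch.rand(16)
try:
    nn.BCEWithLogitsLoss()(logits, stsb_scores)
except ValueError as err:
    print(err)  # Target size (torch.Size([16])) must be the same as input size (torch.Size([16, 2]))
```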
I checked the code carefully and found that the problem comes from one line in `class ScriptableLMForSequenceClassification(PreTrainedModel)`
(https://github.com/JonasGeiping/cramming/blob/4a5e3008a5ec05ed68f9d096e4875f8dddadcf81/cramming/architectures/scriptable_bert.py#L229),
which is initialized in the downstream task function (https://github.com/JonasGeiping/cramming/blob/4a5e3008a5ec05ed68f9d096e4875f8dddadcf81/cramming/architectures/scriptable_bert.py#L24C1-L35C17).
Everything there works as intended: the args passed to `ScriptableLMForSequenceClassification` come in as the `arch` attribute of `crammedBertConfig`, a class inherited from the transformers lib's base class `PretrainedConfig`. However, this line of code, `config.arch['num_labels'] = config.num_labels`, just rewrites the final number of classes to 2, since the default `PretrainedConfig` sets its `num_labels` attribute to 2. I commented out this line and everything seems to work fine.
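For context, the 2 really does come straight from the transformers default; a quick check:

```python
from transformers import PretrainedConfig

# The transformers base config defaults to 2 labels (binary classification).
print(PretrainedConfig().num_labels)              # 2

# Passing the task's label count explicitly gives the expected head size.
print(PretrainedConfig(num_labels=3).num_labels)  # 3, e.g. MNLI
print(PretrainedConfig(num_labels=1).num_labels)  # 1, e.g. STSB regression
```

So besides commenting the line out, another possible fix (not tested here) would be to make sure the downstream task's label count is set on the config before `config.arch['num_labels'] = config.num_labels` runs, so that 3 (MNLI) or 1 (STSB) propagates instead of the default 2.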
As this released version is fairly old compared to the newest Torch 2.1, I think it's not worth opening a PR, so I'm leaving an issue here in case someone encounters the same problem as me :)