asahi417 / lm-question-generation

Multilingual/multidomain question generation datasets, models, and python library for question generation.
https://www.autoqg.net
MIT License

AssertionError when trying to change the batch size in GridSearcher #6

Closed alainloisel closed 1 year ago

alainloisel commented 1 year ago

Hi, I tried to follow the Python fine-tuning instructions provided in the README. With a batch size of 64, I naturally get an OOM on my config. However, when I tried to change the batch size to 8 as below, I got an AssertionError:

from lmqg import GridSearcher

trainer = GridSearcher(
    checkpoint_dir='tmp_ckpt',
    dataset_path='lmqg/qg_squad',
    model='t5-small',
    epoch=15,
    epoch_partial=5,
    batch=8,
    n_max_config=5,
    gradient_accumulation_steps=[2, 4],
    lr=[1e-04, 5e-04, 1e-03],
    label_smoothing=[0, 0.15]
)
trainer.run()


AssertionError                            Traceback (most recent call last)
in
     12     label_smoothing=[0, 0.15]
     13 )
---> 14 trainer.run()

1 frames
/usr/local/lib/python3.8/dist-packages/lmqg/grid_searcher.py in initialize_searcher(self)
     87     tmp_v = [tmp[k] for k in sorted(tmp.keys())]
     88     static_tmp_v = [self.static_config[k] for k in sorted(tmp.keys())]
---> 89     assert tmp_v == static_tmp_v, f'{str(tmp_v)}\n not matched \n{str(static_tmp_v)}'
     90     path_to_d_config = pj(self.checkpoint_dir, 'config_dynamic.json')
     91     if os.path.exists(path_to_d_config):

AssertionError: [64, 'default', 'lmqg/qg_squad', 15, False, 'paragraph_answer', 512, 32, 't5-small', 'question', 'qg'] not matched [8, 'default', 'lmqg/qg_squad', 15, False, 'paragraph_answer', 512, 32, 't5-small', 'question', 'qg']

asahi417 commented 1 year ago

I should update the error message to be more explicit, but this error means that you already have a working directory at the passed checkpoint_dir, and it has a different config than the one you specified this time. GridSearcher won't overwrite the existing working directory with the new configuration, so you may want to delete the checkpoint_dir, after which it should work.
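The workaround above can be sketched as follows, assuming the checkpoint directory is `tmp_ckpt` as in the original snippet (the path name is just the one used in this thread, not anything special):

```python
import os
import shutil

checkpoint_dir = 'tmp_ckpt'  # the same path passed to GridSearcher

# Remove the stale working directory so the new run's config is not
# compared against the one cached from the previous (OOM-killed) run.
if os.path.exists(checkpoint_dir):
    shutil.rmtree(checkpoint_dir)

# Re-instantiating GridSearcher with the new batch size should now
# start from a clean directory instead of raising the AssertionError.
```

Note that deleting the directory also discards any partial checkpoints from the failed run, so this is only appropriate when starting the search over.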

asahi417 commented 1 year ago

I will add an option to overwrite the working directory with the new parameters in the future. Thanks for your feedback!

alainloisel commented 1 year ago

Thank you, I understand the process now and it works well. However, I am now stuck on the eval process with this error:

Traceback (most recent call last):
  File "/usr/local/bin/lmqg-eval", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/lmqg/lmqg_cl/model_evaluation.py", line 49, in main
    metric = evaluate(
  File "/usr/local/lib/python3.8/dist-packages/lmqg/automatic_evaluation.py", line 180, in evaluate
    reference_files = get_reference_files(dataset_path, dataset_name)
  File "/usr/local/lib/python3.8/dist-packages/lmqg/data.py", line 53, in get_reference_files
    assert len(f.read().split('\n')) > 20, f"invalid file {ref_path}"
  File "/usr/lib/python3.8/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 29962: ordinal not in range(128)

However, it seems to be linked to Google Colab, since I don't get this error on my local machine.
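That environment dependence fits the traceback: the file is opened without an explicit encoding, so Python falls back to the locale's preferred encoding, which on some Colab images resolves to ASCII, and any non-ASCII byte in a reference file (here 0xc5, the lead byte of many UTF-8 accented characters) then fails to decode. A minimal sketch of the mechanism, with the fix of passing `encoding='utf-8'` explicitly (the filename `ref.txt` and the sample text are made up for illustration):

```python
# Write a reference-style file containing a non-ASCII character.
# 'Ł' encodes to the two bytes 0xc5 0x81 in UTF-8 -- the same 0xc5
# lead byte reported in the UnicodeDecodeError above.
path = 'ref.txt'
with open(path, 'w', encoding='utf-8') as f:
    f.write('Łódź\n')

# open(path) with no encoding argument uses the locale default and
# raises UnicodeDecodeError when that default is ASCII; an explicit
# encoding reads the file the same way everywhere.
with open(path, encoding='utf-8') as f:
    text = f.read()
```

Without patching the library, setting `PYTHONUTF8=1` in the environment before running `lmqg-eval` may also work around this on Colab, since it forces Python to use UTF-8 regardless of the locale (this is standard Python behavior, not something lmqg-specific).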