andimarafioti / florence2-finetuning

Quick exploration into fine tuning florence 2
MIT License
247 stars 22 forks

Problem with tokeniser #14

Closed louithy closed 1 month ago

louithy commented 1 month ago

Hi,

I get the following error when running the script (I tried all the models):

    labels = processor.tokenizer(
             ^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/florence/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2883, in __call__
    encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/florence/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2969, in _call_one
    return self.batch_encode_plus(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/florence/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 3160, in batch_encode_plus
    return self._batch_encode_plus(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/florence/lib/python3.12/site-packages/transformers/models/bart/tokenization_bart_fast.py", line 231, in _batch_encode_plus
    return super()._batch_encode_plus(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/florence/lib/python3.12/site-packages/transformers/tokenization_utils_fast.py", line 511, in _batch_encode_plus
    encodings = self._tokenizer.encode_batch(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]

Any idea where it comes from?
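
One common cause of this `TypeError` is a batch passed to a fast tokenizer that contains something other than plain strings, such as `None`, a number, or a nested list. Whether that is what happened here is not confirmed by the thread. The sketch below shows one way to check a batch before it reaches `processor.tokenizer(...)`. The helper name `find_non_string_entries` and the sample `answers` list are hypothetical and not part of this repository.

```python
from typing import Any, List, Tuple


def find_non_string_entries(texts: List[Any]) -> List[Tuple[int, str]]:
    """Return (index, type name) for every entry that is not a plain str."""
    return [
        (i, type(t).__name__)
        for i, t in enumerate(texts)
        if not isinstance(t, str)
    ]


# Hypothetical batch of answers, as a collate function might assemble it.
answers = ["42", None, 3.5, ["nested"], "ok"]

bad = find_non_string_entries(answers)
if bad:
    # Report offending positions instead of letting the tokenizer fail later.
    print(f"Non-string entries found: {bad}")
```

If the check reports anything, the dataset fields feeding the labels are worth inspecting before looking at the tokenizer itself.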

louithy commented 1 month ago

It's probably caused by the last commit (https://github.com/andimarafioti/florence2-finetuning/commit/11e875d94d24582dafa883849f23b4809e1caa40): when I check out the previous commit, this error goes away. But I get this one instead:

    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: GET was unable to find an engine to execute this computation

louithy commented 1 month ago

OK, that one is solved by issue https://github.com/andimarafioti/florence2-finetuning/issues/2

louithy commented 1 month ago

But there is still an issue with the dataset when checking out the last commit :-)

thariqkhalid commented 1 month ago

Same for me.

andimarafioti commented 1 month ago

What is the issue with the dataset? The other error seems to be external to this code.

andimarafioti commented 1 month ago

Oh, I see the issue with the dataset: I'm targeting a version of Docmatix that's private. Before that, I was targeting a dataset that was local on my computer, so not much better :/. I'll work on a quick fix.

andimarafioti commented 1 month ago

Hey, sorry about that, I messed up 😅 Here's the fix: https://github.com/andimarafioti/florence2-finetuning/commit/262258ef647b83c08e0d574b2cce5f4f4b4c9e95