This repository covers TrOCR model training and inference on handwritten Tibetan text.
There are multiple versions of the TrOCR pipeline. The most important one is probably the Accelerate version, which can be found in the huggingface-trainer branch. With Accelerate, the same training code runs on a single-GPU or a multi-GPU system, and the library adapts the run to whatever hardware is available. A good video on Accelerate: https://youtu.be/t8Krzu-nSeY
If you are using Accelerate, use only the trocr/Fine_tune_TrOCR.ipynb file. It is configured so that you can run multi-GPU training from a notebook (something that vasi.ai likes).
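For readers new to Accelerate, here is a minimal sketch of how multi-GPU training is usually launched from a notebook. It is not the code in trocr/Fine_tune_TrOCR.ipynb: the tiny linear model, the random data, and the names training_function and launch are placeholders invented for illustration. It assumes torch and accelerate are installed.

```python
def training_function(num_epochs=1):
    # Imports live inside the function so that nothing initializes CUDA
    # in the notebook process before the workers are launched.
    import torch
    from accelerate import Accelerator

    accelerator = Accelerator()  # detects CPU, single GPU, or multi-GPU

    # Placeholder model and data; the real notebook would build TrOCR here.
    model = torch.nn.Linear(4, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    dataset = torch.utils.data.TensorDataset(
        torch.randn(32, 4), torch.randn(32, 1)
    )
    loader = torch.utils.data.DataLoader(dataset, batch_size=8)

    # prepare() moves everything to the right device(s) and shards the data.
    model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

    last_loss = None
    for _ in range(num_epochs):
        for inputs, targets in loader:
            optimizer.zero_grad()
            loss = torch.nn.functional.mse_loss(model(inputs), targets)
            accelerator.backward(loss)  # used in place of loss.backward()
            optimizer.step()
            last_loss = loss.item()
    accelerator.print(f"final batch loss: {last_loss}")  # main process only
    return last_loss


def launch(num_processes=2):
    # Call this from a notebook cell; num_processes is the number of GPUs.
    from accelerate import notebook_launcher

    notebook_launcher(training_function, args=(1,), num_processes=num_processes)
```

In a notebook cell, launch(num_processes=2) would start one worker per GPU. Calling training_function() directly runs the same loop in a single process.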
poetry env use path_to_python
On Windows it would be poetry env use C:\Users\<username>\AppData\Programs\Python\Python310\python.exe
poetry install
Use Fine_Tune.ipynb as the training notebook.
Use Inference.ipynb as the inference notebook.
Inside the tibetan-dataset/ folder, two things are required:
labels.csv => a two-column CSV file that maps each image name to its label (text)
train/ => a folder that contains all images for training
Make sure that in Fine_Tune.ipynb, you are using the correct names for the labels.csv file
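Before training, it can help to check that the dataset folder matches the layout described above. The following is a minimal sketch, not part of the repository: the function name find_missing_images is made up, and it assumes the first column holds the image name, the second the text, and that the file starts with a header row (pass has_header=False if yours does not).

```python
import csv
from pathlib import Path


def find_missing_images(dataset_dir, labels_name="labels.csv", has_header=True):
    """Return image names listed in the labels file but absent from train/."""
    dataset_dir = Path(dataset_dir)
    train_dir = dataset_dir / "train"
    missing = []
    with open(dataset_dir / labels_name, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header row (assumed to exist)
        first_row_number = 2 if has_header else 1
        for row_number, row in enumerate(reader, start=first_row_number):
            if not row:
                continue  # ignore blank lines
            if len(row) != 2:
                raise ValueError(
                    f"row {row_number}: expected 2 columns, got {len(row)}"
                )
            image_name, _text = row
            if not (train_dir / image_name).is_file():
                missing.append(image_name)
    return missing
```

For example, find_missing_images("tibetan-dataset") returns an empty list when every image named in labels.csv exists under tibetan-dataset/train/.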
When Poetry is used to install PyTorch, the download is massive.
One workaround is to take PyTorch out of Poetry's management and install it with pip inside the Poetry environment instead of using poetry add.
That means removing everything related to PyTorch from pyproject.toml and then running poetry run pip install torch
The issue may already be resolved by the time you read this: https://github.com/python-poetry/poetry/issues/6409