johnlockejrr closed this issue 1 month ago.
Hi @johnlockejrr, I found the bug today as well but did not find the time to solve it. My recommendation:
- yaltai<2.0.0 for dataset conversion
- yaltai>=2.0.1 for inference
I hope I'll be able to get to the issue next week.
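Because the two pinned ranges do not overlap, following this advice in practice means keeping two separate environments. A minimal sketch of a check for which task an installed yaltai version is recommended for. The helper names are hypothetical (not part of YALTAi). Only importlib.metadata.version is a real standard-library call, and the parser assumes plain numeric X.Y.Z release strings.

```python
# Hypothetical helper, not part of YALTAi: checks an installed yaltai version
# against the workaround above (yaltai<2.0.0 for conversion, >=2.0.1 for inference).
from importlib.metadata import PackageNotFoundError, version


def parse_release(v):
    """Turn '2.0.1' into (2, 0, 1), padding short versions like '2.0' with zeros.

    Assumes a plain numeric X.Y.Z string (no 'rc1' or '.dev0' suffixes).
    """
    parts = [int(p) for p in v.split(".")[:3]]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def suitable_tasks(v):
    """Return which tasks a given yaltai version is recommended for."""
    release = parse_release(v)
    tasks = []
    if release < (2, 0, 0):
        tasks.append("conversion")
    if release >= (2, 0, 1):
        tasks.append("inference")
    return tasks  # empty for exactly 2.0.0, which the workaround does not cover


if __name__ == "__main__":
    try:
        installed = version("yaltai")
        print(f"yaltai {installed}: {suitable_tasks(installed) or 'no recommended task'}")
    except PackageNotFoundError:
        print("yaltai is not installed in this environment")
```

Padding matters because Python compares tuples element by element: without it, (2, 0) would sort below (2, 0, 0) and "2.0" would be wrongly classed as a conversion version.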
Ok, I'll do that until you fix it!
By the way, if I don't shuffle I get this error:
NameError: name 'parse_xml' is not defined
This error is definitely due to the new version. Now I am worried that the other bug is not connected to the new version; please keep me up to date.
I trained a model with YALTAi 1.0.2. I realized it extracted only the text regions from my PAGE XML (exported from eScriptorium) and not the lines... is that normal?
Over the last few days I have been racking my brains over YOLOv8 segmentation and wrote some scripts to extract polygons from PAGE XML datasets and convert them to YOLOv8 OBB; that's how I found (rediscovered, actually) your project. I needed a good text segmentation to extract text lines and send them to a PyLaia model for text recognition (https://huggingface.co/spaces/johnlockejrr/PyLaia-heb_sam_v1).
About the second error: just reading your code, I see that the parse_xml import is commented out in yaltai.py:
#from kraken.lib.xml import parse_xml
Why? :)
Edit...
ImportError: cannot import name 'parse_xml' from 'kraken.lib.xml' (/home/incognito/YALTAi/train-2.0.1-py3.11/lib/python3.11/site-packages/kraken/lib/xml.py)
It seems that in kraken you now have to do from kraken.lib import xml and then xml.XMLPage(doc, format_type), etc.
From the kraken 5.x release notes:
While 5.x preserves the general OCR functional blocks, the existing dictionary-based data structures have been replaced with [container classes](https://kraken.re/5.2/api_docs.html#kraken-containers-module) and the XML parser has been reworked.
See https://github.com/mittagessen/kraken/releases/tag/5.2 under API changes
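A hedged sketch of a compatibility import that works with either kraken line. It relies only on the two APIs named in this thread: parse_xml, which existed in kraken 4.x and is gone in 5.x (hence the ImportError above), and xml.XMLPage(doc, format_type). The names get_page_parser and parse_v5 are hypothetical. The two parsers return different objects (a dictionary vs. an XMLPage container), so calling code still has to handle both.

```python
# Hypothetical compatibility shim, not YALTAi's actual fix.
def get_page_parser():
    """Return (api_name, callable) for whichever kraken XML API is installed.

    kraken 4.x: kraken.lib.xml.parse_xml(path) -> dict
    kraken 5.x: kraken.lib.xml.XMLPage(path, format_type) -> XMLPage container
    """
    try:
        from kraken.lib.xml import parse_xml  # removed in kraken 5.x
        return "kraken4", parse_xml
    except ImportError:
        pass  # kraken 5.x (or no kraken at all): try the new module layout
    try:
        from kraken.lib import xml
    except ImportError as exc:
        raise RuntimeError("kraken does not appear to be installed") from exc

    def parse_v5(path, format_type="xml"):
        # Mirrors the xml.XMLPage(doc, format_type) call quoted above.
        return xml.XMLPage(path, format_type)

    return "kraken5", parse_v5
```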
Coming back to it today :)
Thanks to God! :D Can't wait!
This issue should be fixed, but be warned that I changed the command to simplify things:
yaltai alto-to-yolo sam_new/*.xml sam_seg --shuffle .1 --segmonto region
becomes yaltai convert alto-to-yolo ...
Perfect! I will clone the latest and give it a try.
And it's great that you changed the command.
Should I already install with pip install YALTAi, or directly from GitHub? And which Python version do you recommend?
pip install 'yaltai==2.0.2'
should be good
Can you confirm this one works, so that I can close at least this issue? :)
Same error:
(yaltai-2.0.2-py3.10) incognito@DESKTOP-H1BS9PO:~/YALTAi$ yaltai convert alto-to-yolo teyman_alto/*.xml datasets/teyman_alto --shuffle .1
Using list of inputs.
Found 70 to convert.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/incognito/YALTAi/yaltai-2.0.2-py3.10/bin/yaltai:8 in <module> │
│ │
│ 5 from yaltai.cli.yaltai import yaltai_cli │
│ 6 if __name__ == '__main__': │
│ 7 │ sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0]) │
│ ❱ 8 │ sys.exit(yaltai_cli()) │
│ 9 │
│ │
│ /home/incognito/YALTAi/yaltai-2.0.2-py3.10/lib/python3.10/site-packages/click/core.py:1157 in │
│ __call__ │
│ │
│ /home/incognito/YALTAi/yaltai-2.0.2-py3.10/lib/python3.10/site-packages/click/core.py:1078 in │
│ main │
│ │
│ /home/incognito/YALTAi/yaltai-2.0.2-py3.10/lib/python3.10/site-packages/click/core.py:1688 in │
│ invoke │
│ │
│ /home/incognito/YALTAi/yaltai-2.0.2-py3.10/lib/python3.10/site-packages/click/core.py:1688 in │
│ invoke │
│ │
│ /home/incognito/YALTAi/yaltai-2.0.2-py3.10/lib/python3.10/site-packages/click/core.py:1434 in │
│ invoke │
│ │
│ /home/incognito/YALTAi/yaltai-2.0.2-py3.10/lib/python3.10/site-packages/click/core.py:783 in │
│ invoke │
│ │
│ /home/incognito/YALTAi/yaltai-2.0.2-py3.10/lib/python3.10/site-packages/yaltai/cli/yaltai.py:98 │
│ in alto_to_yolo │
│ │
│ 95 │ if val: │
│ 96 │ │ message(f"{len(val)} image for validation.", fg='green') │
│ 97 │ elif shuffle: │
│ ❱ 98 │ │ random.shuffle(input_paths) │
│ 99 │ │ val_idx = int(len(input_paths) * shuffle) │
│ 100 │ │ message(f"{val_idx+1}/{len(input_paths)} image for validation.", fg='green') │
│ 101 │
│ │
│ /usr/lib/python3.10/random.py:394 in shuffle │
│ │
│ 391 │ │ │ for i in reversed(range(1, len(x))): │
│ 392 │ │ │ │ # pick an element in x[:i+1] with which to exchange x[i] │
│ 393 │ │ │ │ j = randbelow(i + 1) │
│ ❱ 394 │ │ │ │ x[i], x[j] = x[j], x[i] │
│ 395 │ │ else: │
│ 396 │ │ │ _warn('The *random* parameter to shuffle() has been deprecated\n' │
│ 397 │ │ │ │ 'since Python 3.9 and will be removed in a subsequent ' │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: 'tuple' object does not support item assignment
I misinterpreted your problem, and I think I've just fixed it. You can confirm by installing 2.0.3.
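The TypeError at the bottom of the traceback is what random.shuffle raises when it is handed a tuple: it swaps items in place, so it needs a mutable sequence. Click passes variadic arguments (nargs=-1) as a tuple, which is a likely source of the tuple here, although the code that builds input_paths is not shown. A minimal sketch of the usual fix, which copies the paths into a list before shuffling. split_train_val is a hypothetical helper, and taking the validation files from the front of the shuffled list is an assumption.

```python
import random


def split_train_val(input_paths, ratio, seed=None):
    """Shuffle input paths and hold out `ratio` of them for validation.

    Hypothetical helper: `input_paths` may be a tuple, so it is copied into a
    list first, since random.shuffle() assigns items in place and raises
    TypeError on immutable sequences.
    """
    paths = list(input_paths)           # tuple -> mutable list
    random.Random(seed).shuffle(paths)  # seeded RNG for a reproducible split
    val_idx = int(len(paths) * ratio)   # e.g. int(70 * 0.1) == 7
    return paths[val_idx:], paths[:val_idx]  # (train, val)


if __name__ == "__main__":
    files = tuple(f"page_{i:03d}.xml" for i in range(70))
    train, val = split_train_val(files, 0.1, seed=42)
    print(len(train), len(val))  # 63 7
```

Using a private random.Random(seed) instead of the module-level shuffle keeps the split reproducible without reseeding the global generator.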
It works now, but with a small problem I also saw in the older versions: it finds only one class even though I have two. In my case I have textzone and textline, and textline is not detected. I don't think this is a YALTAi problem but an eScriptorium one, or maybe not even an eScriptorium problem but mine: I named my lines textline, and eScriptorium/kraken got confused because ALTO has an element named TextLine.
(yaltai-2.0.3-py3.11) incognito@DESKTOP-H1BS9PO:~/YALTAi$ yaltai convert alto-to-yolo teyman_alto/*.xml teyman_alto_test --shuffle .1
Using list of inputs.
Found 70 to convert.
HELLLOOO MOTHER FUCKER
8/70 image for validation.
Shuffling data with a ratio of 0.1 for validation.
70it [00:00, 213.93it/s]
70 ground truth XML files converted.
Configuration available at teyman_alto_test/config.yml.
Label Map available at teyman_alto_test/labelmap.txt.
Regions count:
- 00070 textzone
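The summary lists only textzone, which matches the earlier observation that only regions were extracted. One way to check where textline actually ends up in the export (on region TextBlock elements, or only on TextLine elements) is to count the labels in the ALTO files directly. A minimal standard-library sketch. It assumes eScriptorium-style ALTO in which types are declared as OtherTag elements and referenced via TAGREFS, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to ALTO tag names."""
    return tag.rsplit("}", 1)[-1]


def count_alto_labels(alto_xml):
    """Count the labels referenced by TextBlock and TextLine elements.

    alto_xml is the raw document as str or bytes. Assumes labels are declared
    as <OtherTag ID=... LABEL=...> and referenced through TAGREFS.
    """
    root = ET.fromstring(alto_xml)
    # Map tag IDs (e.g. "BT1") to their human-readable LABEL.
    labels = {
        el.get("ID"): el.get("LABEL")
        for el in root.iter()
        if local_name(el.tag) == "OtherTag"
    }
    counts = {"TextBlock": Counter(), "TextLine": Counter()}
    for el in root.iter():
        kind = local_name(el.tag)
        if kind in counts:
            refs = (el.get("TAGREFS") or "").split()
            # Unknown IDs are kept verbatim; elements without TAGREFS are flagged.
            counts[kind].update([labels.get(r, r) for r in refs] or ["(untagged)"])
    return counts
```

Run it over the exported files (for example, reading each of teyman_alto/*.xml as bytes with pathlib.Path.read_bytes) and compare the TextLine counter with the regions count printed by the converter.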
One question: is this really YOLOv8? It looks to me like YOLOv5.
Yes it is :)
You are right.
(yaltai-2.0.3-py3.11) incognito@DESKTOP-H1BS9PO:~/YALTAi$ yolo version
8.0.209
I'm trying to set up my dataset and then train, but I get this error: