githubharald / SimpleHTR

Handwritten Text Recognition (HTR) system implemented with TensorFlow.
https://towardsdatascience.com/2326a3487cd5
MIT License
1.99k stars, 894 forks

Where can I find the tagset.txt file #155

Closed: kumarsyamala closed this issue 1 year ago

kumarsyamala commented 1 year ago

Hi,

I am exploring the IAM database: https://fki.tic.heia-fr.ch/databases/iam-handwriting-database

On the GitHub page you mention using words.tgz and words.txt from the ascii folder.

The beginning of words.txt explains the annotation format used in the file:

#--- words.txt ---------------------------------------------------------------
#
# iam database word information
#
# format: a01-000u-00-00 ok 154 1 408 768 27 51 AT A
#
#     a01-000u-00-00  -> word id for line 00 in form a01-000u
#     ok              -> result of word segmentation
#                            ok: word was correctly
#                            er: segmentation of word can be bad
#
#     154             -> graylevel to binarize the line containing this word
#     1               -> number of components for this word
#     408 768 27 51   -> bounding box around this word in x,y,w,h format
#     AT              -> the grammatical tag for this word, see the
#                        file tagset.txt for an explanation
#     A               -> the transcription for this word
#
a01-000u-00-00 ok 154 408 768 27 51 AT A
a01-000u-00-01 ok 154 507 766 213 48 NN MOVE
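For readers following along, the layout described in that header can be sketched as a small Python parser. This is only an illustration, not code from SimpleHTR or the IAM tooling: the names `WordEntry` and `parse_words_line` are hypothetical. It assumes the nine-field layout of the two sample data lines quoted above, which do not contain the "number of components" value that the format line lists.

```python
from typing import NamedTuple, Optional, Tuple


class WordEntry(NamedTuple):
    """One annotated word (hypothetical container, not part of any library)."""
    word_id: str
    segmentation: str                  # "ok" or "er"
    graylevel: int
    bbox: Tuple[int, int, int, int]    # (x, y, w, h)
    tag: str                           # grammatical tag, e.g. "AT" or "NN"
    transcription: str


def parse_words_line(line: str) -> Optional[WordEntry]:
    """Parse one line of words.txt; return None for comment or blank lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split(" ")
    # Assumption: nine fields, as in the sample data lines quoted in this issue.
    if len(fields) < 9:
        raise ValueError(f"expected at least 9 fields, got {len(fields)}: {line!r}")
    x, y, w, h = (int(v) for v in fields[3:7])
    return WordEntry(
        word_id=fields[0],
        segmentation=fields[1],
        graylevel=int(fields[2]),
        bbox=(x, y, w, h),
        tag=fields[7],
        # Join the remainder in case a transcription ever contains spaces.
        transcription=" ".join(fields[8:]),
    )


sample = [
    "# iam database word information",
    "a01-000u-00-00 ok 154 408 768 27 51 AT A",
    "a01-000u-00-01 ok 154 507 766 213 48 NN MOVE",
]
entries = [e for e in map(parse_words_line, sample) if e is not None]
for e in entries:
    print(e.word_id, e.tag, e.transcription)
```

The tag is carried through as an opaque string; interpreting what `AT` or `NN` stand for would still require tagset.txt.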

Here, AT and NN are the grammatical tags for the words. The header says to look in tagset.txt for an explanation, but I couldn't find that file.

Where can I find tagset.txt?

githubharald commented 1 year ago

I don't know, I never used this file. But try to contact the authors of the IAM dataset directly, they should know.

kumarsyamala commented 1 year ago

I filled in the form and raised the question on the website, but I haven't received a reply yet. Do you know of any other way to reach them?

githubharald commented 1 year ago

No, sorry.