
githubharald / SimpleHTR

Handwritten Text Recognition (HTR) system implemented with TensorFlow.

https://towardsdatascience.com/2326a3487cd5

MIT License

1.99k stars, 893 forks

LM training ? #138

Closed: jjsr closed this issue 2 years ago

jjsr commented 2 years ago

Dear Sir, greetings. The research paper mentions two types of LM training text: Tr+L and Te. Is this referring to the corpus from which we build the prefix tree, or to something else? Thanks in advance.

jjsr commented 2 years ago

Are we using the whole IAM image data or just the test-set images when testing the WBS algorithm?

githubharald commented 2 years ago

is this referring to the corpus

Yes, it's about which corpus is used to "train" the LM (training in this case simply means computing the word statistics). See the paper: "The LM is either trained with the text of the test-set (denoted as Te) or the text of the training-set concatenated with a word list (denoted as Tr+L) which consists of 370,099 words."
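To make "computing the word statistics" concrete, here is a minimal sketch of what such LM "training" could look like: counting unigrams and bigrams over a corpus. This is not the code from the paper or from this repository. The names `train_lm` and `bigram_prob`, the simple regex tokenisation, and the toy strings standing in for the training-set text and the word list are all assumptions made for illustration.

```python
import re
from collections import Counter


def train_lm(corpus: str):
    """'Train' a word-level LM by counting unigrams and bigrams in the corpus."""
    # Assumption: lower-case the text and treat every run of word characters as a word.
    words = re.findall(r"\w+", corpus.lower())
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    return unigrams, bigrams


def bigram_prob(unigrams, bigrams, w1, w2):
    """Relative-frequency estimate of P(w2 | w1); 0.0 if w1 never occurs."""
    if unigrams[w1] == 0:
        return 0.0
    return bigrams[(w1, w2)] / unigrams[w1]


# Hypothetical stand-ins for the real data (not taken from IAM or the paper's word list).
train_text = "the cat sat on the mat"   # plays the role of the training-set text (Tr)
word_list = "cat dog mat hat"           # plays the role of the word list (L)

# Tr+L: training-set text concatenated with the word list.
# Note: in this naive sketch, neighbouring word-list entries also count as bigrams.
tr_plus_l = train_text + " " + word_list
unigrams, bigrams = train_lm(tr_plus_l)

print(sorted(unigrams))                               # vocabulary, e.g. for a prefix tree
print(bigram_prob(unigrams, bigrams, "the", "cat"))   # estimate of P("cat" | "the")
```

For the Te setting, the same function would simply be called on the text of the test set instead. The keys of `unigrams` form the vocabulary, which is the set of words a prefix tree would be built from.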