LinXueyuanStdio / LaTeX_OCR_PRO

:art: Math Formula OCR Pro: recognizes handwritten and printed formulas, including formulas mixed with Chinese text, and supports simple symbolic reasoning (data structure based on the LaTeX abstract syntax tree).
GNU General Public License v3.0
1.11k stars 235 forks

Why perplexity is negative? #36

Closed JackyLiu-SCUT closed 3 years ago

JackyLiu-SCUT commented 3 years ago

Why is the perplexity negative? By definition it is 2 to the power of H(x), which is always positive. I don't think there is a problem with either the training result or the code.

I look forward to learning from and discussing this with everyone interested in this project! ^_^ I am a college student who has just entered the CV field and chose this topic for my graduation project.

image

LinXueyuanStdio commented 3 years ago

Wow, you got a high exact match score! Good job! In this project, perp = - np.exp(ce_words / float(n_words))
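
To make the sign convention concrete, here is a minimal sketch of the formula quoted above next to the textbook definition. The function names and the sample numbers are illustrative assumptions, not code from this repository, and `math.exp` stands in for `np.exp` (the two agree for scalars). The thread does not say why the value is negated; one plausible reading, which is only an assumption, is that negation turns perplexity into a "higher is better" score like exact match.

```python
import math

def conventional_perplexity(ce_words, n_words):
    """Textbook perplexity: exp of the mean per-word cross-entropy (in nats)."""
    return math.exp(ce_words / float(n_words))

def project_perplexity(ce_words, n_words):
    """Formula quoted in this thread: the same value with the sign flipped."""
    return -math.exp(ce_words / float(n_words))

# Hypothetical totals: summed cross-entropy over all words, and the word count.
ce_words = 120.0
n_words = 100

conv = conventional_perplexity(ce_words, n_words)
proj = project_perplexity(ce_words, n_words)

# 2 ** H(x) with H in bits is the same number as exp(H) with H in nats.
h_nats = ce_words / n_words
h_bits = h_nats / math.log(2)
base2 = 2.0 ** h_bits

print(conv, proj, base2)
```

The magnitude is an ordinary positive perplexity; only the sign differs, so a reported value closer to zero corresponds to a lower (better) perplexity.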

JackyLiu-SCUT commented 3 years ago

> Wow, you got a high exact match score! Good job! In this project, perp = - np.exp(ce_words / float(n_words))

Thank you for replying! I am a rookie in programming and your reply helps me understand a lot!

Another question I want to ask before closing this issue:

In data.json there is an attribute named max_iter, and its default value is 5000. Does that mean that even if I have 10k samples in the training set, only 5k of them are used in training?

LinXueyuanStdio commented 3 years ago

YES.
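
A cap of this kind can be pictured with the sketch below. It is only an illustration of the behaviour confirmed above: `iterate_samples` is a hypothetical helper, not the project's actual data loader, and the real implementation may differ in how it selects or orders samples.

```python
def iterate_samples(samples, max_iter=None):
    """Yield at most max_iter samples; max_iter=None means no cap (assumed semantics)."""
    for n_iter, sample in enumerate(samples):
        if max_iter is not None and n_iter >= max_iter:
            break  # stop once the cap is reached, ignoring the remaining samples
        yield sample

# Stand-in for a training set with 10k samples.
training_set = range(10_000)

used = list(iterate_samples(training_set, max_iter=5000))
uncapped = list(iterate_samples(range(10), max_iter=None))

print(len(used), len(uncapped))
```

Under this reading, using all 10k samples would require raising max_iter to at least the size of the training set.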

JackyLiu-SCUT commented 3 years ago

Thank you for replying! Closing the issue now.