Open kba opened 3 years ago
> with regular numbers or super/subscript numbers?
No, regular numbers are what Unicode suggests for this. Unicode renderers produce the typical small-script appearance merely because of the pattern numeral, fraction slash, numeral, i.e. both the numerator and the denominator are ordinary (ASCII) numerals. You can try it out with an editor/browser of your choice, e.g. ¾⅔ (precomposed) vs 3⁄4 2⁄3 (independent, but rendered equally by good fonts/engines; GH obviously is not one of them).
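One way to see the "numeral, fraction slash, numeral" pattern without relying on font rendering is to look at the Unicode compatibility decompositions of the precomposed fractions. The following is only a minimal sketch using Python's standard `unicodedata` module; the helper name `decompose_fractions` is made up for illustration.

```python
import unicodedata

FRACTION_SLASH = "\u2044"  # U+2044 FRACTION SLASH, distinct from "/" (U+002F)

def decompose_fractions(text):
    """Hypothetical helper: expand precomposed vulgar fractions only."""
    out = []
    for ch in text:
        # Precomposed fractions carry a "<fraction>" compatibility tag,
        # e.g. "<fraction> 0033 2044 0034" for U+00BE.
        if unicodedata.decomposition(ch).startswith("<fraction>"):
            out.append(unicodedata.normalize("NFKD", ch))
        else:
            out.append(ch)
    return "".join(out)

print([hex(ord(c)) for c in decompose_fractions("¾")])
```

Working character by character avoids normalizing the whole string, which would also rewrite unrelated compatibility characters such as ligatures or superscripts.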
> or even produce LaTeX syntax
I'd recommend against that. LSTM-CTC will learn to give you character sequences, but getting a certain syntax consistently is pure luck.
Note: the actual argument for differentiating the fraction slash from the ordinary slash goes as follows: on the visual side, a fraction will always be discernible from other numeric expressions involving a slash (like dates or identifiers/codes), because it looks super/subscripted, so the OCR can learn that. That holds regardless of whether super/subscript numbers should be represented as such or as ordinary numbers.
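On the text side, the same distinction means that fractions can be picked out of OCR output without confusing them with dates or codes. A rough sketch, assuming the output uses ASCII digits around U+2044 as described above (the function name `find_fractions` and the sample line are hypothetical):

```python
import re

# numerator, U+2044 FRACTION SLASH, denominator (ASCII digits only)
FRACTION_RE = re.compile(r"[0-9]+\u2044[0-9]+")

def find_fractions(line):
    """Return all 'numeral fraction-slash numeral' sequences in a line."""
    return FRACTION_RE.findall(line)

# The date and the code use the ordinary slash and are not matched.
sample = "am 3/4/1850 1\u20442 Pfund, Nr. 12/7"
print(find_fractions(sample))
```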
While Unicode does have codepoints for the most common fractions (¼, ½, ¾ etc.), this does not scale because of course not all possible numerator/denominator combinations are available. So it might be best to encode fractions as just "numerator fraction-slash denominator" (with regular numbers or super/subscript numbers?) or even produce LaTeX syntax.
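To illustrate why the "numerator fraction-slash denominator" encoding scales to arbitrary fractions, here is a small sketch. The helper names `encode_fraction` and `decode_fraction` are invented for this example, and it assumes regular ASCII digits on both sides of U+2044.

```python
from fractions import Fraction

FRACTION_SLASH = "\u2044"  # U+2044 FRACTION SLASH

def encode_fraction(numerator, denominator):
    """Build 'numerator fraction-slash denominator' from two integers."""
    return f"{numerator}{FRACTION_SLASH}{denominator}"

def decode_fraction(text):
    """Parse such a sequence back into an exact Fraction."""
    num, den = text.split(FRACTION_SLASH)
    return Fraction(int(num), int(den))

# 7/16 has no precomposed codepoint, but the sequence encoding still works.
encoded = encode_fraction(7, 16)
print(encoded, decode_fraction(encoded))
```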