I get an error when running split_htmls_to_pages on https://ar5iv.labs.arxiv.org/html/1110.5321.
The error message is: ERROR:root:missing reference detected
The error is caused by the references "br" and "LABEL:eq1". In latexlml_parser.py, line 175, "br" and "LABEL:eq1" are neither numeric nor have an href, so resolved is False. I think it is common for a reference not to be a number. Could you find a way to handle this, please?
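To make the request concrete, here is a minimal sketch of the check as I understand it, next to one possible relaxation. The function names and signatures are hypothetical and are not taken from the parser; only the "numeric or has href" rule comes from what I observed.

```python
from typing import Optional


def is_resolved_strict(text: str, href: Optional[str]) -> bool:
    # Assumed current behaviour: a reference counts as resolved only if
    # its text is numeric or it carries an href.
    return text.strip().isnumeric() or bool(href)


def is_resolved_relaxed(text: str, href: Optional[str]) -> bool:
    # Hypothetical relaxation: any non-empty reference text is accepted,
    # so non-numeric labels such as "br" no longer fail the check.
    label = text.strip()
    return bool(label) or bool(href)


if __name__ == "__main__":
    for ref in ("12", "br", "LABEL:eq1"):
        print(ref, is_resolved_strict(ref, None), is_resolved_relaxed(ref, None))
```

With the strict version, "br" and "LABEL:eq1" are both rejected when no href is present, which matches the failure I see. The relaxed version accepts them, but it is only one option and may be too permissive for the dataset.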
By the way, the overall conversion rate is around 17% (about 500,000 PDF/HTML pairs). Is this normal?