-
While implementing chronam for the University of Nebraska-Lincoln's newspapers project, which involves several Czech language papers, we discovered that selecting a language on the advanced search pag…
-
http://yaroslavvb.blogspot.com/2009/08/new-robust-ocr-dataset.html
http://yaroslavvb.com/bib_digits_dataset.tar.gz
-
In this case, the female symbol is recognized as a large bold 2. Is there a way to fix this in a batch of articles?
![image](https://user-images.githubusercontent.com/4609956/178932782-a41d440d-e56…
-
The following feedback was reported by a BHL user:
"https://www.biodiversitylibrary.org/page/2334415 claims that Gastrochaena (Spengleri) Tryon 1862 is a species shown on the page. Gastrochaena is …
-
### New description
When OCRing both text and formulas (math, chemistry, ...), add a feature that automatically converts the LaTeX math delimiters `\[` and `\]` to the SMA tags `[$]` and `[/$]`.
See below for …
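
As a rough illustration of the requested conversion, here is a minimal Python sketch. The function name `latex_to_sma` is hypothetical and not part of any existing tool. It assumes the delimiters are not nested and that an unmatched `\[` should be left untouched.

```python
import re

# Matches one LaTeX display-math embed, \[ ... \], possibly spanning lines.
# The non-greedy body keeps neighbouring embeds separate.
DISPLAY_MATH = re.compile(r"\\\[(.*?)\\\]", re.DOTALL)


def latex_to_sma(text: str) -> str:
    # Hypothetical helper: wrap each matched body in [$] ... [/$].
    # A function is used as the replacement so that backslashes inside
    # the formula are copied verbatim rather than read as escapes.
    return DISPLAY_MATH.sub(lambda m: "[$]" + m.group(1) + "[/$]", text)


converted = latex_to_sma(r"Energy: \[E = mc^2\] and \[a^2 + b^2 = c^2\]")
print(converted)
```

Only paired delimiters are rewritten, so a stray `\[` without a closing `\]` passes through unchanged.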
-
I looked at the average frequency of the word "woman" or "women" in all papers by time. I used the dataset "BMJ_171119_wordsbypapers_edited.csv" for this. This is just word count data --- nothing to…
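
A minimal sketch of how such an average could be computed is shown here. The column layout is an assumption: one row per paper with a `year` column and one count column per word. The real layout of the CSV is not shown in this report, and the sample rows are made up purely for illustration.

```python
import csv
import io
from collections import defaultdict


def mean_count_by_year(rows, words=("woman", "women"), year_col="year"):
    # rows: dict-like records such as those yielded by csv.DictReader.
    # Column names are assumptions, not taken from the real dataset.
    totals = defaultdict(int)   # summed target-word counts per year
    papers = defaultdict(int)   # number of papers per year
    for row in rows:
        year = int(row[year_col])
        totals[year] += sum(int(row[w]) for w in words)
        papers[year] += 1
    return {year: totals[year] / papers[year] for year in sorted(totals)}


# Made-up sample in the assumed layout (not real data).
SAMPLE = """year,woman,women
1900,1,0
1900,0,3
1950,2,2
"""

result = mean_count_by_year(csv.DictReader(io.StringIO(SAMPLE)))
print(result)
```

This averages raw counts per paper. Since the data is word counts only, dividing by each paper's total word count would be needed to get a relative frequency.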
-
I've just watched GPT-4's intro stream and this seems like a pretty good use case for it. ChatGPT could already be used for auto-tagging based on the existing OCR results; with GPT-4 it might be possi…
-
I found that some ECS papers use GIF images for formulas and numbers.
For example: http://jes.ecsdl.org/content/157/3/J69.full
<span class="inline-formula" id="inline-formula-38">
-
Hi, I would like to ask: does the Markdown of the processed data leave out pictures and tables?
-
Hi Ger,
Just logging this for the sake of it, kinda, but it also means I can keep a log of version numbers and their issues for reference.
v80s: Loses Watch folder on restarts, even if not launched on…