ContentMine / phylotree

A repository for ami-phylotree development

Image tests for EGID lookup. Incorrect OCR turns 3 into "8" in EGID #33

Open rossmounce opened 9 years ago

rossmounce commented 9 years ago

Image ID: 010504-0-000 (image below). One of these EGIDs is wrong.

In the image, the correct text is Ulvibacter litoralis KMM 3912T (AY243096), but Tesseract reads the EGID as AY248096, which turns out to be another HIV-1 isolate when you look it up. That is a valid EGID, but not the one that matches this tip.

A good test case for cross-matching our OCR-Binomial with EGID-looked-up-Binomial.
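A minimal sketch of that cross-match, assuming the organism name returned by the EGID lookup is already available as a plain string. The function names are hypothetical and not part of ami-phylotree, and the looked-up strings in the usage lines are stand-ins rather than real lookup output.

```python
def normalize(name):
    """Lowercase a taxon label and split it into tokens, dropping dots."""
    return name.lower().replace(".", " ").split()


def binomial_matches(ocr_name, looked_up_name):
    """Return True if genus and species read by OCR agree with the looked-up name.

    An abbreviated genus such as "V." is accepted when it matches the
    first letter of the looked-up genus (an assumption of this sketch).
    """
    ocr = normalize(ocr_name)
    ref = normalize(looked_up_name)
    if len(ocr) < 2 or len(ref) < 2:
        return False
    abbreviated = len(ocr[0]) == 1 and ref[0].startswith(ocr[0])
    genus_ok = ocr[0] == ref[0] or abbreviated
    return genus_ok and ocr[1] == ref[1]


# Stand-in lookup results, for illustration only.
print(binomial_matches("Ulvibacter litoralis KMM 3912T", "Ulvibacter litoralis"))  # True
print(binomial_matches("Ulvibacter litoralis KMM 3912T", "HIV-1 isolate"))         # False
```

A mismatch like the second call would flag the tip for review instead of silently accepting a valid but wrong EGID.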

rossmounce commented 9 years ago

Same kind of problem with this image: 3 -> "8" (twice).

Image ID: 65096-0-002 (image below). One of these EGIDs is wrong.

In the image, the correct text is V. caviae DSM 20738T (AY353714), but Tesseract reads the EGID as AY858714, which is an "Equine infectious anemia virus isolate" when you look it up. That is a valid EGID, but not the one that matches this tip.

Another good test case for cross-matching our OCR-Binomial with EGID-looked-up-Binomial.
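When the cross-match fails, one possible follow-up is to enumerate alternative EGIDs by undoing the suspected misread and look each of them up in turn. The sketch below is an illustration only: `CONFUSABLE` and `candidate_accessions` are hypothetical names, and the table covers just the 3 -> "8" confusion reported in this issue.

```python
from itertools import product

# Assumption: only the misread seen in this issue (a printed 3 read as "8").
CONFUSABLE = {"8": "3"}


def candidate_accessions(accession):
    """Return the OCR reading plus every variant with some "8" turned back into "3".

    The original reading always comes first in the returned list.
    """
    options = [
        (ch, CONFUSABLE[ch]) if ch in CONFUSABLE else (ch,)
        for ch in accession
    ]
    return ["".join(combo) for combo in product(*options)]


print(candidate_accessions("AY248096"))  # ['AY248096', 'AY243096']
print(candidate_accessions("AY858714"))  # ['AY858714', 'AY853714', 'AY358714', 'AY353714']
```

Each candidate would still need to be looked up and compared against the OCR binomial; the list only narrows down which EGIDs are worth checking.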