RaiMan / SikuliX1

SikuliX version 2.0.0+ (2019+)
https://sikulix.github.io
MIT License

Tess4J version 3.5.3 --- OCR stopped working #195

Closed. Skwol closed this issue 5 years ago

Skwol commented 5 years ago

I had been using 1.1.4-SNAPSHOT for several months, and OCR worked smoothly both in the IDE and in a Java project. On the 15th of September I updated the Java project using Maven:

    <repositories>
        <repository>
            <!--OSSRH: com.sikulix-->
            <id>com.sikulix</id>
            <name>com.sikulix</name>
            <url>https://oss.sonatype.org/content/groups/public</url>
            <layout>default</layout>
            <snapshots>
                <enabled>true</enabled>
                <updatePolicy>always</updatePolicy>
            </snapshots>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>com.sikulix</groupId>
            <artifactId>sikulixapi</artifactId>
            <version>1.1.4-SNAPSHOT</version>
        </dependency>
    </dependencies>

The new version seems to be 1.1.4-SNAPSHOT 20190913.083811. Since then, any OCR usage returns an empty string. A simple example I'm trying:

package pac.name;

import org.sikuli.basics.Settings;
import org.sikuli.script.Region;

public class Main {

    public static void main(String[] args) {
        Settings.OcrTextRead = true;
        Settings.OcrTextSearch = true;
        Region r = new Region(100,100,50,21);
        System.out.println("r.text() " + r.text());
    }
}

I've also downloaded a fresh IDE version from the website (build#: 380 2019-09-13_08:35) and tried:

Settings.OcrTextRead = True
Settings.OcrTextSearch = True
r = Region(100,100,50,21)
print(r.text())

Every time the result is just an empty string. Am I missing something? Or has something changed, so that I need to find some other way to use it? Build #: 299 (2019-08-08_14:05) seems to work fine with the same code.

Skwol commented 5 years ago

As a side question: is there a way I can use a previous version without messing with the m2 directory? In other words, is there a way to specify a particular build in Maven?

balmma commented 5 years ago

I can confirm that OCR is not working at all in the current snapshot on Windows (not tested on other OSes yet). I will have a look at it.

balmma commented 5 years ago

The bug was introduced with commit a25c467d8d68a7525815cdffcfa929ce2e296b20, where Tess4J was updated from 3.5.2 to 3.5.3. The changelog (http://tess4j.sourceforge.net/changelog.html) says that they fixed a compatibility issue with JDK 9's ByteBuffer.flip() method. Might that cause another bug in Java 12?

Setting it back to 3.5.2 seems to fix the issue.

BUT: Upgrading the version to 4.4.0 also seems to work flawlessly. This issue is probably a good opportunity to upgrade to the latest Tess4J version :-)
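
For readers unfamiliar with the ByteBuffer.flip() compatibility issue mentioned in the changelog: since JDK 9, ByteBuffer overrides flip() with a ByteBuffer return type, so code compiled on JDK 9+ without targeting the Java 8 API can fail with NoSuchMethodError on Java 8. The sketch below shows the common workaround of calling flip() through the Buffer supertype. It only illustrates the general JDK issue and is not Tess4J's actual code; the class and helper names are hypothetical.

```java
import java.nio.Buffer;
import java.nio.ByteBuffer;

public class FlipSketch {

    // Hypothetical helper: flip via the Buffer supertype so the compiled call
    // site references Buffer.flip(), which exists on Java 8 and on later JDKs.
    static ByteBuffer flipCompat(ByteBuffer buf) {
        ((Buffer) buf).flip();
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.put((byte) 1).put((byte) 2);   // position is now 2
        flipCompat(buf);                   // limit = 2, position = 0
        System.out.println(buf.position() + " " + buf.limit()); // prints "0 2"
    }
}
```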

Skwol commented 5 years ago

Thanks for the quick response. I totally forgot to mention that I'm using OS X, though it feels like it doesn't matter in this case.

RaiMan commented 5 years ago

@Skwol Thanks @balmma for finding the possible regression point. I am testing on Mac, where Region.text() also returns nothing with the latest build (Java 8 and Java 12). I will try Tess4J 3.5.2 and, if that works, go back to it as a first step. Then I will try Tess4J 4 (the challenge here is the native libs for Mac and Linux, which might also have to be revised).

balmma commented 5 years ago

@RaiMan I did some digging in the Tess4J source code and found the offending change (https://github.com/nguyenq/tess4j/commit/ba1d5fd3d62a44d6c4949f47c1f991c9f7143aa7). The new if statement does not really make sense IMHO. For color images we get a DataBufferInt; for grey ones we get a DataBufferByte. For DataBufferInt we have to get the pixel size; for DataBufferByte we can set it to 8.

The best option on our side would be to convert the image to greyscale before passing it to Tess4J. I will create a PR for this shortly.

And I have to figure this out with the Tess4J guys.
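
The greyscale conversion described above could look roughly like the following. This is only a minimal sketch using standard java.awt.image classes, not the code of the proposed PR; the class and helper names are hypothetical. It redraws the image into a TYPE_BYTE_GRAY image, whose raster is backed by a DataBufferByte with 8 bits per pixel, whereas a TYPE_INT_RGB image is backed by a DataBufferInt.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;
import java.awt.image.DataBufferByte;
import java.awt.image.DataBufferInt;

public class GreyscaleSketch {

    // Hypothetical helper: redraw any image into an 8-bit greyscale image.
    static BufferedImage toGreyscale(BufferedImage src) {
        BufferedImage grey = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = grey.createGraphics();
        try {
            g.drawImage(src, 0, 0, null); // color conversion happens while drawing
        } finally {
            g.dispose();
        }
        return grey;
    }

    public static void main(String[] args) {
        // Stand-in for a color screenshot; same size as the Region in the report.
        BufferedImage color = new BufferedImage(50, 21, BufferedImage.TYPE_INT_RGB);
        BufferedImage grey = toGreyscale(color);

        DataBuffer colorBuf = color.getRaster().getDataBuffer();
        DataBuffer greyBuf = grey.getRaster().getDataBuffer();

        System.out.println(colorBuf instanceof DataBufferInt);     // true
        System.out.println(greyBuf instanceof DataBufferByte);     // true
        System.out.println(grey.getColorModel().getPixelSize());   // 8
    }
}
```

With an image prepared this way, the OCR library would always receive the byte-backed, 8-bit case, regardless of how it handles DataBufferInt input.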

RaiMan commented 5 years ago

Tested with Tess4J 3.5.2: it works. A new build and a new snapshot with the fix (going back to 3.5.2) are now available.

balmma commented 5 years ago

Is the PR still relevant?

RaiMan commented 5 years ago

IMO it is not needed at the moment. I will now try Tess4J 4.

balmma commented 5 years ago

Even better :-)