I've done a bit of investigation into the question and found the following:
So the following questions arise:
So far, I've made a branch in my own fork, so please take a look at it: https://github.com/ws233/Tesseract-OCR-iOS/tree/loadDawgCache
When I was implementing Tesseract, I noticed that on the iPhone 4 it took roughly 19 seconds to initialize, but we removed this feature on the iPhone 4 because the photos didn't have the quality needed to recognize the text we needed.
@kevincon, so what's your opinion on this method? Do we need it in our lib? If so, I'll add a few unit tests and a description. Otherwise, I think this can be closed.
@ws233 I'm curious what the speedup is on the iPhone 4, because it sounds like @fnxpt is experiencing quite a long Tesseract initialization time (19 seconds), but the numbers you reported are for the iPhone 5S, which I don't think suffers from the same long initialization time as the iPhone 4.
Can you provide instructions and/or sample code for @fnxpt to try out your dawg cache loading feature? If it significantly decreases his/her initialization time on the iPhone 4 then I think we should add it to the library, but otherwise I'm not convinced it's worth adding to the library.
The instructions are quite easy. You just log the time before and after calling Tesseract's 'recognize' function, twice, since you need to compare the first 'recognize' run with any subsequent one. Then you run the function from the patch above before the first 'recognize' call and check that both calls take about the same time. You may also log the run time of the function from the patch itself.
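A minimal sketch of that measurement, assuming the library's standard G8Tesseract API (initWithLanguage:, image, recognize); the asset name is a placeholder, and the exact call site for the fork's loadDawgCacheFromTessdataPath:forLanguages: is only indicated by a comment, since its signature isn't shown in this thread:

```objc
#import <UIKit/UIKit.h>
#import <TesseractOCR/TesseractOCR.h>

// Times the first and second calls to -recognize on the same instance.
// The first call is expected to include the one-time initialization cost
// (loading traineddata, dawgs, etc.); the second one should not.
static void MeasureRecognizeTimes(void)
{
    G8Tesseract *tesseract = [[G8Tesseract alloc] initWithLanguage:@"eng"];
    tesseract.image = [UIImage imageNamed:@"test_text_photo"]; // placeholder asset

    // With the fork above, the dawg-cache loading call from the patch would be
    // made here, before the first -recognize, and could be timed the same way.
    // (Initializing with more languages, e.g. @"eng+deu", should also increase
    // the first-run cost, which is relevant to the questions below.)

    CFAbsoluteTime start = CFAbsoluteTimeGetCurrent();
    [tesseract recognize];
    NSLog(@"1st recognize: %.0f ms", (CFAbsoluteTimeGetCurrent() - start) * 1000.0);

    start = CFAbsoluteTimeGetCurrent();
    [tesseract recognize];
    NSLog(@"2nd recognize: %.0f ms", (CFAbsoluteTimeGetCurrent() - start) * 1000.0);
}
```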
But I believe the device itself has very little impact on the run time. I suspect the number of languages used has a bigger impact, or even the number of fonts in those traineddata files.
@fnxpt, can you log that function from the patch and the 'recognize' function for us on your iPhone 4? Also, could you provide more info about the number of languages you used for initializing Tesseract, and perhaps about the number of fonts you trained Tesseract for? Thanks in advance!
Hi guys,
Tomorrow I will run these tests (I don't have my iPhone 4 with me at the moment).
Thanks
Hi guys, I performed a few tests with an iPhone 4 and an iPhone 6, with and without the patch. All the tests were made with a black screen in order to have the same test environment.

iPhone 4 without the patch:
1st run: 423 ms
2nd run: 378 ms
3rd run: 332 ms
4th run: 340 ms
5th run: 335 ms

iPhone 4 with the patch but without loadDawgCacheFromTessdataPath:forLanguages::
1st run: 338 ms
2nd run: 334 ms
3rd run: 290 ms
4th run: 321 ms
5th run: 295 ms

iPhone 4 with the patch:
1st run: 453 ms
2nd run: 237 ms
3rd run: 289 ms
4th run: 289 ms
5th run: 269 ms

iPhone 6 without the patch:
1st run: 25 ms
2nd run: 19 ms
3rd run: 17 ms
4th run: 18 ms
5th run: 18 ms

iPhone 6 with the patch but without loadDawgCacheFromTessdataPath:forLanguages::
1st run: 23 ms
2nd run: 23 ms
3rd run: 17 ms
4th run: 18 ms
5th run: 18 ms

iPhone 6 with the patch:
1st run: 23 ms
2nd run: 21 ms
3rd run: 18 ms
4th run: 19 ms
5th run: 18 ms
> When I was implementing Tesseract, I noticed that on the iPhone 4 it took roughly 19 seconds to initialize, but we removed this feature on the iPhone 4 because the photos didn't have the quality needed to recognize the text we needed.
@fnxpt I don't understand: you mentioned a 19-second initialization time, but all of the numbers you reported above are less than 1 second. Were you testing initialization time? I thought initialization time was what we were interested in, since you were experiencing such a long (19-second) initialization time on the iPhone 4.
I performed those tests with the camera pointing at a table, so the image is always black. If the image has data in it, it seems to take more time to load and recognize.
> If the image has data in it, it seems to take more time to load and recognize.
So shouldn't that be what you test? The purpose of your test is to see if @ws233's patch significantly alleviates the 19 second initialization time you reported on the iPhone 4, which I believe is still the point of this GitHub issue, right? If the patch does significantly alleviate the 19 second initialization time, then that will be a good justification for merging the patch into the library; otherwise we probably won't.
Since we never noticed a long initialization time for the iPhone 6, I don't think it's worth your time to perform this test for the iPhone 6. I think you should focus your test on the iPhone 4.
Also, you can make your test consistent if you take a picture containing a lot of textual data with your phone, upload the picture to your computer, and embed it directly in your Xcode project as a resource, so you don't have to take a picture at all for each test.
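A sketch of that setup as a unit test, using XCTest's measureBlock: to average several runs over the same bundled image; the class name and asset name are hypothetical:

```objc
#import <XCTest/XCTest.h>
#import <UIKit/UIKit.h>
#import <TesseractOCR/TesseractOCR.h>

// Hypothetical test case: "text_sample" is an image with plenty of text,
// added to the test bundle so every run sees exactly the same input.
@interface RecognitionTimingTests : XCTestCase
@end

@implementation RecognitionTimingTests

- (void)testFirstRecognitionTime
{
    UIImage *image = [UIImage imageNamed:@"text_sample"
                                inBundle:[NSBundle bundleForClass:[self class]]
           compatibleWithTraitCollection:nil];

    // A fresh G8Tesseract per iteration, so each measured run includes
    // the initialization cost we care about.
    [self measureBlock:^{
        G8Tesseract *tesseract = [[G8Tesseract alloc] initWithLanguage:@"eng"];
        tesseract.image = image;
        [tesseract recognize];
    }];
}

@end
```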
I ran a few more tests, pointing the camera at another scene with more data. I was not able to reproduce the 19-second initialization.
iPhone 4 without the patch:
1st run: 3.516 s
2nd run: 3.304 s
3rd run: 2.968 s

iPhone 4 with the patch but without loadDawgCacheFromTessdataPath:forLanguages::
1st run: 4.38 s
2nd run: 3.806 s
3rd run: 3.842 s

iPhone 4 with the patch:
1st run: 3.768 s
2nd run: 3.398 s
3rd run: 3.18 s
I don't believe initialization depends on the image that Tesseract is recognizing. Initialization means loading and preparing all the necessary data: language files, vocabularies and so on. That's actually why the first run takes longer than the others. So I believe the initialization time may become too long if we try to initialize Tesseract with a huge number of different files. So the following questions are still open:
@fnxpt,
@kevincon, let's close this, since there are no updates.
Hi,
I noticed that Tesseract only initializes after the first recognition, and this can take a while on some older devices. Is it possible to force Tesseract to initialize right after allocating it? I want to create Tesseract when my app starts and reuse it in some view controllers.
Best Regards, fnxpt
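A possible workaround, sketched under the assumption (from this thread) that the expensive work happens lazily on the first recognize call: keep one shared G8Tesseract instance and warm it up at launch with a tiny blank image. The AppDelegate property, queue choice, and image size are illustrative, not part of the library's API.

```objc
#import <UIKit/UIKit.h>
#import <TesseractOCR/TesseractOCR.h>

@interface AppDelegate : UIResponder <UIApplicationDelegate>
@property (nonatomic, strong) UIWindow *window;
@property (nonatomic, strong) G8Tesseract *tesseract; // shared, reused by view controllers
@end

@implementation AppDelegate

- (BOOL)application:(UIApplication *)application
    didFinishLaunchingWithOptions:(NSDictionary *)launchOptions
{
    self.tesseract = [[G8Tesseract alloc] initWithLanguage:@"eng"];

    // Warm-up: run one recognition on a tiny white image off the main thread,
    // so the first real recognition doesn't pay the lazy initialization cost.
    dispatch_async(dispatch_get_global_queue(QOS_CLASS_UTILITY, 0), ^{
        UIGraphicsBeginImageContextWithOptions(CGSizeMake(16, 16), YES, 1.0);
        [[UIColor whiteColor] setFill];
        UIRectFill(CGRectMake(0, 0, 16, 16));
        UIImage *blank = UIGraphicsGetImageFromCurrentImageContext();
        UIGraphicsEndImageContext();

        self.tesseract.image = blank;
        [self.tesseract recognize];
    });

    return YES;
}

@end
```

Whether G8Tesseract can safely be driven from a background queue this way isn't established in this thread, so treat the dispatch part as an assumption as well.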