Open FreeGoldRush opened 4 years ago
ubuntu@tesseract-ocr:~/TEST$ wget https://github.com/Shreeshrii/tessdata_shreetest/raw/master/digits.traineddata
--2019-12-20 03:30:50-- https://github.com/Shreeshrii/tessdata_shreetest/raw/master/digits.traineddata
Resolving github.com (github.com)... 192.30.255.113
Connecting to github.com (github.com)|192.30.255.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/Shreeshrii/tessdata_shreetest/master/digits.traineddata [following]
--2019-12-20 03:30:50-- https://raw.githubusercontent.com/Shreeshrii/tessdata_shreetest/master/digits.traineddata
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.52.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.52.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11293175 (11M) [application/octet-stream]
Saving to: ‘digits.traineddata’
digits.traineddata 100%[==========================================================================================================>] 10.77M 37.1MB/s in 0.3s
2019-12-20 03:30:51 (37.1 MB/s) - ‘digits.traineddata’ saved [11293175/11293175]
ubuntu@tesseract-ocr:~/TEST$ ls -l digits.traineddata
-rw-rw-r-- 1 ubuntu ubuntu 11293175 Dec 20 03:30 digits.traineddata
ubuntu@tesseract-ocr:~/TEST$ tesseract digits.png - -l digits --tessdata-dir ./
33109
94027
33480
94301
10577
19035
90067
02493
tesseract -v
tesseract 5.0.0-alpha-556-g0823
leptonica-1.78.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.9 : zlib 1.2.11.1-motley : libwebp 0.4.4 : libopenjp2 2.3.0
Found OpenMP 201511
Your file size seems incorrect.
-rw-r--r-- 1 root root 65605 Dec 19 23:18 digits.traineddata
I have
-rw-rw-r-- 1 ubuntu ubuntu 11293175 Dec 20 03:30 digits.traineddata
You need to download the 'raw' file from GitHub. Use the download link.
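The size mismatch above (65605 bytes versus 11293175) would be consistent with a saved GitHub web page rather than the binary model, though the thread does not confirm what the small file contained. A minimal Python sketch for checking this, assuming such a page would start with an HTML doctype or `<html` tag; the helper name `looks_like_html` is hypothetical:

```python
def looks_like_html(path, probe_bytes=512):
    """Return True if the first bytes of the file look like an HTML page.

    A real .traineddata file is binary, so an HTML doctype at the start
    suggests the repository web page was saved instead of the raw file.
    """
    with open(path, "rb") as handle:
        head = handle.read(probe_bytes)
    # Ignore leading whitespace and letter case before comparing.
    head = head.lstrip().lower()
    return head.startswith((b"<!doctype html", b"<html"))
```

For example, `looks_like_html("/usr/local/share/tessdata/digits.traineddata")` returning True would point to a bad download rather than a Tesseract problem.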
Now it works, but I'm getting about the same results that I did with '-l eng'. Hmmm... Need to do some more testing.
You can try a whitelist with eng.traineddata.
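As a rough sketch of the whitelist approach, the Python below builds a `tesseract` command that restricts output to digits through the `tessedit_char_whitelist` config variable. The function names, the default `--psm 6`, and the file name `digits.png` are assumptions for illustration; depending on the Tesseract version and OCR engine in use, the whitelist may not be honored.

```python
import subprocess


def whitelist_command(image, chars="0123456789", lang="eng", psm=6):
    """Build a tesseract command line that only allows `chars` in the output."""
    return [
        "tesseract", str(image), "stdout",
        "-l", lang,
        "--psm", str(psm),
        "-c", f"tessedit_char_whitelist={chars}",
    ]


def run_whitelisted_ocr(image, **options):
    """Run tesseract (must be on PATH) and return the recognized text."""
    result = subprocess.run(
        whitelist_command(image, **options),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `run_whitelisted_ocr("digits.png")` would then be comparable to the `-l digits` run, but using the stock English model.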
A whitelist with ‘-l eng’ is actually what I am doing now. Was trying to get a better response for numbers only.
Where can I find instructions on how to train it? I can put together a training set with thousands of images and the proper 8-digit code that should be scanned from each image. The images offer a wide variety of photo quality. Is this sufficient information to make my own .traineddata file?
Thanks again.
Mike
Check out the tesseract-ocr/tesstrain repo to train using images.
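As I understand the tesstrain README, training data is a folder of single-line images, each paired with a transcription file of the same base name ending in `.gt.txt`. The sketch below prepares that layout from a mapping of image path to digit code. It assumes each image has already been cropped to one line containing the code; the function name and the idea of passing labels as a dict are hypothetical.

```python
import shutil
from pathlib import Path


def build_ground_truth(labels, out_dir):
    """Copy each image into out_dir and write a matching .gt.txt file.

    labels: dict mapping an image path to the digit string it shows.
    Returns the list of transcription files that were written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for image, code in labels.items():
        if not code.isdigit():
            raise ValueError(f"label for {image} is not all digits: {code!r}")
        src = Path(image)
        dst = out / src.name
        shutil.copyfile(src, dst)
        # Same base name as the image, with the extension replaced by .gt.txt
        gt_file = dst.with_name(dst.stem + ".gt.txt")
        gt_file.write_text(code + "\n", encoding="utf-8")
        written.append(gt_file)
    return written
```

The output folder would then be handed to tesstrain as the ground-truth directory for a model name of your choosing; the exact `make` invocation is described in that repo.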
[ec2-user@runner1]$ tesseract --list-langs
List of available languages (4):
digits
digits1
digits_comma
eng
[ec2-user@runner1]$ tesseract --oem 1 -l digits out.png stdout --psm 6
Error opening data file /usr/local/share/tessdata/digits.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'digits'
Tesseract couldn't load any languages!
Could not initialize tesseract.
[ec2-user@runner1]$
digits.traineddata is clearly at /usr/local/share/tessdata/digits.traineddata. Any idea what the problem could be? Using the "eng" language works as expected.
[ec2-user@runner1]$ tesseract --version
tesseract 4.1.0
leptonica-1.78.0
libjpeg 6b (libjpeg-turbo 1.2.90) : libpng 1.2.49 : libtiff 4.0.3 : zlib 1.2.8
Found AVX2
Found AVX
Found SSE
[ec2-user@runner1]$ ls -ltr /usr/local/share/tessdata
total 15256
drwxr-xr-x 2 root root 4096 Dec 17 20:23 configs
drwxr-xr-x 2 root root 4096 Dec 17 20:23 tessconfigs
-rw-r--r-- 1 root root 572 Dec 17 20:23 pdf.ttf
-rw-r--r-- 1 root root 65693 Dec 19 22:53 digits_comma.traineddata
-rw-r--r-- 1 root root 65625 Dec 19 22:53 digits1.traineddata
-rw-r--r-- 1 root root 15400601 Dec 19 22:58 eng.traineddata
-rw-r--r-- 1 root root 65605 Dec 19 23:18 digits.traineddata
[ec2-user@runner1]$
Thanks!