ghost closed this issue 3 years ago.
You need to build with OCR support, not just have the libraries installed.
On Thu, Nov 24, 2016 at 1:08 PM, Bent Bagger notifications@github.com wrote:
On a Linux system I have compiled version 0.82 of CCextractor with these lines:
cmake -DWITH_OCR=ON ../src/
make
sudo make install
But when I run this command
ccextractor -pn 7176 -codec dvbsub Pointless.ts
(program number taken from 'mediainfo'), CCextractor outputs, among other things, this:
Opening file: Pointless.ts
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
DVB subtitles detected, OCR subsystem not present. Use -out=spupng for graphic output
DVB subtitles detected, OCR subsystem not present. Use -out=spupng for graphic output
DVB subtitles detected, OCR subsystem not present. Use -out=spupng for graphic output
Creating Pointless.srt
The generated Pointless.srt is 3 bytes long and contains this hex string "bbef 00bf".
I have installed leptonica-devel and tesseract-ocr-devel.
Have I missed something during the compilation and/or am I using the wrong parameter in my call of CCextractor?
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/CCExtractor/ccextractor/issues/442, or mute the thread https://github.com/notifications/unsubscribe-auth/AFrJ2Y9JaYYbKKyN_xMJVK_pH1hsEHStks5rBfy-gaJpZM4K79j9 .
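The 3-byte Pointless.srt is consistent with a file that holds nothing but a UTF-8 byte-order mark: `hexdump`'s default view prints 16-bit little-endian words, so the bytes EF BB BF show up as "bbef 00bf" (that `hexdump` was the tool used is an assumption). A minimal sketch for spotting such an empty result; `srt_is_bom_only` is a hypothetical helper name, not part of CCExtractor:

```shell
# Hypothetical helper: succeed (exit 0) if the given file contains exactly the
# 3-byte UTF-8 BOM EF BB BF and nothing else, i.e. no subtitle cues were written.
# Returns 1 for any other content and 2 if the file does not exist.
srt_is_bom_only() {
    [ -f "$1" ] || return 2
    # wc pads its count with spaces on some systems, so strip them first.
    [ "$(wc -c < "$1" | tr -d ' ')" -eq 3 ] || return 1
    # od prints the bytes as " ef bb bf"; drop spaces/newlines before comparing.
    [ "$(od -An -tx1 "$1" | tr -d ' \n')" = "efbbbf" ]
}
```

For example, `srt_is_bom_only Pointless.srt && echo "no subtitles were written"` flags the symptom described above.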
I thought I had, but apparently not. Anyway, I got it working by running these commands in /usr/local/src/ccextractor (a soft link to ccextractor.0.82):
cd build
cmake -DWITH_OCR=ON ../src/
cd ../linux/
make clean
make ENABLE_OCR=yes
make install
('make clean' only to start from a clean slate).
Allow me an additional question: when I now run CCextractor, I do get a .srt file, but CCextractor complains a little:
Opening file: Pointless.ts
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
dan.traineddata not found! Switching to English
swe.traineddata not found! Switching to English
fin.traineddata not found! Switching to English
Creating Pointless.srt
Using English trained data on Scandinavian texts makes for funny results!
The tesseract-ocr trained data is installed in /usr/share/tessdata/. So my additional question is actually two:
I may have part of an answer to my question 1 above. When I ran strace on CCextractor, I found that it looks in the current directory for the trained data:
openat(AT_FDCWD, "./tessdata/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
write(1, "dan.traineddata not found! Switc"..., 48dan.traineddata not found! Switching to English
but globally to find the English data:
open("/usr/share/tessdata/eng.traineddata", O_RDONLY) = 4
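Trace lines like the two above can be collected with `strace -f -e trace=open,openat -o trace.log ccextractor ...` and then summarised. A minimal sketch, assuming strace's usual `open(...)`/`openat(...)` line format; `tessdata_probes` is a hypothetical helper name:

```shell
# Hypothetical helper: read strace output on stdin and print, for every
# open/openat call that touches a tessdata path, whether it succeeded.
tessdata_probes() {
    grep -E 'open(at)?\(.*tessdata' | while IFS= read -r line; do
        # The first double-quoted argument of open/openat is the path.
        path=$(printf '%s\n' "$line" | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p')
        case $line in
            *'= -1 '*) echo "MISSING $path" ;;  # failed call, e.g. ENOENT
            *) echo "FOUND $path" ;;
        esac
    done
}
```

Running `tessdata_probes < trace.log` on the lines quoted above would list `./tessdata/` as missing and `/usr/share/tessdata/eng.traineddata` as found.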
When I added a link from the current directory to /usr/share/tessdata, CCextractor stopped complaining about missing data.
It is inconvenient to have to add links to every directory where I have videos stored, so is this a fault or a feature?
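Since the strace output above shows the non-English data being looked up in `./tessdata/` relative to the current directory, one workaround is a single `tessdata` link in one fixed working directory, with CCExtractor run from there on absolute paths. A minimal sketch under that assumption; `with_tessdata`, `TESSDATA_SYSTEM` and `CCX_WORKDIR` are hypothetical names:

```shell
# Hypothetical wrapper: run a command from a working directory that holds a
# ./tessdata symlink to the system trained data, so no per-video links are needed.
# TESSDATA_SYSTEM defaults to /usr/share/tessdata (the location in this thread).
with_tessdata() {
    sys_dir=${TESSDATA_SYSTEM:-/usr/share/tessdata}
    work_dir=${CCX_WORKDIR:-"$HOME/.cache/ccx-work"}
    mkdir -p "$work_dir" || return 1
    # Create the link once; never overwrite an existing tessdata entry.
    if [ ! -e "$work_dir/tessdata" ] && [ ! -L "$work_dir/tessdata" ]; then
        ln -s "$sys_dir" "$work_dir/tessdata" || return 1
    fi
    # Subshell, so the caller's current directory is left unchanged.
    (cd "$work_dir" && "$@")
}
```

Usage: `with_tessdata ccextractor -pn 7176 -codec dvbsub "$PWD/Pointless.ts" -o "$PWD/Pointless.srt"`. `$PWD` is expanded before the wrapper changes directory, and the output file is named explicitly with CCExtractor's `-o` option so the .srt does not end up in the working directory.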
@BentB could you please close this issue, as the original "OCR subsystem not present" problem is now resolved? You can open another issue for your new problem if it still exists.
Please help us with a pull request.
It's neither a bug nor a feature; it's an incomplete implementation.
-Anshul
On 25-Nov-2016 5:28 PM, "Bent Bagger" notifications@github.com wrote the strace comment above.
@anshul1912 I'm not quite familiar with life here at GitHub, so please expand a bit on what you mean by "Please help us with pull request". I know 'pull' from Git, but not in this context. Sorry about that.
I get the same error "OCR subsystem not present" on macOS, but Leptonica and Tesseract are installed on the system. CCX -v shows:
Version: 0.88
Git commit: Unknown
Compilation date: 2020-02-04
File SHA256: fa4b6f64af9f923a0fca842ae017a189740de63916188b8afa43e6c00acb07b5
Libraries used by CCExtractor
libGPAC Version: 0.7.2-DEV
zlib: 1.2.11
utf8proc Version: 2.4.0
protobuf-c Version: 1.3.1
libpng Version: 1.6.35
FreeType
libhash
nuklear
libzvbi
Do you know where the problem could be?
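The 0.88 version outputs in this thread list no Tesseract or Leptonica entry under "Libraries used by CCExtractor", whereas the 0.94 output posted later does, which most likely means these 0.88 binaries were built without OCR. A minimal sketch that checks for those entries, assuming the output format quoted in this thread; `ocr_status` is a hypothetical helper name. A listed entry is not a guarantee that OCR works, as the CentOS 8 report later in the thread shows.

```shell
# Hypothetical helper: read `ccextractor --version` output on stdin and report
# whether the library list names both Tesseract and Leptonica.
ocr_status() {
    out=$(cat)
    if printf '%s\n' "$out" | grep -q 'Tesseract Version' &&
       printf '%s\n' "$out" | grep -q 'Leptonica Version'; then
        echo "OCR libraries listed"
        return 0
    fi
    echo "OCR libraries not listed"
    return 1
}
```

Usage: `ccextractor --version | ocr_status`.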
Hello, I get the same issue on Ubuntu 16.04 using Tesseract 4.1.1, even after following @ghost's compilation steps.
Linux desktop 4.15.0-76-generic #86~16.04.1-Ubuntu SMP Mon Jan 20 11:02:50 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
CCExtractor detailed version info
Version: 0.88
Git commit: 6697ed34967343830178f8452e276ab0d94f08e0
Compilation date: 2020-02-04
File SHA256: Could not open file
Libraries used by CCExtractor
libGPAC Version: 0.7.2-DEV
zlib: 1.2.8
utf8proc Version: 2.4.0
protobuf-c Version: 1.3.1
libpng Version: 1.2.54
FreeType
libhash
nuklear
libzvbi
Reading from UDP socket 226.51.0.0:1234
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
DVB subtitles detected, OCR subsystem not present. Use -out=spupng for graphic output
TS continuity counter not incremented prev/curr 4/6
Found large gap(1072860) in PTS! Trying to recover ...
Found large gap(1072861) in PTS! Trying to recover ...
Found large gap(1072864) in PTS! Trying to recover ...
Found large gap(1072865) in PTS! Trying to recover ...
Found large gap(1072862) in PTS! Trying to recover ...
Found large gap(1072863) in PTS! Trying to recover ...
@cfsmp3 Can you reopen the issue?
Closing, as we've made a lot of changes to the build lately and I don't know whether this is still an issue.
@wojtekw @rialg let me know if it's still a problem in master
Hello, I am facing the same issue with Tesseract 4.1.1 and Leptonica 1.76.0. I tried compiling with the steps below and with @ghost's steps; neither worked for me. Please let me know if anything needs to change when compiling.
mkdir build
cd build
cmake -DWITH_OCR=ON -DWITHOUT_RUST=ON ../src/
make
I am using CentOS 8 for compiling. Below is the ccextractor --version output:
CCExtractor detailed version info
Version: 0.94
Git commit: 35e73c1c90ce3ca69394d3523836bb1cdec28f11
Compilation date: 2023-08-04
CEA-708 decoder: C
File SHA256: 08b9e909cc730e591a4331eef6dd45584a20e4a92c8dbf3fc37bf570f48ce79e
Libraries used by CCExtractor
Tesseract Version: 4.1.1
Leptonica Version: leptonica-1.76.0
libGPAC Version: 1.0.1
zlib: 1.2.11
utf8proc Version: 2.4.0
protobuf-c Version: 1.3.1
libpng Version: 1.6.37
FreeType
libhash
nuklear
libzvbi
ldd output for ccextractor:
linux-vdso.so.1 (0x00007ffcf98db000)
libm.so.6 => /lib64/libm.so.6 (0x00007fe365526000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe365306000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fe365102000)
libtesseract.so.4 => /lib64/libtesseract.so.4 (0x00007fe364b9b000)
liblept.so.5 => /lib64/liblept.so.5 (0x00007fe36471a000)
libc.so.6 => /lib64/libc.so.6 (0x00007fe364358000)
/lib64/ld-linux-x86-64.so.2 (0x00007fe3658a8000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fe363fc3000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fe363dab000)
libgomp.so.1 => /lib64/libgomp.so.1 (0x00007fe363b73000)
libpng16.so.16 => /lib64/libpng16.so.16 (0x00007fe36393e000)
libz.so.1 => /lib64/libz.so.1 (0x00007fe363727000)
libjpeg.so.62 => /lib64/libjpeg.so.62 (0x00007fe3634be000)
libgif.so.7 => /lib64/libgif.so.7 (0x00007fe3632b4000)
libtiff.so.5 => /lib64/libtiff.so.5 (0x00007fe36303b000)
libwebp.so.7 => /lib64/libwebp.so.7 (0x00007fe362dcd000)
libjbig.so.2.1 => /lib64/libjbig.so.2.1 (0x00007fe362bc1000)
If I use the tesseract command directly, image-to-text conversion works.
@kousthub97 try compile previous commit 0264e7da2be67182deb031228eb07e6ed4943c81 or v0.94 tag :) Both should work
@Neo2SHYAlien Thanks for the help, it worked with the v0.94 tag.