CCExtractor / ccextractor

CCExtractor - Official version maintained by the core team
https://www.ccextractor.org
GNU General Public License v2.0

Add flag for Page Segmentation Modes control #1601

Closed by Neo2SHYAlien 2 months ago

Neo2SHYAlien commented 8 months ago

I added a -psm flag for controlling Tesseract's PSM (Page Segmentation Mode). The default mode (3) gives me quite bad results. When I use 6, 11, or 12 for Bulgarian, I get much better OCR results. I haven't tested other languages yet, but I expect improvements there as well when a different mode is used.

P.S. This PR continues #1544, which was closed after the rebase 🥲
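
For readers unfamiliar with the mode numbers mentioned above, here is a minimal sketch of how a -psm argument might be validated against Tesseract's documented range of page segmentation modes (0 to 13). It is not the code from this PR: `parse_psm`, `psm_description`, and the `PSM_*` macros are hypothetical names chosen for illustration, and the descriptions are abbreviated from the list printed by `tesseract --help-psm`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Tesseract defines page segmentation modes 0 through 13. */
#define PSM_MIN 0
#define PSM_MAX 13
#define PSM_DEFAULT 3 /* fully automatic page segmentation, no OSD */

/* Abbreviated mode descriptions, indexed by mode number. */
static const char *const psm_names[PSM_MAX + 1] = {
    "orientation and script detection (OSD) only",
    "automatic page segmentation with OSD",
    "automatic page segmentation, no OSD or OCR",
    "fully automatic page segmentation, no OSD (default)",
    "single column of text of variable sizes",
    "single uniform block of vertically aligned text",
    "single uniform block of text",
    "single text line",
    "single word",
    "single word in a circle",
    "single character",
    "sparse text",
    "sparse text with OSD",
    "raw line",
};

/* Hypothetical helper: parse the text that follows -psm.
 * Returns the mode on success, or -1 if the text is not a
 * whole decimal integer in the range [PSM_MIN, PSM_MAX]. */
int parse_psm(const char *arg)
{
    assert(arg != NULL);
    char *end = NULL;
    errno = 0;
    long value = strtol(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0')
        return -1; /* overflow, empty input, or trailing junk */
    if (value < PSM_MIN || value > PSM_MAX)
        return -1; /* outside Tesseract's documented range */
    return (int)value;
}

/* Hypothetical helper: description for a mode, or NULL if invalid. */
const char *psm_description(int psm)
{
    if (psm < PSM_MIN || psm > PSM_MAX)
        return NULL;
    return psm_names[psm];
}
```

With this sketch, the modes reported to work well for Bulgarian (6, 11, and 12) map to "single uniform block of text", "sparse text", and "sparse text with OSD". A caller could fall back to `PSM_DEFAULT` when `parse_psm` returns -1.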

Neo2SHYAlien commented 8 months ago

@cfsmp3 After the resync of the main branch, the previous PR #1544 was closed automatically. I hope the code change is good enough; I'm not a daily dev 😊

PunitLodha commented 3 months ago

@prateekmedia have you added this flag already?

prateekmedia commented 3 months ago

@PunitLodha Not added yet, will add once this merges.

PunitLodha commented 3 months ago

@prateekmedia could you add it to this PR itself?

prateekmedia commented 3 months ago

@PunitLodha Here I have made a PR to his repo: https://github.com/Neo2SHYAlien/ccextractor/pull/1

Neo2SHYAlien commented 3 months ago

@prateekmedia merged

prateekmedia commented 3 months ago

The failing tests will be resolved in #1635. cc @PunitLodha

PunitLodha commented 2 months ago

@prateekmedia the tests aren't passing yet

prateekmedia commented 2 months ago

@PunitLodha This PR needs a rebase again.

ccextractor-bot commented 2 months ago

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit 1a13bbb...:

| Report Name | Tests Passed |
| --- | --- |
| Broken | 12/13 |
| CEA-708 | 9/14 |
| DVB | 4/7 |
| DVD | 3/3 |
| DVR-MS | 2/2 |
| General | 15/27 |
| Hauppage | 2/3 |
| MP4 | 3/3 |
| NoCC | 10/10 |
| Options | 83/86 |
| Teletext | 21/21 |
| WTV | 9/13 |
| XDS | 22/34 |

All tests passing on the master branch were passed completely.

NOTE: The following tests have been failing on the master branch as well as the PR:

Congratulations: Merging this PR would fix the following tests:


Check the result page for more info.

ccextractor-bot commented 2 months ago

CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit 1a13bbb...:

| Report Name | Tests Passed |
| --- | --- |
| Broken | 13/13 |
| CEA-708 | 14/14 |
| DVB | 7/7 |
| DVD | 3/3 |
| DVR-MS | 2/2 |
| General | 27/27 |
| Hauppage | 3/3 |
| MP4 | 3/3 |
| NoCC | 10/10 |
| Options | 85/86 |
| Teletext | 21/21 |
| WTV | 13/13 |
| XDS | 34/34 |

All tests passing on the master branch were passed completely.

NOTE: The following tests have been failing on the master branch as well as the PR:

Congratulations: Merging this PR would fix the following tests:


Check the result page for more info.