antimatter15 / tesseract-rs

Rust bindings for Tesseract
MIT License
144 stars 31 forks source link

Use a builder pattern with chainable methods for Tesseract? #18

Closed hdevalence closed 4 years ago

hdevalence commented 4 years ago

Hi, thanks for working on these bindings! While I was looking at the docs and some of the tests, I wondered whether the API might be slightly more ergonomic if the Tesseract struct had chainable methods and was consumed by the OCR call.

For instance, if Tesseract had

pub struct Ocr { /* ... */ }

impl Ocr {
    pub fn new(language: &str) -> Ocr { /* ... */ }
    pub fn image_path(mut self, path: impl AsRef<Path>) -> Ocr { /* ... */ }
    pub fn image_data(mut self, data: &[u8]) -> Ocr { /* ... */ }
    pub fn ocr(mut self) -> Result<String, Error> { /* ... */ }
}

then users could write

    let text = Ocr::new(language).image_path(filename).ocr()?;

instead of writing

    let mut cube = Tesseract::new();
    cube.set_lang(language);
    cube.set_image(filename);
    cube.recognize();
    cube.get_text()

as in the current implementation of the ocr function.

In addition to being slightly nicer (imo) to write and read, it seems like there are also safety benefits, because you can provide a more misuse-resistant API (e.g., requiring the language parameter in Ocr::new).

ccouzens commented 4 years ago

Hi @hdevalence,

I agree.

I've started a branch to play around with similar ideas. This commit in particular f31f8d76df72a79952d22e9bed00a45c87ec9ac6

When I finish it, may I add you as a reviewer?

In addition to being slightly nicer (imo) to write and read, it seems like there are also safety benefits, because you can provide a more misuse-resistant API (e.g., requiring the language parameter in Ocr::new).

Tesseract will quit your program (panic?) if you try and read the text before initializing it or assigning it an image https://github.com/antimatter15/tesseract-rs/issues/16. I'm going to see if I can solve that by using the type system to make the user call initialize and call set-image before getting the text. And a nice side effect of this is the methods will become chainable.

hdevalence commented 4 years ago

SGTM!

ccouzens commented 4 years ago

@hdevalence the pull request is ready for review https://github.com/antimatter15/tesseract-rs/pull/20 🙂