dlight / pdftotext

High-level Rust library that binds to Poppler to extract text from a PDF
GNU General Public License v2.0

Memory leak after `textOut.takeText()` #1

Open avalon1610 opened 1 year ago

avalon1610 commented 1 year ago

`takeText()` hands ownership of the page to the caller, so we need to call `page->decRefCnt()` after `page->dump()`.
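To illustrate the ownership rule described here, the following is a minimal Rust sketch that uses mock types only. `MockTextOut`, `MockPage`, `PageGuard`, and `leaked_after` are hypothetical names invented for this example; they are not part of Poppler or of this crate. The sketch assumes that each call standing in for `takeText()` gives the caller one reference, which has to be given back explicitly.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a ref-counted page handle (not the Poppler API).
struct MockPage {
    live: Rc<Cell<usize>>, // number of references handed out and not yet released
}

impl MockPage {
    /// Plays the role of `decRefCnt()`: gives up the caller's reference.
    fn dec_ref_cnt(&self) {
        self.live.set(self.live.get() - 1);
    }
}

/// Hypothetical stand-in for the text output device.
struct MockTextOut {
    live: Rc<Cell<usize>>,
}

impl MockTextOut {
    /// Plays the role of `takeText()`: the caller now owns one reference.
    fn take_text(&self) -> MockPage {
        self.live.set(self.live.get() + 1);
        MockPage { live: Rc::clone(&self.live) }
    }
}

/// Guard that releases the reference when it goes out of scope.
struct PageGuard(MockPage);

impl Drop for PageGuard {
    fn drop(&mut self) {
        self.0.dec_ref_cnt();
    }
}

/// Processes `pages` pages and returns how many references are still outstanding.
fn leaked_after(pages: usize, release: bool) -> usize {
    let live = Rc::new(Cell::new(0));
    let out = MockTextOut { live: Rc::clone(&live) };
    for _ in 0..pages {
        let page = out.take_text();
        if release {
            // The guard gives the reference back at the end of this block.
            let _guard = PageGuard(page);
        }
        // Without the guard, nothing ever releases the reference.
    }
    live.get()
}

fn main() {
    println!("without release: {}", leaked_after(3, false));
    println!("with release: {}", leaked_after(3, true));
}
```

In this model, every page taken without a matching release stays counted as outstanding, which corresponds to the leak reported in this issue. Wrapping the handle in a guard is one possible design for making the release automatic on the Rust side; the fix proposed in this thread releases the page directly in the C++ shim instead.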

dlight commented 1 year ago

Hey, thanks! Do you want to send a PR or should I do it?

Also: are you using this for anything? Do you think this needs a better API or an updated poppler, or is it good as is?

avalon1610 commented 1 year ago

I'm sorry that my working environment is not convenient for submitting PR.
In fact, I unexpectedly ran into several problems when using this library on Windows to search for specific text in PDFs. I would like to share my experience in case it helps improve this project:

    int lastPage = doc->getNumPages();
    for (int pageNum = 1; pageNum <= lastPage; pageNum++) {
        newpage_f(stream, pageNum);
        TextOutputDev textOut(nullptr, tree, 0.0, false, false, false);
        if (!textOut.isOk()) {
            return CouldntOutput;
        }
        textOut.setTextEOL(eolUnix);
        doc->displayPage(&textOut, pageNum, 72.0, 72.0, 0, true, false, false);
        TextPage *page = textOut.takeText();
        page->dump(stream, output_f, true, eolUnix, false);
        page->decRefCnt();
    }

    return NoError;

- Add a return-code check to `pdftotext_layout` in lib.rs:
```rust
let code = unsafe { pdftotext_print_with_layout( ... ) };
match code {
    ResultCode::NoError => Ok(vec),
    ResultCode::InternalError => Err(Error::InternalError),
    ResultCode::CouldntReadPdf => Err(Error::CouldntReadPdf),
    ResultCode::CouldntOutput => Err(Error::CouldntOutput),
}
```

dlight commented 1 year ago

Oh.. now that's interesting! I've never thought this could possibly run on Windows, but it's nice to see that with some tweaks it does!

I think for convenience it should bundle not only poppler but all its dependencies; building on Windows should just work on either mingw or msvc. And maybe bundle poppler's official prebuilt libraries too, under a different feature flag.

> I'm sorry that my working environment is not convenient for submitting PR.

Do you mean that your employer hasn't cleared your code for submitting it upstream? Or is it more that it's hard to disentangle those fixes from other, unrelated commits?

Unfortunately it's a bit hard for me to make this PR because I don't run Windows (also, I don't use this anymore 😅). But Windows aside, that's quite a few bugfixes, thanks!