duncantl / Rtesseract

Interface to tesseract OCR system.

Memory leak with GetBoxes (and other functions) #12

Open jcarlen opened 5 years ago

jcarlen commented 5 years ago

Can reproduce by running one of the simple tests, e.g.:

    Rscript tests/basic.R

The whole script isn't even needed; running it up to any one of these lines is enough:

    alts = GetAlternatives(f)
    w = GetConfidences(f)
    bb = GetBoxes(f)

This generates something like:

    ObjectCache(0x10ac56678)::~ObjectCache(): WARNING! LEAK! object 0x7fa51b279600 still has count 1 (id /usr/local/share/tessdata/eng.traineddatalstm-punc-dawg)
    ObjectCache(0x10ac56678)::~ObjectCache(): WARNING! LEAK! object 0x7fa51b279660 still has count 1 (id /usr/local/share/tessdata/eng.traineddatalstm-word-dawg)
    ObjectCache(0x10ac56678)::~ObjectCache(): WARNING! LEAK! object 0x7fa51b278a80 still has count 1 (id /usr/local/share/tessdata/eng.traineddatalstm-number-dawg)

This may be related to https://github.com/duncantl/Rtesseract/issues/3, but I have tesseract 4.0.0 and that post says the other issue was fixed with 4.00.00dev.

My tesseract -v:

    tesseract 4.0.0
     leptonica-1.76.0
      libgif 5.1.4 : libjpeg 9c : libpng 1.6.35 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 1.0.0 : libopenjp2 2.3.0
     Found AVX2
     Found AVX
     Found SSE

I'm using the latest master of Rtesseract (0.4.0)

Sorry I'm only now getting around to posting this. It was in an old email, but I think it got overshadowed by another bug that was causing crashes.

duncantl commented 5 years ago

Hi Jane. Isn't it true that you get the ObjectCache messages only when you quit from the R session? And if you run

    gc() # garbage collect
    q()

then the messages don't appear?

We need to ensure that the finalizers are run at the end of the R session. We can do this with .Last = function() gc(). The finalizers should run regardless, but they currently do not.

Thanks
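For readers following along, here is a minimal sketch of the two mechanisms mentioned in this comment, written in plain R rather than against Rtesseract itself. The environment `h` and the `release` function are hypothetical stand-ins for a tesseract handle and its finalizer. `reg.finalizer(..., onexit = TRUE)` is base R's way of asking for a finalizer to also run when the session ends, and the `.Last` definition mirrors the workaround suggested above. Whether Rtesseract ended up using either approach is not stated in this thread.

```r
# Counter used only to observe that the finalizer ran (illustrative).
freed <- 0L

# Hypothetical finalizer: a real one would release native memory.
release <- function(e) freed <<- freed + 1L

# Hypothetical stand-in for an object that wraps a native handle.
h <- new.env()

# onexit = TRUE asks R to run the finalizer at the end of the session
# as well, not only when the object happens to be garbage collected.
reg.finalizer(h, release, onexit = TRUE)

# Drop the only reference and force a collection, which is what running
# gc() before q() does for the objects in this issue.
rm(h)
invisible(gc())

# Session-level workaround in the spirit of the comment above:
# collect garbage, and so run pending finalizers, when R quits.
.Last <- function() invisible(gc())
```

In C code, the analogous request is made with `R_RegisterCFinalizerEx` and its `onexit` argument when the external pointer is created.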

jcarlen commented 5 years ago

Correct: after adding gc(); q(), the messages don't appear. Is this problem related to the fix for https://github.com/duncantl/Rtesseract/issues/11 ?

duncantl commented 5 years ago

It is loosely related. It relates to finalizers on C objects held by R as external pointers. In #11, the finalizer was being run by R to free a Pix object, but the tesseract instance also freed the same Pix when it was finalized by R - so double freeing the Pix. This issue here is the tesseract finalizer not getting run when the session ends.
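The double free described for #11 can be illustrated with a small sketch in plain R. Everything here is hypothetical: `new_resource` and `release` are stand-ins for a native image buffer and its cleanup routine, not Rtesseract functions, and this is not necessarily how #11 was fixed. It only shows one common guard, an idempotent release, so that two owners calling the cleanup do not free twice.

```r
# Hypothetical stand-in for a native image buffer (not the Rtesseract API).
new_resource <- function() {
  r <- new.env()
  r$released <- FALSE
  r$release_calls <- 0L  # counts how many times memory was actually freed
  r
}

# Idempotent release: only the first call frees, later calls are no-ops.
release <- function(r) {
  if (!r$released) {
    r$released <- TRUE
    r$release_calls <- r$release_calls + 1L
  }
  invisible(r)
}

pix <- new_resource()
release(pix)  # e.g. the image's own finalizer
release(pix)  # e.g. the owning object's finalizer; harmless with the guard
```

Without the `released` check, the second call would correspond to freeing the same memory again, which is the crash scenario described above.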

jcarlen commented 5 years ago

Thanks for explaining. So, thematically related but not causally (unless the fix for #11 was to turn off a tesseract finalizer which led to this)?


duncantl commented 5 years ago

Thematically related: a very nice description.