Ironholds closed this issue 8 years ago
Testing on the pure C lib I'm not seeing this error. I do remember fixing a memory leak in that function at some point. Is this on latest?
Think I found it. Calls to expansion_array_destroy should be 1-to-1 with expand_address, otherwise it would leak memory (expansion_array_destroy is currently just a convenience function for freeing an array of char *'s, but may do more if we change the response type at some point).
I like the idea of vectorizing calls to libpostal, amortizes the cost of crossing the C boundary, similar to what we'd do with numeric arrays. Might be useful in some of the other bindings as well.
Hmm, interesting. I thought I'd tried that approach and it hadn't worked, but I'll try again!
Still happening, even with the creation of "expansion" and its destruction moved inside the loop. Interesting element: while it happens with normalise_addr("Quatre-vignt-douze Ave des Champs-Élysées"), it doesn't happen if the address is instead "fffffffffffffffffffffffffffffffffffffffff", which is deliberately the same length in char terms. UTF8/codepoint problem with the E-acute and other non-ASCII chars?
Ah, that is on my side. There was a temporary char_array I wasn't freeing. It only got allocated for non-ASCII text, which would explain why the "fffffffffffffffffffffffffffffffffffffffff" case didn't leak. Pushed a fix.
The in-loop memory leak I mentioned previously kicks in when passing in multiple strings. If it's an n=1 loop with cleanup at the end, it so happens that the expansions pointer is pointing to the memory that needs to be freed and everything is copacetic, but for n > 1 loop iterations, the pointer gets overwritten and the memory for the first n-1 expansions is leaked.
Gotcha, so it should be "for each entry, run, cleanup" and not "create an instance, run one by one, cleanup"?
Yep, exactly. expand_address/parse_address are "caller frees," Unix-style.
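For anyone landing here later, a minimal sketch of the two loop shapes discussed above. It does not call libpostal: fake_expand_address and fake_expansion_array_destroy are hypothetical stand-ins that only mimic the "caller frees" contract, and the live_arrays counter is an illustration device for showing how many result arrays are still unfreed after each pattern.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Number of result arrays allocated but not yet destroyed (illustration only). */
static int live_arrays = 0;

/* Hypothetical stand-in for expand_address: returns a heap-allocated array
   holding one copy of the input. The caller owns the result. */
static char **fake_expand_address(const char *input, size_t *n) {
    *n = 0;
    char **out = malloc(sizeof *out);
    if (out == NULL) return NULL;
    out[0] = malloc(strlen(input) + 1);
    if (out[0] == NULL) { free(out); return NULL; }
    strcpy(out[0], input);
    *n = 1;
    live_arrays++;
    return out;
}

/* Hypothetical stand-in for expansion_array_destroy: frees each string,
   then the array itself. */
static void fake_expansion_array_destroy(char **expansions, size_t n) {
    if (expansions == NULL) return;
    for (size_t i = 0; i < n; i++) free(expansions[i]);
    free(expansions);
    live_arrays--;
}

/* Leaky shape: "create an instance, run one by one, cleanup".
   Each iteration overwrites the pointer, so only the last array is freed.
   Returns how many arrays this call left unfreed. */
static int expand_all_leaky(const char **inputs, size_t count) {
    int before = live_arrays;
    char **expansions = NULL;
    size_t n = 0;
    for (size_t i = 0; i < count; i++) {
        expansions = fake_expand_address(inputs[i], &n);
    }
    fake_expansion_array_destroy(expansions, n);
    return live_arrays - before;
}

/* Paired shape: "for each entry, run, cleanup".
   Every expand call gets exactly one matching destroy call. */
static int expand_all_paired(const char **inputs, size_t count) {
    int before = live_arrays;
    for (size_t i = 0; i < count; i++) {
        size_t n = 0;
        char **expansions = fake_expand_address(inputs[i], &n);
        /* ... copy whatever is needed out of expansions here ... */
        fake_expansion_array_destroy(expansions, n);
    }
    return live_arrays - before;
}
```

With three inputs, the leaky shape leaves two arrays unfreed (the first n-1) while the paired shape leaves none. With a single input both shapes free everything, which is why the n=1 case looked fine.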
Everything looks good to me in terms of the libpostal calls, memory, etc. The one thing to consider, as mentioned in the other issue, is how to handle the multiple values problem, mostly for expansions since they're not in any particular order (they're more like a set). There's a disambiguation model in the works for the near term that should ensure, with high accuracy, that the first expansion is the correct one, but it will be another model file to load, probably a small one.
Yeah, that makes sense. At the moment I'm just loading the first one anyway; looking forward to that model!
Okay, with the changes I just pushed I think we should be good. I'll test against the latest version to make sure the leak is cleaned. Thanks for your help! Hope the incidental bug identifications were useful :)
They were indeed! There may be a few additions to make for the new parser classes, and the disambiguation call will probably entail some minor API changes, but that's at least a couple weeks out - will keep you informed.
Yay! Thanks so much :)