Ironholds / poster

Address parsing and normalisation through libpostal
MIT License

Segfault on parsing #1

Closed Ironholds closed 8 years ago

Ironholds commented 8 years ago

addresses <- c("781 Franklin Ave Crown Heights Brooklyn NYC NY 11216 USA",
               "The Book Club 100-106 Leonard St, Shoreditch, London, Greater London, England, EC2A 4RH, United Kingdom")
poster:::parse_addr(addresses)

Leads to:

==3391== Invalid read of size 8
==3391==    at 0xD840C76: address_parser_parse (in /usr/local/lib/libpostal.so.0.0.0)
==3391==    by 0xD81D35B: parse_address (in /usr/local/lib/libpostal.so.0.0.0)
==3391==    by 0xD5FF6B4: parse_addr(Rcpp::Vector<16, Rcpp::PreserveStorage>) (poster.cpp:100)
==3391==    by 0xD5FD958: poster_parse_addr (RcppExports.cpp:44)
==3391==    by 0x4F0BA37: ??? (in /usr/lib/R/lib/libR.so)
==3391==    by 0x4F4ACCA: Rf_eval (in /usr/lib/R/lib/libR.so)
==3391==    by 0x4F4CDBF: ??? (in /usr/lib/R/lib/libR.so)
==3391==    by 0x4F4AAD2: Rf_eval (in /usr/lib/R/lib/libR.so)
==3391==    by 0x4F4BE56: Rf_applyClosure (in /usr/lib/R/lib/libR.so)
==3391==    by 0x4F4A8AE: Rf_eval (in /usr/lib/R/lib/libR.so)
==3391==    by 0x4F71D61: Rf_ReplIteration (in /usr/lib/R/lib/libR.so)
==3391==    by 0x4F720B0: ??? (in /usr/lib/R/lib/libR.so)
==3391==  Address 0x0 is not stack'd, malloc'd or (recently) free'd

albarrentine commented 8 years ago

Ok, first thing I noticed is that the parser isn't loaded. It has its own setup/teardown methods, as sometimes it makes sense to load things in separate modules, e.g. so you don't have to pull in the language classifier if it's not being used.

I haven't set everything up yet, but that's a likely cause.

Ironholds commented 8 years ago

Yeah, I saw your patch and immediately headdesked; my screwup. Now having trouble building on my machine but I think that's an indicator that I somehow messed up the libpostal install process rather than anything else.

albarrentine commented 8 years ago

Often a fresh make install will do the trick. Otherwise maybe nuke and try again?

Ironholds commented 8 years ago

:+1:

Ironholds commented 8 years ago

Tried, no dice. Nuke and try again it is!

Ironholds commented 8 years ago

(Just to make sure, obliterating the data directory, the pkg-config output and the libpostal source dir constitutes "nuke" right?)

albarrentine commented 8 years ago

Yeah, that should be it. Just reclone and bootstrap/configure/make/make install

Ironholds commented 8 years ago

Well, that's interesting. The error is now: "Error loading transliteration module". But the command line parser now works, which it didn't before. Wat?

albarrentine commented 8 years ago

Oh, weird. This is 64-bit arch I take it?

Ironholds commented 8 years ago

Yup!

Aaand now Rcpp can't find <libpostal/libpostal.h> relying on pkg-config even though running pkg-config absolutely finds it. R, what are you DOING?

Ironholds commented 8 years ago

Hmm. Okay, that may be because libpostal no longer lives where pkg-config thinks it does. I'm going to avoid running my mouth at a mile a minute and debug it ;p

Ironholds commented 8 years ago

After kicking, swearing and giving my machine the hairy eyeball, it looks like it can now find where everything lives, and your setup/teardown changes (with a few tweaks I need to push) made parsing work happily! Danke schoen!

As an upstream suggestion: it seems like there should be a better response to "you've asked for a parse but nobody did setup" than a segfault. Want me to open an issue? Does it sound silly?

albarrentine commented 8 years ago

:tada: No problem. It should definitely not segfault; it should either log an error and return NULL or do an assert, which would be a more graceful crash that tells you what went wrong.
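
For illustration, here is a minimal sketch of those two options in C. Every name in it (`demo_parser_t`, `demo_parser_setup`, `demo_parse`, `demo_parse_strict`) is hypothetical and stands in for a lazily loaded module; it is not libpostal's actual code or the actual 2-line change.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a lazily loaded parser model (not libpostal's type). */
typedef struct {
    int loaded;
} demo_parser_t;

/* Stays NULL until demo_parser_setup() has been called. */
static demo_parser_t *demo_parser = NULL;

/* Load the hypothetical parser module. Returns 1 on success, 0 on failure. */
int demo_parser_setup(void) {
    if (demo_parser != NULL) return 1;  /* already loaded */
    demo_parser = malloc(sizeof(*demo_parser));
    if (demo_parser == NULL) return 0;
    demo_parser->loaded = 1;
    return 1;
}

/* Release the module; calling a parse function afterwards hits the guards below. */
void demo_parser_teardown(void) {
    free(demo_parser);
    demo_parser = NULL;
}

/* Option 1: log an error and return NULL instead of dereferencing a NULL pointer. */
const char *demo_parse(const char *address) {
    if (demo_parser == NULL) {
        fprintf(stderr, "demo_parse: parser not loaded; call demo_parser_setup() first\n");
        return NULL;
    }
    /* Placeholder: a real parser would return labelled address components. */
    return address;
}

/* Option 2: assert, so a missing setup aborts with a message naming the cause. */
const char *demo_parse_strict(const char *address) {
    assert(demo_parser != NULL && "parser not loaded; call demo_parser_setup() first");
    return address;
}
```

With the first variant, a caller such as an R binding can check for NULL and raise an ordinary error. The second still terminates the process, but the assertion message points at the missing setup call rather than at an invalid read of address 0x0.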

Ironholds commented 8 years ago

Want me to file then?

albarrentine commented 8 years ago

It's a 2-line change, I'll just add it in now.