Closed dantonnoriega closed 7 years ago
Hi there,
Just adding a note to say that I've replicated this issue in its entirety - I've done all the same checks, and normalise_addr works but not parse_addr.
Cheers!
Weird! Thanks; will be looking into it over the next couple of days.
On Tuesday, 20 September 2016, Luke Holman notifications@github.com wrote:
Hi there,
Just adding a note to say that I've replicated this issue in its entirety
- I've done all the same checks, and normalise_addr works but not parse_addr.
Cheers!
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/Ironholds/poster/issues/5#issuecomment-248469212, or mute the thread https://github.com/notifications/unsubscribe-auth/ACXz3naLOY0t-tOiAdKk83OUBBNKl7noks5qsG_cgaJpZM4J3g7O .
Welp, looking into it now and I...can't get libpostal to install on my macbook :|
Looking through the code I can basically think of two possible sources of the bug; either (1) libpostal is returning empty strings (in which case they get turned into NAs) or (2) the API changed and so it isn't actually fetching any of the entries (and is thus returning NAs). I'll be able to eliminate one or both possibilities once I've got it installed (I assume the CLI works fine?)
Yeah, on my computer the CLI works fine (I guess you mean the interactive shell for the parser that you get by typing .address_parser or whatever it is into Terminal). But maybe it returns the address elements in a format that the R-C interface doesn't recognise? Seems like the R library has some internal functions that get the elements (house, city, etc) of the address from the parser output, and pass them to the R parser function, and these might not be working ok, but I couldn't figure it out.
Cheers
Yeah - say, could you paste some example output for the interactive shell?
Sure, here you go!
> 18 Smith Road, London, UK
Result:
{
"house_number": "18",
"road": "smith road",
"city": "london",
"country": "uk"
}
> 1600 Amphitheatre Parkway Mountain View, CA 94043 Phone: +1 650-253-0000
Result:
{
"house_number": "1600",
"road": "amphitheatre parkway",
"city": "mountain view",
"state": "ca",
"postcode": "94043",
"house": "phone +1 650-253-0000"
}
Thiiis is super weird. The only explanation I have right now is that the function that actually does the retrieval is...somehow not being called? Which, what the fuck?
Somedays R just doesn't want to cooperate! Is it possible to call the internal 'get elements' function (I forget its exact name) on its own and test that? I couldn't get it to work (it actually crashes RStudio and gives the little bomb icon when run :) ).
I can't see any Rcpp changes that would've caused this to suddenly fall over. @thatdatabaseguy did you make any API changes or whatnot in the last release? (I don't see a changelog or I'd check that, and I know you were doing a big rewrite)
I can probably export that function and see what'll happen, yeah - that's my next port of call! How did you export it that it blew up?
Okay, I called it internally, it doesn't blow up, so the libpostal side of things is working so what in the hell is going on?!
Actually if you called the getters and setters they would've crashed on account of not working (hence the lack of export)
Wait, I know what it bloody is. The last Rcpp release makes get_cstring() vanish. This was, of course, not documented.
Well, that's a problem but it doesn't fix it. hrrn.
Okay, I should've fixed it. If someone else seeing the bug can replicate the fix on the latest commit, I'd be most grateful. I still have no idea what caused it.
> library(poster)
>
> str(normalise_addr("781 Franklin Ave Crown Heights Brooklyn NYC NY 11216 USA"))
chr "781 franklin avenue crown heights brooklyn nyc new york 11216 usa"
> str(poster::parse_addr("781 Franklin Ave Crown Heights Brooklyn NYC NY 11216 USA"))
'data.frame': 1 obs. of 10 variables:
$ house : chr NA
$ house_number : chr "781"
$ road : chr "franklin ave"
$ suburb : chr "crown heights"
$ city_district : chr "brooklyn"
$ city : chr "nyc"
$ state_district: chr NA
$ state : chr "ny"
$ postal_code : chr NA
$ country : chr "usa"
Quick rerun and seems to work better, but a little disconcerting that postal_code
doesn't seem to work (I would think that would be one of the easier ones for the wee algorithm).
Ah, R binding was using the wrong field name (the C library calls it "postcode"). Pull latest and that address should parse to the same result as in the command-line client output above.
Nice!
> poster::parse_addr("781 Franklin Ave Crown Heights Brooklyn NYC NY 11216 USA")
house house_number road suburb city_district city
1 <NA> 781 franklin ave crown heights brooklyn nyc
state_district state postal_code country
1 <NA> ny 11216 usa
Nicely done, Al! Any idea why the code stopped working randomly? It's the weirdest thing - if you nest the call 4 functions deep, it never runs. 3 functions deep, it does. wut.
Like if you call foo, which calls bar, which calls blah which calls poster::parse_addr, it returns NAs?
More precisely it simply doesn't appear to run; it returns NAs because the output is preallocated with NAs a call above.
On Sunday, 25 September 2016, Al Barrentine notifications@github.com wrote:
Like if you call foo, which calls bar, which calls blah which calls poster::parse_addr, it returns NAs?
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/Ironholds/poster/issues/5#issuecomment-249431587, or mute the thread https://github.com/notifications/unsubscribe-auth/ACXz3gtW2mR1LG5oOuX2o59MN7dg7Inrks5qtqLSgaJpZM4J3g7O .
An example would be helpful. If the inputs to parse_addr are NA, it will do nothing, but allocation of the outputs happens within the function's scope so ought to be independent of any previous calls.
Hi all, I just got around to installing the new version and testing it out. Works great for me too! Thanks for the help - now to test it on a big mass of data!
To start off, the
normalise_addr
works fine.Its
parse_addr
that is funky.Anyhow, sad it isn't working. Not the end of the word as I think the normalizer will suffice. Just thought I'd inform!
Here is what I've done already:
Rcpp
(well, already had it install, but reinstalled it).libpostal
installation instructions and had a successfulmake check
and
Here is maybe relevant computer info: