Closed aaronkrusniak closed 4 months ago
This seems to be an issue with either your geocoder or custom json parsing. I cannot repro on the world geocoder. I'll keep looking!
No errors or warnings with the following:
library(arcgis)
library(arcgisgeocode)
library(dplyr)
library(tibble)

set_arc_token(auth_user())

# Some dummy data:
music_venues <- tribble(
  ~Name, ~Address,
  "Aragon Ballroom", "1106 W. Lawrence Ave.",
  "House of Blues", "329 N. Dearborn St.",
  "Bottom Lounge", "1375 W. Lake St.",
  "The Vic", "3145 N. Sheffield Ave.",
  "Park West", "322 W. Armitage Ave.",
  "Thalia Hall", "1807 S. Allport St.",
  "Lincoln Hall", "2424 N. Lincoln Ave.",
  "Schubas Tavern", "3159 N. Southport Ave."
)

# Just under max batch size (992 rows):
short <- music_venues |> slice(rep(1:n(), each = 124))
res_short <- geocode_addresses(short$Address)

# Just over max batch size (1008 rows):
long <- sample_n(music_venues, 1008, replace = TRUE)
res_long <- geocode_addresses(long$Address)

# Several times the max batch size (4000 rows):
mega <- sample_n(music_venues, 4000, replace = TRUE)
res_mega <- geocode_addresses(mega$Address)
res_mega
Darn— that does make sense though! Thanks for checking, I'll dig into our internal geocoder from my end and see if I find anything useful.
Ah, I see where it's going wrong.
In parsing custom JSON we need to pre-allocate vectors. I was pre-allocating based on n (the total number of features) rather than on the size of the chunk itself! Instead of passing n, I need to be passing the chunk size.
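To make the failure mode concrete, here is a minimal sketch of that allocation bug. This is illustrative Python, not the package's actual Rust internals; the function names and structure are invented for the example. The buggy version sizes each chunk's result buffer by the total record count n, so every batch contributes n rows (mostly empty) instead of one row per record.

```python
def parse_chunks_buggy(records, max_batch):
    """Sketch of the bug: buffers sized by total n, not chunk size."""
    n = len(records)  # total number of features
    out = []
    for start in range(0, n, max_batch):
        chunk = records[start:start + max_batch]
        buf = [None] * n          # bug: allocated for all n records
        buf[:len(chunk)] = chunk  # only the chunk's slots get filled
        out.extend(buf)           # the remainder stays empty
    return out

def parse_chunks_fixed(records, max_batch):
    """Fix: allocate each buffer by the size of the chunk itself."""
    out = []
    for start in range(0, len(records), max_batch):
        chunk = records[start:start + max_batch]
        buf = [None] * len(chunk)  # sized to the chunk
        buf[:] = chunk
        out.extend(buf)
    return out
```

With 4000 records and a max batch size of 1000, the buggy version yields 4 batches x 4000 slots = 16000 mostly empty rows, matching the report above, while the fixed version yields 4000.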
Another issue I think we're encountering: there might actually be an error in the JSON, but since we're parsing any JSON that comes our way, we're not capturing the fact that an error is occurring.
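A sketch of the kind of check that would surface this. The function name and error handling here are assumptions for illustration, not the package's actual parser; the relevant fact is that ArcGIS REST endpoints can return HTTP 200 with an {"error": ...} body, which a parser that accepts any JSON will silently treat as a (empty) result set.

```python
import json

def parse_geocode_response(body: str) -> dict:
    """Reject error payloads instead of parsing whatever JSON arrives."""
    payload = json.loads(body)  # raises ValueError on malformed JSON
    if "error" in payload:
        err = payload["error"]
        # Surface the service error rather than returning empty results.
        raise RuntimeError(
            f"geocoder error {err.get('code')}: {err.get('message')}"
        )
    return payload
```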
@aaronkrusniak Do you have Rust available on your machine? If so, could you test by installing this branch: https://github.com/R-ArcGIS/arcgisgeocode/tree/custom_loc_batches ?
I do not, but this is a great excuse for me to try to get it; I've been meaning to start dipping a toe into Rust. If I'm able to set it up on my organization PC, I'll give it a shot and let you know!
Alright, looks like I'll be able to set Rust up but it's going to require IT approval, which usually takes 2-7 days at my org. If there's a faster way you'd like me to try to get you feedback, let me know @JosiahParry!
Sounds good! I can merge to main and revert it if we need to as well. Nbd
I've merged the branch with main and bumped the version. There should be a new R-universe build shortly!
Just updated and it's working as expected now, thanks!
@aaronkrusniak wooooot!!! Keep the feedback coming. This is very helpful
Did some more stress testing today and came across this bug: it appears that jobs exceeding the max batch size for a geocoder will fail in one of two ways:

1. An error:

    Error in data.frame(..., check.names = FALSE) :
      arguments imply differing number of rows: {max batch size}, {number of rows provided}

2. A data frame of null results, with one row per record per batch. E.g. if max batch size = 1000 and you provide a job with 4000 records, you will receive a data frame of 16000 empty records. Here's an illustration for this one:
Additionally, sometimes I'm getting this warning on any job, even ones that are smaller than the max batch size:

    Warning message:
    In data.frame(..., check.names = FALSE) :
      row names were found from a short variable and have been discarded

Whenever this happens, the whole data frame returns with empty results. I can't seem to create a reprex for that particular issue; I've noticed that resetting via arc.check_portal() and set_arc_token(auth_binding()) seems to resolve it, so maybe my portal authorization is just timing out? But sometimes it seems like I can go 20 minutes without running into a problem, and other times I can't even go 2 minutes before it happens. Not sure what's going on, but happy to drop an issue in the arcgisbinding repo if it ends up being better suited there. Thanks!