FIRSTMap / firstmap.github.io

:earth_americas: An interactive online map of FRC teams.
https://firstmap.github.io
MIT License

Incorrect Team Locations #80

Closed PChild closed 6 years ago

PChild commented 7 years ago

There are three teams currently on the map in Russia that are actually in Vietnam, Poland, and China.

GeeII commented 6 years ago

Those are just the tip of the iceberg. Most of the teams and events shown in Europe are actually in other parts of the world, mostly the Middle to Far East, but Northern Lights (shown near Bayonne, France) is in Minnesota. Are there plans to upgrade the scraper to fix these, or should I go in and do manual overrides?

ErikBoesen commented 6 years ago

Uhhh. I have no idea why that would be happening.

@ThePlasmaGuy, any idea what could be wrong with the scraper?

GeeII commented 6 years ago

I also see three depicted in South America which aren't there (6623 Morocco, 6170 Mexico, 6415 Turkey). It's not just high numbers, though: 1580 (Israel) is depicted near Versailles, France.

ThePlasmaGuy commented 6 years ago

Not completely sure; it could be some issue with city to geo-position conversion. I can try to take a look at the scraper software in a little bit and see if I can figure it out. I do know that TBA's API v3 had to remove coordinates from its responses because of inaccurate results like these.

GeeII commented 6 years ago

Two of the outstanding issues (Northern Lights... and ONT District...) seem to be special cases of this one. Can these issues be merged?

ErikBoesen commented 6 years ago

Are you referring to #68?

GeeII commented 6 years ago

Yes, #68 and #66.

GeeII commented 6 years ago

@ThePlasmaGuy, are you still working on this? If not, let me know and I'll pick it up. It's been a few years since I've done any web scraping, so I won't be too fast at it, but it seems like a good project.

ErikBoesen commented 6 years ago

@GeeII I don't think he's working on it anymore.

ErikBoesen commented 6 years ago

Also, there are a bunch of teams shown in the Philippines which are actually in Israel.

ThePlasmaGuy commented 6 years ago

I actually have been working on this (just haven't gotten thread notifications until today?). From what I'm finding, the issue seems to be with geolocation services (esp. Google) not being able to map location strings to addresses correctly. That's the main reason coordinates were pulled from TBA API v3.

ThePlasmaGuy commented 6 years ago

So while the data scrape might not be able to be fixed (I'm still working on making it better at least) I do have a WIP fix to the data structure to make it easier to manually override.

ErikBoesen commented 6 years ago

Thanks, that will be helpful in the short run.

ErikBoesen commented 6 years ago

Several Chinese teams are in South Africa, as well.

ThePlasmaGuy commented 6 years ago

Unfortunately geocoordinates are just hard when it comes to converting addresses to them. I'll try to throw something in there to check if the country is right and at least move it to the right country and flag it if it's not. But there will probably have to be a bunch of manual stuff just because of the way addresses are processed by these APIs.
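
A minimal sketch of the country check described above, assuming the geocoder's answer has already been reduced to a country name. The function names, the `geocoded_country` field, and the sample geocoder results are hypothetical; this is not the scraper's actual code.

```python
def check_country(expected_country, geocoded_country):
    """Return True when the geocoder's country matches the one TBA reports.

    The comparison ignores case and surrounding whitespace.
    A missing value on either side counts as a mismatch.
    """
    if not expected_country or not geocoded_country:
        return False
    return expected_country.strip().casefold() == geocoded_country.strip().casefold()


def flag_mismatches(teams):
    """Collect the team numbers whose geocoded country disagrees with TBA's."""
    return [
        t["team_number"]
        for t in teams
        if not check_country(t.get("country"), t.get("geocoded_country"))
    ]


# Made-up geocoder results, for illustration only.
sample = [
    {"team_number": 1580, "country": "Israel", "geocoded_country": "France"},
    {"team_number": 6623, "country": "Morocco", "geocoded_country": "morocco "},
]
print(flag_mismatches(sample))
```

Flagged teams could then be routed to the manual override list instead of being plotted in the wrong country.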

GeeII commented 6 years ago

What geolocator are you using? When I put the city and state from the Blue Alliance API (erikboesen:frcmap:v1.0) into google maps, I get a good spot on the map (or at least a lot better), and I know the ESRI geolocator is also pretty good. (Though I've never used either directly through an API.)

ErikBoesen commented 6 years ago

We were using the Google Maps geolocator. The TBA one didn't exist when I started this project.

ErikBoesen commented 6 years ago

@ThePlasmaGuy just left on a Mormon mission. I don't think he'll be working on this for a little while.

GeeII commented 6 years ago

OK. It's rather late tonight, but I'll start into this over the weekend.

GeeII commented 6 years ago

I didn't get to it over the weekend, but I have been working on it this afternoon. I can't get google to give me wrong-continent responses, but it does seem to get confused by too much information. Outside of the US (though I'll add in more countries with good codes), leaving out the postal code seems to improve the results. I've also worked through your sub-API to get team data from TBA. It looks like I can get from the team number to the place name to the lat/lon. Do you get the list of current team numbers from TBA? What's the form of the URL for this? Do you have the API key for Google Maps? (I've been doing sample points with the GUI, though I can do a wget and automatically find the lat/lon data in the 1/2 Mb or so download.)
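
One way to express the "leave out the postal code outside the US" rule is sketched below. It assumes team records carry `city`, `state_prov`, `postal_code`, and `country` fields and that the US appears as `"USA"`; the function name, the `postal_countries` parameter, and the sample values are hypothetical.

```python
def build_query(team, postal_countries=("USA",)):
    """Join a team's address fields into one geocoder query string.

    The postal code is included only for countries listed in
    postal_countries; empty or missing fields are skipped.
    """
    parts = [team.get("city"), team.get("state_prov")]
    if team.get("country") in postal_countries:
        parts.append(team.get("postal_code"))
    parts.append(team.get("country"))
    return ", ".join(p.strip() for p in parts if p and p.strip())


# Illustrative records, not real team data.
us_team = {"city": "Example City", "state_prov": "Minnesota",
           "postal_code": "55000", "country": "USA"}
other_team = {"city": "Example Town", "state_prov": "",
              "postal_code": "12345", "country": "Israel"}
print(build_query(us_team))
print(build_query(other_team))
```

Adding more "countries with good codes" would then only mean extending `postal_countries`.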

GeeII commented 6 years ago

If you don't have the Google Maps key, I can get another key from Mapquest, which seems to be just as good, at least at this level of resolution.

ErikBoesen commented 6 years ago

The URL format for getting team lists is as follows: http://www.thebluealliance.com/api/v3/teams/page_number?X-TBA-Auth-Key=your_auth_key
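
A hedged sketch of walking those pages in Python, using only the standard library. It assumes the API is reachable over https and that a page past the last team comes back as an empty list; `team_page_url`, `fetch_json`, and `all_teams` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://www.thebluealliance.com/api/v3"


def team_page_url(page_number, auth_key):
    """Build the team-list URL for one page, following the format above."""
    query = urlencode({"X-TBA-Auth-Key": auth_key})
    return f"{BASE_URL}/teams/{page_number}?{query}"


def fetch_json(url):
    """Download a URL and decode its JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def all_teams(auth_key, fetch=fetch_json):
    """Request pages 0, 1, 2, ... and stop at the first empty page."""
    teams = []
    page = 0
    while True:
        batch = fetch(team_page_url(page, auth_key))
        if not batch:
            break
        teams.extend(batch)
        page += 1
    return teams
```

Calling `all_teams("your_auth_key")` would perform the real requests; passing a different `fetch` function makes it possible to try the loop without network access.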

I don't have a key. Instead of purchasing one, I just rate-limited my requests. Plasma's translation to Python might have neglected this feature.

GeeII commented 6 years ago

Thanks, and fair enough; I was also thinking of that. What rate did you do the google map requests?

ErikBoesen commented 6 years ago

I honestly don't remember. I think I limited it to one every 2 seconds, but that could be wrong.

GeeII commented 6 years ago

Plenty fast enough, should still just take a very few hours to get everybody, faster if I cache duplicate locations.
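
The two ideas here (one request every 2 seconds, and caching duplicate locations) can be combined in a small wrapper. This is a sketch under assumptions: `ThrottledGeocoder` is a hypothetical class, and `geocode` stands for whatever function actually calls the geocoding service.

```python
import time


class ThrottledGeocoder:
    """Wrap a geocoding function with a minimum delay and a result cache."""

    def __init__(self, geocode, min_interval=2.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._geocode = geocode
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None
        self._cache = {}

    def lookup(self, query):
        # Identical place strings are answered from the cache, with no delay.
        if query in self._cache:
            return self._cache[query]
        # Wait until at least min_interval has passed since the last request.
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        self._last_call = self._clock()
        result = self._geocode(query)
        self._cache[query] = result
        return result
```

Because many teams share a city, the cache means the delay is only paid once per distinct place string rather than once per team.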

Also, in looking at the API docs, I just noticed that TBA v2 is going to be deprecated at the end of the calendar year. The "popup" for each team (and maybe other items) is based on a real-time TBA v2 API snag.

GeeII commented 6 years ago

As I make this scrape, I'll save the data for the teams and regionals in case the v3 API won't work for people without their own key.

ErikBoesen commented 6 years ago

Yeah, we should fix that as well. Won't be too difficult at all.

GeeII commented 6 years ago

I've pulled all the team data from v3 and have it mostly orthogonal; a lot harder than it ought to be pulling from an API. Hopefully, the events won't be as bad, or will work with the same/similar rules.

GeeII commented 6 years ago

I think I've found the issue. Many of the international teams have non-ascii unicode characters in their address information, mostly letters with diacritical marks. I am mapping the characters from \u0080 to \u00ff to 8-bit ascii, which should work OK. There are three others in there, \u0130, \u0131, and \u015f, which can be mapped to I, i, and s and should work fine, at least with the geocode lookup. I also plan to translate these characters for the other fields as well.
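
For comparison, here is a sketch of a related but different technique: rather than a hand-built character table, it uses Unicode NFKD decomposition and drops the combining marks, with a small explicit table for letters that do not decompose. `fold_to_ascii` and the `EXTRA` table are hypothetical, and the table is deliberately incomplete.

```python
import unicodedata

# Letters with no Unicode decomposition need an explicit replacement.
EXTRA = {"\u0131": "i", "\u00df": "ss", "\u00f8": "o", "\u00d8": "O"}


def fold_to_ascii(text):
    """Replace accented letters with their closest unaccented form.

    Characters that neither decompose nor appear in EXTRA are left as-is.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(EXTRA.get(ch, ch) for ch in stripped)


print(fold_to_ascii("\u0130stanbul"))
print(fold_to_ascii("Kad\u0131k\u00f6y"))
```

With this approach \u0130 and \u015f fold to I and s through decomposition alone, while \u0131 (dotless i) has to come from the explicit table.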

ErikBoesen commented 6 years ago

Makes sense. Thank you for your help with this.

GeeII commented 6 years ago

Running the teams through google maps now. It turns out the unicode is a much more prevalent issue in the other fields than I realized, so at this point I'm only making the correction to the address fields, and only for the call to google maps. I am saving the stuff from TBA (throttling down to just the 2018 home championship to make it a vector). I am also cutting off the coordinates at three decimal places, which is about 111 m or 370 feet and is already more precise than a 5-digit zip code. I'm just doing city, state_prov, postal code, country for now. Looking for school/last sponsor locations sounds like a good project, but it'll take a lot longer, so I'm focusing on this issue for now.
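
The precision figure can be checked with a short calculation. This sketch rounds (rather than truncates) to three places and assumes roughly 111,320 m per degree of latitude; the function names and the sample coordinates are hypothetical.

```python
import math

METERS_PER_DEGREE_LAT = 111_320  # approximate length of one degree of latitude


def round_coords(lat, lon, places=3):
    """Round a coordinate pair to a fixed number of decimal places."""
    return round(lat, places), round(lon, places)


def grid_size_m(lat, places=3):
    """Approximate north-south and east-west grid spacing, in meters."""
    step = 10 ** -places
    north_south = step * METERS_PER_DEGREE_LAT
    # Lines of longitude converge toward the poles, so the east-west
    # spacing shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(lat))
    return north_south, east_west


print(round_coords(44.97812, -93.26501))
print(grid_size_m(45.0))
```

The 111 m figure is the north-south spacing; east-west spacing is smaller away from the equator.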

GeeII commented 6 years ago

Gack. After about 18 or so queries, it came back with "service unavailable". When I went to google maps in firefox, it put me through a mouse pointer CAPTCHA.

GeeII commented 6 years ago

I downloaded a few files from geocities, and have all but about 50 teams working automatically from that dump. I think I can put in fixes for a few others (e.g. "New York, New York, USA" is not the same as "New York City, New York, USA"). I'm not far from giving up and sending the remainder to google maps.

Haven't done the events, but as this is the big data set, should have most of the problems licked.

GeeII commented 6 years ago

I got it down to 26, left for google and ran the whole batch. I'll be away this afternoon, but I'll QC these this evening and hope to upload an updated teams.js.

GeeII commented 6 years ago

Placed a pull request for teams.js. 128 teams moved more than 10 miles. A good sampling of those teams moved for the better.
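
A count like "moved more than 10 miles" can be produced by comparing old and new coordinates with the haversine formula. The sketch below is one way to do it, not necessarily how the number above was obtained; `haversine_miles`, `moved_teams`, and the dictionary layout are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def moved_teams(old, new, threshold_miles=10.0):
    """Return team numbers whose position changed by more than the threshold.

    old and new map team number -> (lat, lon); teams missing from either
    side are skipped.
    """
    return sorted(
        team
        for team in old.keys() & new.keys()
        if haversine_miles(*old[team], *new[team]) > threshold_miles
    )
```

The resulting list is a natural starting point for a spot check of which moves were for the better.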

GeeII commented 6 years ago

The last 26 teams that I went to google with, I swapped lats and lons. This has been fixed. AFAIK, this should be good for teams. I'll follow up with events later this week.
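
Some swapped pairs can be caught automatically, since a latitude must lie between -90 and 90. The sketch below uses a hypothetical `looks_swapped` helper; it only catches pairs where the value in the latitude slot is out of range, so a swap where both values fall within ±90 passes unnoticed.

```python
def looks_swapped(lat, lon):
    """True when the pair is only valid if the two values are exchanged."""
    lat_ok = -90.0 <= lat <= 90.0
    swapped_ok = -90.0 <= lon <= 90.0 and -180.0 <= lat <= 180.0
    return not lat_ok and swapped_ok


# A longitude of -93.265 in the latitude slot is out of range, so it is flagged.
print(looks_swapped(-93.265, 44.978))
# Both values are within +/-90 here, so a swap could not be detected.
print(looks_swapped(31.8, 35.2))
```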

GeeII commented 6 years ago

I downloaded the events from TBA and compared to the existing file. I noticed that district CMPs are listed as district events, but world CMPs are not on the map. Do you want me to get these as well as a new category? If so, should I break DCMPs into districts, CMPs, or a fourth category? It's easy enough to do any way from my side.

ErikBoesen commented 6 years ago

I think adding a new category would be the best option.