defgsus opened 2 years ago
Dear @jklmnn,
adding the original scrapers is some work; it can take more than an hour for one city. However, it's progressing. I'm testing everything properly, replacing `http` with `https`, and for the meta-infos I usually merge the scraped data with the original geojson files. E.g. in the Freiburg scraper (in `get_lot_infos`) the original ParkAPI geojson is downloaded from GitHub and combined with the geojson from the Freiburg server. (Once the new geojson file is written, the `get_lot_infos` method is not called anymore and the code goes into an obsolete state, not needed until maybe a new lot appears within the pool. Though once the geojson file is edited by hand, this becomes more complicated..)
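For illustration, a rough sketch of such a merge (the URL, the helper names and the merge-by-name strategy are just my shorthand here, not the actual ParkAPI2 code):

```python
import json
import urllib.request

# Hypothetical location of the original ParkAPI geojson file
PA1_GEOJSON_URL = (
    "https://raw.githubusercontent.com/offenesdresden/ParkAPI"
    "/master/park_api/cities/Freiburg.geojson"
)

def download_geojson(url: str) -> dict:
    with urllib.request.urlopen(url) as response:
        return json.load(response)

def merge_lot_infos(scraped_lots: list, original_geojson: dict) -> list:
    """Fill gaps in freshly scraped lot infos with the hand-maintained
    properties from the original ParkAPI geojson, matching by lot name."""
    by_name = {
        feature["properties"]["name"]: feature
        for feature in original_geojson["features"]
    }
    for lot in scraped_lots:
        feature = by_name.get(lot["name"])
        if feature:
            # scraped values win; the geojson only fills what is missing
            for key, value in feature["properties"].items():
                lot.setdefault(key, value)
            lot.setdefault("coordinates", feature["geometry"]["coordinates"])
    return scraped_lots
```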
I also update addresses where the website supplies more complete ones, like an added zip code, and add public or source URLs for each lot where available.
I'm also a bit more strict about the `nodata` status, or rather the collected numbers: if no free-spaces or capacity number can be scraped, the values are set to `None` instead of zero.
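In code, the rule is roughly this (a sketch; `parse_free_spaces` is a hypothetical helper, not a real ParkAPI2 function):

```python
from typing import Optional

def parse_free_spaces(text: Optional[str]) -> Optional[int]:
    """Return the scraped number of free spaces, or None if the website
    does not publish one. 'No data' must not be confused with a real
    count of zero (a full parking lot)."""
    if text is None or not text.strip().isdigit():
        return None
    return int(text)
```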
For Dresden I scraped the geo coordinates from the website when available and fell back to the ParkAPI geojson when no coords are listed. The website coordinates have more decimal places, so I assumed they are more precise. But I guess it's possible that you and other contributors have picked more useful coordinates by hand, so this needs to be reviewed (not only for Dresden).
Anyway, I'm doing my best (to the best of my knowledge) to integrate the original scrapers and upgrade the meta-info where possible.
I also wrote to the Frankfurt open-data people about their outage (it stopped working on 2021-12-17).
Boy, I'm really looking forward to getting this project into production!
Best regards and a happy new year
Great work!
> I'm also a bit more strict about the `nodata` status, or rather the collected numbers: if no free-spaces or capacity number can be scraped, the values are set to `None` instead of zero.
This is generally a good idea. However, I can't say for sure if we can keep this if it goes into production. It might cause problems with legacy clients.

> However, I can't say for sure if we can keep this if it goes into production. It might cause problems with legacy clients.
Yes, replacing `None`s with zeros in the v1 API should be no problem. In the dumps, snapshots with `None` can probably just be skipped.
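I imagine the v1 layer would do something like this (a sketch, assuming dict-based snapshots with `free` and `total` keys):

```python
def lot_to_v1(lot: dict) -> dict:
    """Convert an internal lot snapshot to the legacy v1 shape:
    legacy clients expect integers, so None becomes 0."""
    return {
        **lot,
        "free": lot["free"] if lot["free"] is not None else 0,
        "total": lot["total"] if lot["total"] is not None else 0,
    }

def dump_snapshots(snapshots: list) -> list:
    """For the dumps, snapshots without any data are simply skipped."""
    return [s for s in snapshots if s["free"] is not None]
```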
There are incompatibilities with some `lot_id`s, though, and other tricky stuff ;) I'll implement the remaining scrapers and then do a scripted comparison against api.parkendd.de.
Then we'll certainly have some things to discuss and compromises to find.
The Frankfurt case: https://www.offenedaten.frankfurt.de/blog/aktualisierungverkehrsdaten
From the email: "... Sobald vom Hersteller ein entsprechender Sicherheitspatch eingespielt wurde, ..." ("... As soon as the vendor has installed a corresponding security patch, ...")
hehe
Okaaayyyhhh, here is the first attempt to compare api.parkendd.de against `ParkAPI2/api/v1`. I've gotten used to calling the former ParkAPI1 (or pa1) and the latter ParkAPI2.
https://github.com/defgsus/ParkAPI2/wiki/v1-api-comparison
I just compared the 'city' metadata, not the lots; it's complex enough already. You can have a look if you like. I'm still preparing a more readable document with specific compatibility issues.
One thing is sure: using names for IDs will remain problematic. They do actually change occasionally.
Sorry for the late reply. The problem with the lot IDs is that not all sources have real IDs, so we need to keep some kind of fallback. In the end, if there is no unique persistent ID and the data source decides to change the name, there isn't really anything we can do. We could use the location in some form, though. This is based on the assumption that a parking lot can't easily relocate itself, and if it does, we can safely assume it is a different one. This would also be useful if someone wants to use this data for analysis, since a different location might have implications for the traffic around the parking lot.
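A location-derived fallback ID could be as simple as rounding the coordinates (just a sketch; the precision and format are arbitrary choices here):

```python
def location_id(latitude: float, longitude: float, precision: int = 4) -> str:
    """Derive a fallback lot ID from the coordinates.

    Rounding to 4 decimal places (roughly 11 meters) encodes the
    assumption that a lot which moves further than that is a new lot.
    """
    return f"{latitude:.{precision}f}:{longitude:.{precision}f}"

# location_id(51.0504, 13.7373) -> "51.0504:13.7373"
```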
Yes, it's complicated with those IDs. I'm really just picky because of later statistics use. Your location idea sounds quite good in this regard.
For daily use it's probably no problem if a lot name changes, apart from the fact that it is not associated with its former .geojson entry anymore, which, in ParkAPI2, would exclude it from the API v1, because it has no location and therefore no associated city.
With the right measures and follow-up maintenance this can be somewhat managed.
When porting your scrapers I found permanent IDs on some websites, but with the current data structure it's not possible to switch to those IDs for identification while keeping the original IDs (from the lot names) for compatibility.
I found so many little compatibility challenges during the port that it felt like real work. Well, at least I spent a couple of real working hours ;)
In the midst of it I started writing the following overview. There are things I wanted to add later but I simply forgot them.
(no specific order, numbers are just for communication)

1. There is a new lot_type "`bus`", which should be excluded in the API by default. Just for statistics.
2. `http` URLs are changed to `https`. Even scraped links to individual lot pages are adjusted if needed.
3. Some public URLs were just `www.<city>.de`. If possible, they got replaced by something more specific like `www.<city>.de/parken/`.
4. General URL logic: if a Pool's `public_url` is scraped, `source_url` is left empty.
5. Added a `public_url` to all lots that have an individual public webpage.
6. Added a `live_capacity` flag. It simply means: if there is a capacity number on the website, it will be scraped with every snapshot. If not, the static capacity from the .geojson file is used and `live_capacity` should be False to signal that the capacity number is static and might not reflect the true capacity at any point in time. (See the sketch at the end of this comment.)
7. Hamburg: the original lot_ids were plain numbers, so I added a `hamburg-` prefix. (We can switch back to the original IDs though, if needed.)
8. Karlsruhe: now scraping one webpage for every lot (K04, S07, ...). That way we can read the true update timestamp and live capacity.
9. Köln: there were so many lot name changes that the scraper is using the IDs now. They look like `koeln-d-p001`, `koeln-ph03`, ... (see here)
10. Magdeburg: removed the lots "Endstelle Diesdorf", "Milchhof" and "Lange Lake" (from the second table here) because they do not list the number of free spaces (only capacity).
11. Münster: explicitly flagged the "Busparkplatz" as lot_type `bus`, which will remove it from default API responses.
12. Nürnberg: "Tiefgarage Findelgasse" changed to "Parkhaus Findelgasse", so the lot_id changed as well. Should check if that is really what happened and maybe relocate the entrance.
13. Wiesbaden: one web request per lot. They changed the website and the only way I found is to request individual geoportal URLs.

That's it for now.
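P.S. the `live_capacity` logic from point 6, paraphrased in code (my paraphrase, not the actual implementation):

```python
from typing import Optional, Tuple

def snapshot_capacity(
        scraped_capacity: Optional[int],
        geojson_capacity: Optional[int],
) -> Tuple[Optional[int], bool]:
    """Decide which capacity to store with a snapshot.

    If the website publishes a capacity, it is scraped with every
    snapshot (live). Otherwise the static value from the .geojson file
    is used and live_capacity is False to signal that it might not
    reflect the true capacity at any point in time.
    """
    if scraped_capacity is not None:
        return scraped_capacity, True
    return geojson_capacity, False
```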
Please let me know what you think and let us progress, slowly..
I just checked the available cities after our current outage, and the only city I can see missing is Hanau. So after we add this I'd say we can close this issue.
All lots need to have the same ID as was generated in the original ParkAPI by the geojson wrapper (as discussed in issue #1).
In essence that means:

- `utils/strings/name_to_legacy_id` is used to convert the name strings
- instead of `utils/strings/name_to_id`, which allows `-` separators

branch: `feature/legacy-ids`