Closed: lightsprint09 closed this issue 7 years ago.
Currently, the build script pulls data from the static lists of stations.
I will investigate the stations API soon.
Unfortunately, the API claims to have 5363 stations, whereas the static data set has 6599 stations.
@highsource? 😉
FYI I started working on a build script that pulls from the API. Having an API limit higher than 10 requests / 10 min would be fantastic @lightsprint09 @highsource ! 😜
I appreciate that you think I have intimate knowledge of every API and dataset. :)
The latest dataset has 5372 stations, which is close to what the API says. As for the difference of 9 stations: either the datasets are out of sync or there are some edge cases.
This dataset has 6598 entries but some are repeated, in case there are two or more RIL100 codes per station (often the case with combined S-Bahn/long-distance stations). Köln Messe/Deutz has 3 RIL100 codes (KKDZ, KKDZB, KKDT); same for Berlin Jungfernheide (BJUF, BJUN). The API still considers each of these to be just one station and returns all of its RIL100 codes.
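For consumers of the static dataset, collapsing those repeated rows could look like the following sketch in TypeScript. It assumes the repeated rows share the station number and borrows the field names from the static dataset example further down in this thread:

```ts
// Sketch: group the repeated rows by station number and collect all
// RIL100 codes. Row/Station shapes are assumptions based on the
// static dataset example below, not a documented schema.
type Row = { ds100: string, nr: number, name: string }
type Station = { nr: number, name: string, ril100: string[] }

function dedupe (rows: Row[]): Station[] {
  const byNr = new Map<number, Station>()
  for (const row of rows) {
    const station = byNr.get(row.nr) ?? { nr: row.nr, name: row.name, ril100: [] }
    station.ril100.push(row.ds100) // e.g. KKDZ, KKDZB, KKDT for Köln Messe/Deutz
    byNr.set(row.nr, station)
  }
  return Array.from(byNr.values())
}
```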
Regarding the limit of 10 queries per minute: please write to dbopendata@deutschebahn.com; I think it should not be a problem to increase the limit.
PS: Generally, I'd recommend building scripts so that they retry with an exponentially growing pause when the limit is exceeded.
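Such a retry-with-growing-pause could look like the sketch below; the 429 status check, the attempt count, and the timings are assumptions, not documented API behaviour:

```ts
// Sketch: retry with an exponentially growing pause whenever the API
// signals a rate limit (assumed here to be HTTP 429).
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms))

async function fetchWithBackoff (url: string, attempts = 5): Promise<Response> {
  for (let i = 0; i < attempts; i++) {
    const res = await fetch(url)
    if (res.status !== 429) return res // not rate-limited, hand back the response
    await sleep(1000 * 2 ** i) // wait 1s, 2s, 4s, 8s, ... before retrying
  }
  throw new Error(`still rate-limited after ${attempts} attempts: ${url}`)
}
```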
> This dataset has 6598 entries but some are repeated, in case there are two or more RIL100 codes per station […]

That clears things up, thanks!
> Regarding the limit of 10 queries per minute: please write to dbopendata@deutschebahn.com […]

Will do! It's unfortunate that this is the default, though. It's about making access to data as convenient as possible, and a pretty strict limit of 10/min doesn't help with that.
> PS: Generally, I'd recommend building scripts so that they retry with an exponentially growing pause when the limit is exceeded.

I did; I built in very cheap throttling, but it makes the build take ~~40min~~ 4min instead of seconds.
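The kind of cheap throttling meant here can be as simple as a fixed pause between requests; the interval below is an assumption and would need tuning against the actual quota:

```ts
// Sketch: very cheap throttling by spacing requests out with a fixed
// pause. The 6.5s interval is an assumption, not the documented quota.
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms))

async function fetchAllThrottled (urls: string[]): Promise<Response[]> {
  const responses: Response[] = []
  for (const url of urls) {
    responses.push(await fetch(url))
    await sleep(6500) // pause between consecutive requests
  }
  return responses
}
```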
FYI, here's a case where there's less information. This is from the current db-stations, pulling data from the two static datasets:
```json
{
  "ds100": "DPUS",
  "nr": 8294,
  "name": "Pulsnitz Süd",
  "zip": "01896",
  "city": "Pulsnitz",
  "state": "SN",
  "id": 8012686,
  "latitude": 51.181002,
  "longitude": 14.007501
}
```
In the API, the relevant IBNR is missing:
```json
{
  "number": 8294,
  "name": "Pulsnitz Süd",
  "mailingAddress": {
    "city": "Pulsnitz",
    "zipcode": "01896",
    "street": "Dresdner Straße 11a"
  },
  "category": 7,
  "hasParking": true,
  "hasBicycleParking": false,
  "hasLocalPublicTransport": true,
  "hasPublicFacilities": false,
  "hasLockerSystem": false,
  "hasTaxiRank": false,
  "hasTravelNecessities": false,
  "hasSteplessAccess": "yes",
  "hasMobilityService": "no",
  "federalState": "Sachsen",
  "regionalbereich": {
    "number": 2,
    "name": "RB Südost",
    "shortName": "RB SO"
  },
  "aufgabentraeger": {
    "shortName": "Verkehrsbund Oberelbe GmbH",
    "name": "VVO"
  },
  "szentrale": {
    "number": 53,
    "publicPhoneNumber": "0351/4611055",
    "name": "Dresden"
  },
  "stationManagement": {
    "number": 118,
    "name": "Dresden"
  },
  "evaNumbers": [],
  "ril100Identifiers": []
}
```
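Records like this can be flagged mechanically, since a missing IBNR shows up as an empty `evaNumbers` array. A sketch, with the record type mirroring the JSON above:

```ts
// Sketch: flag API records that lack an IBNR, i.e. whose `evaNumbers`
// array is empty. The type only covers the fields used here.
type ApiStation = {
  number: number,
  name: string,
  evaNumbers: unknown[],
  ril100Identifiers: unknown[]
}

const withoutIbnr = (stations: ApiStation[]): ApiStation[] =>
  stations.filter((s) => s.evaNumbers.length === 0)
```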
See also the build output.
Would merging help to build an even better data source?
@lightsprint09 Yes! Please try to merge the two static datasets into the API. 👍 This way, the improved data source will be available to everyone.
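A merge along those lines could key the static rows by station number and fill in whatever the API omits, e.g. the IBNR and the coordinates. A sketch using the field names from the two examples above; everything else is an assumption:

```ts
// Sketch: merge the static dataset into the API records, keyed by
// station number. Field names follow the Pulsnitz Süd examples above.
type StaticRow = { nr: number, id: number, latitude: number, longitude: number }
type ApiStation = { number: number, name: string, evaNumbers: number[] }

function mergeDatasets (api: ApiStation[], rows: StaticRow[]) {
  const byNr = new Map(rows.map((r) => [r.nr, r] as const))
  return api.map((s) => {
    const row = byNr.get(s.number)
    if (!row) return s
    return {
      ...s,
      // prefer the API's IBNRs, fall back to the static dataset's `id`
      evaNumbers: s.evaNumbers.length > 0 ? s.evaNumbers : [row.id],
      latitude: row.latitude,
      longitude: row.longitude
    }
  })
}
```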
I published db-stations@1.0.0, which contains data in the Friendly Public Transport Format, so you will unfortunately have to adapt your consuming scripts/libs. It contains every station from the API, except those without IBNRs.
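For reference, an FPTF station record roughly has this shape, illustrated here with the Pulsnitz Süd data from above; the exact field set shipped by db-stations@1.0.0 may differ, so check its README:

```ts
// Rough shape of a station in the Friendly Public Transport Format,
// filled with the Pulsnitz Süd values quoted earlier in this thread.
const station = {
  type: 'station',
  id: '8012686', // the IBNR
  name: 'Pulsnitz Süd',
  location: {
    type: 'location',
    latitude: 51.181002,
    longitude: 14.007501
  }
}
```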
The following station is missing a `nr` attribute. When I do a lookup here by searching for *Jungfernheide*, I get a station with EVA ID 8089100 and a `stationNr`. This seems to be a general problem for stations related to the S-Bahn.
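A check for this could look like the following sketch; the station shape is an assumption based on the report above, not the package's actual schema:

```ts
// Sketch: find stations matching a name query that lack an `nr` attribute.
type LookupStation = { nr?: number, name: string }

const missingNr = (stations: LookupStation[], query: string): LookupStation[] =>
  stations.filter((s) => s.name.includes(query) && s.nr === undefined)

// e.g. missingNr(allStations, 'Jungfernheide')
```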