derhuerst / db-stations

A list of Deutsche Bahn stations.
ISC License

Some stations miss stationNr #4

Closed lightsprint09 closed 7 years ago

lightsprint09 commented 7 years ago

The following station is missing the nr attribute. When I do a lookup here by searching for *Jungfernheide*, I get a station with EVA ID 8089100 and a stationNr.

This seems to be a general problem for S-Bahn stations.

{ id: 8089100,
  ds100: 'BJUN',
  name: 'Berlin Jungfernheide (S)',
  latitude: 52.530375,
  longitude: 13.299446 }

derhuerst commented 7 years ago

Currently, the build script pulls data from the static lists of stations.

I will investigate the stations API soon.

derhuerst commented 7 years ago

Unfortunately, the API claims to have 5363 stations, whereas the static data set has 6599 stations.

@highsource ? 😉

derhuerst commented 7 years ago

FYI I started working on a build script that pulls from the API. Having an API limit higher than 10 requests / 10 min would be fantastic @lightsprint09 @highsource ! 😜

highsource commented 7 years ago

I appreciate that you think I have intimate knowledge of every API and dataset. :)

The latest dataset has 5372 stations, which is close to what the API says. As for the difference of 9 stations: either the datasets are out of sync or there are some edge cases.

This dataset has 6598 entries, but some are repeated when a station has two or more RIL100 codes (often the case with combined S-Bahn/long-distance stations). Köln Messe/Deutz has 3 RIL100 codes (KKDZ, KKDZB, KKDT). The same goes for Berlin Jungfernheide (BJUF, BJUN). The API still considers it to be just one station, but returns two RIL100 codes.
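The duplication described above can be illustrated with a minimal sketch that collapses one-row-per-RIL100 entries into one entry per station. The row shape (`nr`, `name`, `ds100`), the `nr` values and the name attached to BJUF are placeholders chosen for illustration, not the real layout of the static list.

```javascript
// Hypothetical rows: the static list has one row per RIL100 code.
// Field names and the `nr` values (1, 2) are placeholders, not real data.
const rows = [
  { nr: 1, name: 'Köln Messe/Deutz', ds100: 'KKDZ' },
  { nr: 1, name: 'Köln Messe/Deutz', ds100: 'KKDZB' },
  { nr: 1, name: 'Köln Messe/Deutz', ds100: 'KKDT' },
  { nr: 2, name: 'Berlin Jungfernheide', ds100: 'BJUF' },
  { nr: 2, name: 'Berlin Jungfernheide (S)', ds100: 'BJUN' },
];

// Group rows by station number and collect all RIL100 codes per station.
// The name of the first row seen for a station is kept.
function groupByStation(rows) {
  const byNr = new Map();
  for (const row of rows) {
    let station = byNr.get(row.nr);
    if (!station) {
      station = { nr: row.nr, name: row.name, ds100: [] };
      byNr.set(row.nr, station);
    }
    if (!station.ds100.includes(row.ds100)) station.ds100.push(row.ds100);
  }
  return [...byNr.values()];
}

const stations = groupByStation(rows);
console.log(`${rows.length} rows -> ${stations.length} stations`);
```

Counting grouped stations rather than raw rows is what would make the static list comparable to the station count reported by the API.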

Regarding the limit of 10 queries per minute: please write to dbopendata@deutschebahn.com. I think increasing the limit should not be a problem.

highsource commented 7 years ago

P.S. Generally, I'd recommend building scripts so that they retry with a pause (growing exponentially) when the limit is exceeded.
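A retry-with-pause loop of the kind suggested here might look roughly like the sketch below. The helper name `retryWithBackoff`, its options and the `isRateLimited` predicate are all hypothetical; how the real API signals an exceeded limit (for example via an HTTP status code) is not shown in this thread, so the default treats every error as retryable.

```javascript
// Sketch: retry an async task, doubling the pause after each failure.
// All names and defaults here are illustrative assumptions.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff(task, opts = {}) {
  const {
    retries = 5,                // how many times to retry after the first attempt
    baseDelayMs = 1000,         // pause before the first retry
    isRateLimited = () => true, // hypothetical check: was the limit exceeded?
    pause = sleep,              // injectable so the loop can be tested quickly
  } = opts;

  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      // Give up once retries are used up or the error is not rate-related.
      if (attempt >= retries || !isRateLimited(err)) throw err;
      await pause(baseDelayMs * 2 ** attempt); // 1x, 2x, 4x, ... the base delay
    }
  }
}
```

A build script could wrap each request as `retryWithBackoff(() => fetchPage(n))`, where `fetchPage` stands in for whatever function performs the request.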

derhuerst commented 7 years ago

> This dataset has 6598 entries, but some are repeated when a station has two or more RIL100 codes (often the case with combined S-Bahn/long-distance stations). Köln Messe/Deutz has 3 RIL100 codes (KKDZ, KKDZB, KKDT). The same goes for Berlin Jungfernheide (BJUF, BJUN). The API still considers it to be just one station, but returns two RIL100 codes.

That clears up things, thanks!

> Regarding the limit of 10 queries per minute: please write to dbopendata@deutschebahn.com. I think increasing the limit should not be a problem.

Will do! It's unfortunate that this is the default, however. It's about making access to data as convenient as possible, and a pretty strict limit of 10/min doesn't help with that.

> P.S. Generally, I'd recommend building scripts so that they retry with a pause (growing exponentially) when the limit is exceeded.

I did. I built in very cheap throttling, but it makes the build take 4 min instead of seconds.
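The build script's actual throttling is not shown in this thread. As a rough idea of what "very cheap throttling" could mean, the sketch below runs requests one after another and spaces them evenly so that no more than `limit` requests start per window. The function name `runThrottled` and its options are hypothetical.

```javascript
// Sketch: run async tasks sequentially with a fixed pause between them.
// With limit = 10 and windowMs = 60000 the pause is 6000 ms.
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runThrottled(tasks, opts = {}) {
  const { limit = 10, windowMs = 60 * 1000, pause = wait } = opts;
  const intervalMs = windowMs / limit;
  const results = [];
  for (let i = 0; i < tasks.length; i++) {
    if (i > 0) await pause(intervalMs); // no pause before the first task
    results.push(await tasks[i]());
  }
  return results;
}
```

Under this scheme, n requests take at least (n - 1) × windowMs / limit, which is why even a modest number of requests turns a build of seconds into one of minutes.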

derhuerst commented 7 years ago

FYI, here is a case where the API has less information. This is from the current db-stations, which pulls data from the two static datasets:

{
    "ds100": "DPUS",
    "nr": 8294,
    "name": "Pulsnitz Süd",
    "zip": "01896",
    "city": "Pulsnitz",
    "state": "SN",
    "id": 8012686,
    "latitude": 51.181002,
    "longitude": 14.007501
}

In the API, the relevant IBNR is missing (evaNumbers and ril100Identifiers are both empty):

{
    "number": 8294,
    "name": "Pulsnitz Süd",
    "mailingAddress": {
        "city": "Pulsnitz",
        "zipcode": "01896",
        "street": "Dresdner Straße 11a"
    },
    "category": 7,
    "hasParking": true,
    "hasBicycleParking": false,
    "hasLocalPublicTransport": true,
    "hasPublicFacilities": false,
    "hasLockerSystem": false,
    "hasTaxiRank": false,
    "hasTravelNecessities": false,
    "hasSteplessAccess": "yes",
    "hasMobilityService": "no",
    "federalState": "Sachsen",
    "regionalbereich": {
        "number": 2,
        "name": "RB Südost",
        "shortName": "RB SO"
    },
    "aufgabentraeger": {
        "shortName": "Verkehrsbund Oberelbe GmbH",
        "name": "VVO"
    },
    "szentrale": {
        "number": 53,
        "publicPhoneNumber": "0351/4611055",
        "name": "Dresden"
    },
    "stationManagement": {
        "number": 118,
        "name": "Dresden"
    },
    "evaNumbers": [],
    "ril100Identifiers": []
}

derhuerst commented 7 years ago

See also the build output.

lightsprint09 commented 7 years ago

Would merging help to build an even better data source?

derhuerst commented 7 years ago

@lightsprint09 Yes! Please try to merge the two static datasets into the API. 👍 This way, the improved data source will be available to everyone.
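One way such a merge could work is to join both sources on the station number and fill in identifiers from the static data when the API entry has none. The sketch below uses the Pulsnitz Süd records from this thread, trimmed to the relevant fields. The output fields `staticId` and `staticDs100` are hypothetical names, chosen to avoid guessing the shape of the entries inside `evaNumbers` and `ril100Identifiers`.

```javascript
// Records taken from the examples in this thread, trimmed to relevant fields.
const staticStations = [
  { nr: 8294, ds100: 'DPUS', name: 'Pulsnitz Süd', id: 8012686 },
];
const apiStations = [
  { number: 8294, name: 'Pulsnitz Süd', evaNumbers: [], ril100Identifiers: [] },
];

// Index the static records by station number for direct lookups.
const staticByNr = new Map(staticStations.map((s) => [s.nr, s]));

// Attach static identifiers only when the API entry lacks an IBNR.
// `staticId` and `staticDs100` are hypothetical field names.
function mergeStation(apiStation) {
  const fallback = staticByNr.get(apiStation.number);
  const hasIbnr = apiStation.evaNumbers.length > 0;
  if (!fallback || hasIbnr) return apiStation;
  return { ...apiStation, staticId: fallback.id, staticDs100: fallback.ds100 };
}

const merged = apiStations.map(mergeStation);
console.log(merged[0].staticId); // 8012686
```

API entries that already carry an IBNR pass through unchanged, so the API remains the primary source.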

derhuerst commented 7 years ago

I published db-stations@1.0.0 which contains data in the Friendly Public Transport Format, so you will unfortunately have to adapt your consuming scripts/libs. It contains every station from the API, except those without IBNRs.
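For consumers adapting their scripts, the sketch below shows roughly what dropping stations without an IBNR and converting the rest to an FPTF-style object could look like. The output shape follows the station object of later FPTF versions and is an assumption; the exact fields emitted by db-stations@1.0.0 are not shown in this thread. The second input record is made up to demonstrate the filter.

```javascript
// Input in the flat shape shown earlier in this thread.
const records = [
  { id: 8012686, name: 'Pulsnitz Süd', latitude: 51.181002, longitude: 14.007501 },
  { name: 'Example Halt', latitude: 0, longitude: 0 }, // made-up record without an IBNR
];

// Assumed FPTF-style station object; check the FPTF spec for the exact shape.
function toFptfStation(r) {
  return {
    type: 'station',
    id: String(r.id),
    name: r.name,
    location: { type: 'location', latitude: r.latitude, longitude: r.longitude },
  };
}

// Keep only records that have an IBNR (`id`), then convert them.
const fptfStations = records
  .filter((r) => r.id !== undefined && r.id !== null)
  .map(toFptfStation);

console.log(fptfStations.length); // 1
```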