derhuerst / nahsh-gtfs-rt-server

Expose Schleswig-Holstein & Hamburg transit data as a GTFS-RT feed.

stop name normalisation: handle Nah.SH-specific cases #1

Open MTRNord opened 2 years ago

MTRNord commented 2 years ago

Hi :)

I used your https://github.com/derhuerst/nahsh-gtfs-rt-server earlier for some testing, using the data listed at https://gtfs.mfdz.de/

In general it works, but it also generates some tripIds that, to my untrained eyes, seem wrong when I watch the feed using https://public-transport.github.io/gtfs-rt-inspector/?feedSyncStopped=false&feedUrl=https%3A%2F%2Fnahsh.gtfs-rt.nordgedanken.dev%2Ffeed&view=inspector

Is that to be expected or is this a bug?

The feed is public at https://nahsh.gtfs-rt.nordgedanken.dev/feed and overall it seems to work, apart from this one thing.

derhuerst commented 2 years ago

I used your https://github.com/derhuerst/nahsh-gtfs-rt-server earlier for some testing [...]

Please report anything that is broken or that you find unintuitive or under-documented!

In general it works, but it also generates some tripIds that, to my untrained eyes, seem wrong when I watch the feed using https://public-transport.github.io/gtfs-rt-inspector/?feedSyncStopped=false&feedUrl=https%3A%2F%2Fnahsh.gtfs-rt.nordgedanken.dev%2Ffeed&view=inspector

Please provide some specific examples, as the content of the feed changes quickly.

I just had a look, and there are lots of unmatched trips: their trip_ids don't contain GTFS-based values but the original trip IDs from HAFAS. These IDs usually contain several | chars.

derhuerst commented 2 years ago

Feed is public at https://nahsh.gtfs-rt.nordgedanken.dev/feed [...].

Would you like to host this public feed for others, and shall I add it to the list at transport.rest?

MTRNord commented 2 years ago

Feed is public at https://nahsh.gtfs-rt.nordgedanken.dev/feed [...].

Would you like to host this public feed for others, and shall I add it to the list at transport.rest?

* IMO hosting public GTFS-RT feeds could be a small but important puzzle piece in making public transport more modern and reliable, thus contributing a tiny bit to the shift towards sustainable mobility!

* It uses [`hafas-client`'s `nahsh` profile](https://github.com/public-transport/hafas-client/tree/5/p/nahsh) underneath, which uses authentication data obtained by inspecting an app's network traffic. But so far, I haven't had any legal problems whatsoever, using `hafas-client` with many endpoints and on a larger scale.

Sure, that shouldn't be a problem :) I am running it anyway, so there is no harm in adding another way to access it :)

MTRNord commented 2 years ago

I just had a look, and there are lots of unmatched trips: their trip_ids don't contain GTFS-based values but the original trip IDs from HAFAS. These IDs usually contain several | chars.

That's exactly what I was wondering about. I was also trying the https://gtfs.mfdz.de/NAH.SH.with_shapes.gtfs.zip version of the GTFS data. However, that has some weird issues with calendar_dates.txt, making it impossible to import without modification.

As for examples: as you said, it's those that look like 1|34393|0|100|31102021
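
To get a rough count of how many trips in a feed are affected, a check along these lines may help. It is only a heuristic sketch: it assumes that an unmatched trip keeps its raw HAFAS ID (recognisable by the `|` separators), and that the feed has been decoded into plain objects using the snake_case field names from the GTFS-Realtime protobuf definition. The helper names and the sample GTFS-style ID `12345` are made up for illustration.

```javascript
// Heuristic (an assumption, not the server's own logic): a trip_id that
// contains "|" is treated as a raw HAFAS trip ID, i.e. an unmatched trip.
const looksLikeHafasTripId = (tripId) => {
	return typeof tripId === 'string' && tripId.includes('|')
}

// Hypothetical helper: counts the entities whose TripUpdate carries a
// HAFAS-style trip_id. Entities without a trip_update are ignored.
const countUnmatched = (entities) => {
	let unmatched = 0
	for (const entity of entities) {
		const tripId = entity.trip_update?.trip?.trip_id
		if (looksLikeHafasTripId(tripId)) unmatched++
	}
	return {total: entities.length, unmatched}
}

// Made-up sample: one HAFAS-style ID from this thread, one GTFS-style ID.
const sampleEntities = [
	{id: '1', trip_update: {trip: {trip_id: '1|34393|0|100|31102021'}}},
	{id: '2', trip_update: {trip: {trip_id: '12345'}}},
]

console.log(countUnmatched(sampleEntities))
```

Decoders that emit camelCase fields (`tripUpdate`, `tripId`) would need the property names adjusted accordingly.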

MTRNord commented 2 years ago

I am not sure if it is relevant, but for a lot of those IDs the stops.txt of the input GTFS does have a matching name. The trips.txt has some of them as well.

MTRNord commented 2 years ago

Ok I tracked some of it down.

For example, a Hamburg station is written as Hamburg Bf. Altona in HAFAS. In the inspector it shows up as Bf. Altona, and in the GTFS it is Hamburg-Altona. I am guessing this is likely happening for other Hamburg stations as well.

MTRNord commented 2 years ago

Actually, in HAFAS it exists both with and without the Hamburg part.
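
The effect of these three spellings on name-based matching can be illustrated with a deliberately naive normaliser. This is a sketch only: `naiveNormalize` is a stand-in written for this example, not the tokenizer the server actually uses, so the real output may differ in detail.

```javascript
// Stand-in normaliser (an assumption): lower-case the name, split on every
// run of characters that are neither letters nor digits, join with "-".
const naiveNormalize = (name) => {
	return name
		.toLowerCase()
		.split(/[^\p{L}\p{N}]+/u)
		.filter((token) => token.length > 0)
		.join('-')
}

// The three spellings mentioned in this thread.
const spellings = {
	hafasWithCity: 'Hamburg Bf. Altona',
	hafasWithoutCity: 'Bf. Altona',
	gtfs: 'Hamburg-Altona',
}

for (const [source, name] of Object.entries(spellings)) {
	console.log(source, '->', naiveNormalize(name))
}
```

With this stand-in, all three spellings normalise to different strings, so a comparison of normalised names cannot pair the HAFAS stop with the GTFS stop unless the `Bf.` variants are rewritten first.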

MTRNord commented 2 years ago

Also apparently the GTFS is not complete. I am finding more and more completely missing stops.

hbruch commented 2 years ago

Please try with https://gtfs.mfdz.de/NAH.SH.raw.gtfs.zip instead of with_shapes. The latter is a processed version with enhanced shapes, cleaned/minimized with gtfstidy. The raw dataset might not suffer from derhuerst/gtfs-via-postgres#16.

Nevertheless, there might be issues with the raw dataset. In case you find such issues, please report them here, or, even better, via https://github.com/mfdz/GTFS-Issues/issues

MTRNord commented 2 years ago

I will try that 👍

MTRNord commented 2 years ago

Please try with https://gtfs.mfdz.de/NAH.SH.raw.gtfs.zip instead of with_shapes.

After using that on a clean database and restarting, it seems that this didn't fix the issue. It made some stops (like Hamburg-Altona) work, but now others that previously worked (like Hamburg Hbf) are no longer matched correctly.

MTRNord commented 2 years ago

However, it seems that some routes are matched correctly for Hamburg Hbf.

derhuerst commented 2 years ago

After changing the setup, e.g. the GTFS feed, be sure to also flush the cache via redis-cli flushall (or redis-cli -n <nr> flushdb with the appropriate DB nr), because it will contain the failed matching.

derhuerst commented 2 years ago

I'll be on my laptop in the afternoon and check if I can reproduce the cases where the matching doesn't work. I'll also look into https://github.com/derhuerst/gtfs-via-postgres/issues/16.

MTRNord commented 2 years ago

After changing the setup, e.g. the GTFS feed, be sure to also flush the cache via redis-cli flushall (or redis-cli -n <nr> flushdb with the appropriate DB nr), because it will contain the failed matching.

Ah ok yeah that I totally missed. I will do that

derhuerst commented 2 years ago

I've opened https://github.com/derhuerst/nahsh-gtfs-rt-server/issues/2 for all ops- and setup-related issues. Let's keep this Issue about the matching not working.

MTRNord commented 2 years ago

After changing the setup, e.g. the GTFS feed, be sure to also flush the cache via redis-cli flushall (or redis-cli -n <nr> flushdb with the appropriate DB nr), because it will contain the failed matching.

Ah ok yeah that I totally missed. I will do that

Just to follow up on this: I switched my routes to a source address that is not blocked by their firewall (I have no idea how that happened, but my main server address now seems to be blocked; luckily I have a few other IPs) and ran the redis flushall before resetting (earlier today). The same issues still seem to be present. So while that probably helped with other issues, it apparently did not fix the initial issue :)

derhuerst commented 2 years ago

@hbruch You worked on adapting the name normalisation logic to Nah.SH, right? Would you mind pushing your WIP changes as a PR, so that @MTRNord can check if they help?

@MTRNord Currently, the name normalisation logic is just copied from berlin-gtfs-rt-server, so it might do more damage than help with matching: https://github.com/derhuerst/nahsh-gtfs-rt-server/blob/10d971840a973e9733e8c328d847856e7ace35db/lib/normalize.js#L3-L16

MTRNord commented 2 years ago

@hbruch You worked on adapting the name normalisation logic to Nah.SH, right? Would you mind pushing your WIP changes as a PR, so that @MTRNord can check if they help?

@MTRNord Currently, the name normalisation logic is just copied from berlin-gtfs-rt-server, so it might do more damage than help with matching:

https://github.com/derhuerst/nahsh-gtfs-rt-server/blob/10d971840a973e9733e8c328d847856e7ace35db/lib/normalize.js#L3-L16

Well, I was mostly doing some basic replacements, like turning names prefixed with Hamburg Bf. into Hamburg-, but I am not sure whether a) it is in the right place (I did it on name in normalizeStopName) or b) it is actually a good thing to do :) So I didn't do much more than experimenting, with varying results that didn't always seem to help. Probably because I don't fully understand how the matching happens here :)

MTRNord commented 2 years ago

All changes I did so far were along the lines of doing this:

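// tokenize is assumed to be the tokenizer that lib/normalize.js already imports
const normalizeStopName = (name) => {
    // Rewrite the HAFAS spellings towards the GTFS spelling ("Hamburg-Altona")
    // before tokenizing. String patterns only replace the first occurrence.
    const fixedName = name
        .replace("Hamburg Bf.", "Hamburg-")
        .replace("Bf. Altona", "Hamburg-Altona")
    return tokenize(fixedName, { meta: 'remove' }).join('-')
}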

So fairly basic changes.
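
A slightly more general variant of the same idea is sketched below, with the rewrites kept in a small rule table that runs before tokenizing. Everything in it is an assumption made for illustration: `tokenizeStandIn` replaces the real tokenizer so that the snippet runs on its own, and the single rule treats every name starting with `Bf.` as a Hamburg station, which is only plausible if the short spelling occurs for Hamburg stops alone.

```javascript
// Hypothetical rule table: [pattern, replacement] pairs applied in order.
// The one rule maps "Hamburg Bf. X" and "Bf. X" to the GTFS form "Hamburg-X".
const rewrites = [
	[/^(?:Hamburg\s+)?Bf\.\s+/u, 'Hamburg-'],
]

const applyRewrites = (name) => {
	return rewrites.reduce(
		(current, [pattern, replacement]) => current.replace(pattern, replacement),
		name.trim(),
	)
}

// Stand-in for the real tokenizer (an assumption): lower-case, then split
// on every run of characters that are neither letters nor digits.
const tokenizeStandIn = (name) => {
	return name
		.toLowerCase()
		.split(/[^\p{L}\p{N}]+/u)
		.filter((token) => token.length > 0)
}

const normalizeStopName = (name) => {
	return tokenizeStandIn(applyRewrites(name)).join('-')
}

for (const name of ['Hamburg Bf. Altona', 'Bf. Altona', 'Hamburg-Altona', 'Hamburg Hbf']) {
	console.log(name, '->', normalizeStopName(name))
}
```

Under these assumptions the three Altona spellings end up with the same normalised name, while Hamburg Hbf is left alone. Whether that actually improves the matching depends on how the normalised names are compared downstream, and each additional rule would need checking against the stops.txt of the GTFS dataset.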