elad661 / curlbus

Terminal based UI for Israeli public transit data

Several improvements ideas #10

Open chubin opened 6 years ago

chubin commented 6 years ago

Hi Elad! wttr.in creator here.

I was really glad to find your beautiful service.

I tried several queries but many of them seem to be broken:

$ curl https://curlbus.app/citypass
(after several minutes)
^C

Or this:

$ curl https://curlbus.app/galim
(after several minutes)
^C

What I would suggest:

  1. create a help page (/:help)
  2. create a list page for each section (/citypass/:list) that lists all available items
  3. add a search for the nearest bus stops (you can use the "wttr.in" resolving mechanism for that, if you want).

The service seems to be very cool and promising!! I like it a lot. If you need my help, I'll be happy to help

elad661 commented 6 years ago

Hi!

I don't see any brokenness on my end. Both of these queries worked for me from my local machine and from a DigitalOcean instance in Amsterdam, which makes me suspect there's an overzealous firewall somewhere blocking your connection. I don't even see your requests in the logs.

The host I'm using is not ideal: it's located in Israel, which means not-great connectivity to the outside world. But because the IP has to be on a government whitelist to access the API, it's easier to keep using it than to find a better solution...

Regarding the ideas/suggestions: I already planned to do something similar to what you suggested (see also #8 for more ideas, the search bit was mentioned there for example)

Thanks :)

chubin commented 6 years ago

Now it works, though slowly.

May I use your data, by the way, if I create something of the sort? I will not focus on Israel, but it would of course be better if I could deliver info about Israeli routes too (and of course I will reference curlbus as the source of the data).

elad661 commented 6 years ago

You can use curlbus as an API for Israeli public transit data - use the Accept: application/json header to get the data as JSON.
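A minimal sketch of the request described above, using only the Python standard library. The helper names (build_json_request, fetch_json) are made up for illustration, the /citypass path is simply the one queried earlier in this thread, and nothing is assumed about the shape of the JSON that comes back.

```python
import json
import urllib.request

BASE_URL = "https://curlbus.app"


def build_json_request(path: str) -> urllib.request.Request:
    """Build a GET request that asks curlbus for JSON instead of terminal output."""
    url = f"{BASE_URL}/{path.lstrip('/')}"
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def fetch_json(path: str, timeout: float = 30.0):
    """Send the request and decode the body; the structure of the result is not assumed here."""
    with urllib.request.urlopen(build_json_request(path), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Performs a real network call; the path is only an example.
    print(fetch_json("/citypass"))
```

The request is built separately from the network call so the header handling can be checked without touching the server.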

But before you dive into it let me warn you - while buses and trains seem like simple things, there's a whole world of complexity and different features offered by different APIs and different operators around the world (including the exact definition of a bus stop, how the schedule is handled, API limitations, and what constitutes a bus line), and making a service that works for multiple countries means abstracting all those differences - and that's not an easy task.

It's much easier to build something that works for your country/city, where you can understand the local data, local usage patterns and local quirks - and such a service would possibly be more useful than trying to abstract all the different data sources (I started writing such an abstraction library in the past and gave up when I realized the complex abstraction would make it pretty much useless).

Also, the data is big - the schedule file for Israel alone (which is a tiny country with a transit network roughly the same scale as the network in one European city) is 1GB uncompressed.

If you are going to do this, you might be interested in some data/documentation I collected when I tried to write that abstraction layer a few years ago, which I pasted here verbatim (so sorry for the weird phrasing, I wrote this mostly for myself) from that discarded project's documentation:


Data sources

| Name | Area Served | APIs Supported | Documentation | Notes |
|---|---|---|---|---|
| MTA | New York City | SIRI (buses), GTFS-Realtime (subway), OneBusAway | SIRI, GTFS-Realtime | Requires API key for SIRI, and a different API key for GTFS-Realtime |
| MBTA | Massachusetts Bay | GTFS-Realtime, MBTA-realtime | documentation | Proprietary API in addition to GTFS-Realtime |
| Israel's MOT | Israel | SIRI | documentation (Hebrew) | Requires API key for SIRI. Documentation page has detailed SIRI documentation in Hebrew, as well as example requests and responses |
| TFL | London | TFL Unified API | documentation | Proprietary API |
| BART | San Francisco Bay Area | BART API | documentation | Proprietary API |
| (various) | The Netherlands | GTFS-Realtime, KV8Turbo | KV8Turbo, GTFS-Realtime | Proprietary API with documentation in Dutch only, but also GTFS-Realtime feeds converted by a 3rd party service. |

SIRI (Service Interface for Real Time Information)

SIRI is based on SOAP (which means XML), and that's ugly. Various operators seem to have implemented JSON-based variations of SIRI, but it's unclear yet if those are standard, as the official SIRI standard specifies XML and does not specify the JSON structure.

Users

Links for transit-agency specific documentation

Notes

MTA and Israel's MOT implement slightly different versions of the protocol. The Israeli documentation is very unclear, so it's hard to see exactly what they changed. It seems that the MTA implementation implements a slightly newer protocol version, but they also support the older version - and the Israeli API is based on the older version.

Another key difference between the two implementations is that MTA allows the request to be sent using GET parameters, but Israel's MOT wants it to be sent as XML in an HTTP POST.

MTA allows JSON output, but Israel does not (they say they will in the future, but did not indicate when that is going to happen).

For abstracting this, we either want to use two completely different classes (and then we can use JSON for MTA, which is easier) or we could create a class named SIRIResponse and use it for both, while doing the request differently. It's also unclear if we want to implement SIRI support for MTA at all; it depends on which of their APIs is more reliable and easier to implement.
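The request-side difference in these notes (GET parameters for one agency, an XML document sent via POST for the other) could be sketched roughly as below. Everything specific here is an assumption for illustration: the function names, the example.invalid endpoint, the query parameter names, and the XML element layout, which is a simplified stand-in rather than the real SIRI schema of either agency.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def build_get_request(base_url: str, api_key: str, stop_code: str) -> urllib.request.Request:
    """GET style: the query travels in the URL (parameter names are illustrative)."""
    query = urllib.parse.urlencode({"key": api_key, "MonitoringRef": stop_code})
    return urllib.request.Request(f"{base_url}?{query}", method="GET")


def build_post_request(base_url: str, api_key: str, stop_code: str) -> urllib.request.Request:
    """POST style: the query travels as an XML body (simplified, not the real schema)."""
    root = ET.Element("StopMonitoringRequest")
    ET.SubElement(root, "RequestorRef").text = api_key
    ET.SubElement(root, "MonitoringRef").text = stop_code
    body = ET.tostring(root, encoding="utf-8")  # bytes, including an XML declaration
    return urllib.request.Request(
        base_url,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical endpoint; nothing is sent over the network here.
    url = "https://siri.example.invalid/stop-monitoring"
    print(build_get_request(url, "KEY", "12345").full_url)
    print(build_post_request(url, "KEY", "12345").data.decode("utf-8"))
```

With the request building isolated like this, a shared SIRIResponse class would only have to deal with parsing, which is where the JSON-versus-XML question from the notes comes in.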

GTFS-Realtime

Originally developed by Google, closely coupled with static GTFS.

Users

elad661 commented 6 years ago

Note that the documentation I posted here is about real-time data. Usually you'd want to combine the real-time data with the static schedule/map data, and that (usually) is served in GTFS format, so you'd need to read up about that too.

chubin commented 6 years ago

Wow, it is really cool. You did so much work! It is impressive. As far as I understand, GTFS could be a really good starting point, and in many cases it could probably even be enough. Or is it too naive of me to think that way?

elad661 commented 6 years ago

GTFS is okay if you're only interested in static schedules/routes/stops. In this case, it'd be "enough". But in the real world buses are often late, trains get canceled, etc., so the usability of the static schedule data is very limited for this kind of service. With GTFS alone you'd be able to show when the bus is "supposed" to be at the station, but you have no way of showing when it'll actually be there - that's what realtime data is for.

And if it's a route you usually take, you probably already know when your bus leaves (or is supposed to leave).
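To make the gap between "supposed to be there" and "actually there" concrete, here is a small sketch that compares a scheduled time with a realtime estimate. It assumes both values are already expressed as GTFS-style HH:MM:SS strings for the same service day (in GTFS, hours may exceed 24 for trips that run past midnight); real realtime feeds usually deliver timestamps that would need converting first, and the function names are made up for this example.

```python
def gtfs_time_to_seconds(value: str) -> int:
    """Convert a GTFS HH:MM:SS string to seconds; hours may be 24 or more."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def delay_minutes(scheduled: str, expected: str) -> float:
    """Positive result: the vehicle is late. Negative result: it is early."""
    return (gtfs_time_to_seconds(expected) - gtfs_time_to_seconds(scheduled)) / 60


if __name__ == "__main__":
    # A late-night trip scheduled for 01:10 the next day, expected 7.5 minutes late.
    print(delay_minutes("25:10:00", "25:17:30"))
```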

If not, you probably want more complex routing, to see all the routes to your destination + estimated time each route will take, and such - and this doesn't sound like the kind of data that would fit in a terminal UI (plus the routing algorithms are really complicated, because there's a lot of stuff you need to take into account).

If you want to use GTFS (and don't need asyncio support), gtfslib-python is a pretty good library. Google has extensive documentation for the format itself, and you'll be able to find the feeds on https://transitfeeds.com or https://transit.land/

Importing the feeds to the database takes a while when you do it via a Python process, so I suggest importing them directly to the database like I do in curlbus instead of using gtfslib's own import process, which takes a few hours for a feed the size of the Israeli one.

If you are going to build something from all of this, I would suggest starting small - just one or two transit agencies - and seeing if you can grow from there.
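One way to read "importing directly" is bulk-loading the feed's CSV files straight into tables rather than building one object per row through a library. The sketch below shows that idea for stops.txt; it is not curlbus's actual import code, which this thread does not show. It uses sqlite3 only to stay self-contained, the table layout and the import_stops name are assumptions, and on a server database the native bulk loader (for example PostgreSQL's COPY) would be the more likely tool.

```python
import csv
import io
import sqlite3


def import_stops(connection: sqlite3.Connection, stops_file) -> int:
    """Bulk-load the core columns of a GTFS stops.txt into a 'stops' table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS stops ("
        "stop_id TEXT PRIMARY KEY, stop_name TEXT, stop_lat REAL, stop_lon REAL)"
    )
    reader = csv.DictReader(stops_file)
    # Generator: rows are streamed to the database instead of collected in memory.
    rows = (
        (row["stop_id"], row["stop_name"], float(row["stop_lat"]), float(row["stop_lon"]))
        for row in reader
    )
    with connection:  # one transaction for the whole file
        connection.executemany("INSERT INTO stops VALUES (?, ?, ?, ?)", rows)
    return connection.execute("SELECT COUNT(*) FROM stops").fetchone()[0]


if __name__ == "__main__":
    # Tiny made-up sample standing in for a real stops.txt file.
    sample = io.StringIO(
        "stop_id,stop_name,stop_lat,stop_lon\n"
        "1,Central,32.08,34.78\n"
        "2,North,32.10,34.80\n"
    )
    print(import_stops(sqlite3.connect(":memory:"), sample))
```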