Ananya2001-an opened 1 year ago
Thanks, I added some suggestions to your proposal, also inline, marked with [nils].
we might have to add a dependency for the `--predicate` option on the `--bbox` option, since there's no use for it without bbox values given
That's a very good point! Would be good to add a dependency there.
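A minimal sketch of that dependency, assuming the options arrive as plain strings; in the actual typer-based CLI this check would raise `typer.BadParameter` inside the command body. All names here are made up:

```python
# Hypothetical sketch: --predicate is only meaningful together with --bbox,
# so reject it when no bbox was given. Names are placeholders.
from typing import Optional

def validate_options(bbox: Optional[str], predicate: Optional[str]) -> None:
    """Reject --predicate when no --bbox was given."""
    if predicate is not None and bbox is None:
        raise ValueError("--predicate requires --bbox to be set")
```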
- The `check_status` and `extend_effective_dates` utils are good for the downloaded feeds
What does `check_status` do exactly?
I guess the code written already is pretty good for the downloading and stuff, so we will mostly copy-paste
That's totally fine IMO if the code is good.
Thanks for the extra points I will keep that in mind....
What does `check_status` do exactly?
After downloading the feeds into the output dir, we can execute this file to get the status of the feeds; basically if we want to recheck them later...
That's totally fine IMO if the code is good.
yup :) but we definitely do need to add some more things, like multi-threading as you mentioned, and other stuff as well... Great then, I guess I can start working on it now, and I will discuss things here as we move ahead
After downloading the feeds inside the output dir we can execute this file to get the status of the feeds
ok but what is "status"? 😅 That's a pretty generic word. It can't be the download status. Is it just printing the schedule info or so?
Status as in whether the feed is new, valid, and in how many days it will expire; basically the same stuff that we show on the console when the feeds are downloaded with the fetch command for the first time, here: https://github.com/azavea/gtfs-feed-fetcher/blob/6659a57fd02421f99a7fe4e01037257a80f64a4b/fetch_feeds.py#L56
They have basically provided this function to check the info regarding those feeds again if one wants to...
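A hedged sketch of what such a status check could look like: read the service date range from a downloaded feed's `calendar.txt` (the GTFS field names are real; the helper itself is hypothetical, not the azavea code) and report whether the feed is still valid and for how many more days:

```python
# Hypothetical status check for one downloaded feed, based on the
# end_date column of its calendar.txt (GTFS dates are YYYYMMDD strings).
import csv
from datetime import date, datetime

def feed_status(calendar_path: str, today: date) -> dict:
    """Summarise a feed's validity from its calendar.txt end dates."""
    end_dates = []
    with open(calendar_path, newline="") as f:
        for row in csv.DictReader(f):
            end_dates.append(datetime.strptime(row["end_date"], "%Y%m%d").date())
    last = max(end_dates)  # the feed is valid until its latest service end date
    return {"valid": today <= last, "days_left": (last - today).days}
```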
And also, for the fuzzy text matching you mentioned above, do we need something like this:
Maybe typer has something like that...
I know typer has support for argument autocomplete. The kind you're showing would also be nice to have (though it doesn't seem fuzzy to me), but I'm not sure it's easily supported. Maybe something for later?
I meant, for now, rather do something like `--text-search ber`, then enter, and it prints all the datasets which have "ber" in their name.
aah okay, so basically filtering the feed URLs by the search input...
Right, "filter" is a better word!
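A tiny sketch of that filter: a case-insensitive substring test using Python's `in`, no fuzzy matching. The feed names below are just examples:

```python
# Filter feed names by a plain substring match (the "--text-search ber" idea).
from typing import List

def filter_feeds(feeds: List[str], query: str) -> List[str]:
    """Keep only feeds whose name contains the query, ignoring case."""
    q = query.lower()
    return [name for name in feeds if q in name.lower()]
```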
So I am laying out the changes that we would probably want to see next:

- `list-feeds` sub command to have 3 options (`--bbox`, `--predicate` and `--pretty`), plus some kind of text search, e.g. `*berlin*`, for when someone wants to get the exact name of a feed but doesn't know what the feed is called exactly. I think fuzzy matching is over the top; a simple `in` would suffice IMO. It needs to be a command line arg like `--search` or so, or could even support regex, or both regex and simple text search. wdyt @Ananya2001-an?
- `fetch-feeds` sub command to have these options:
  - `--sources`
  - `--output-dir` (like in the azavea repo, where they had "gtfs")
  - `--concurrency` for a pool of threads. Here we can use `threading`, as it's mostly network I/O, which works quite well with Python's threading (actual processing wouldn't, because of the GIL, which kinda makes sure Python can only run one thread at a time).

I guess the code written already is pretty good for the downloading and stuff, so we will mostly copy-paste but might still have to make some modifications as needed… Feel free to add on to the list :)
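The `--concurrency` point could be sketched roughly like this, assuming a list of feed URLs: downloads run in a thread pool, which suits network I/O well because the GIL is released while waiting on sockets. Function names, defaults, and the "gtfs" output dir are placeholders, not the final CLI:

```python
# Rough sketch of concurrent feed downloads with a thread pool.
# All names and defaults here are hypothetical.
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import List
from urllib.request import urlopen

def download_feed(url: str, output_dir: Path) -> Path:
    """Fetch one feed and store it under its URL's file name."""
    target = output_dir / (url.rsplit("/", 1)[-1] or "feed.zip")
    with urlopen(url) as resp, open(target, "wb") as out:
        out.write(resp.read())
    return target

def fetch_feeds(urls: List[str], output_dir: str = "gtfs", concurrency: int = 4) -> List[Path]:
    """Download all feeds using a pool of `concurrency` threads."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(download_feed, u, out) for u in urls]
        # as_completed yields futures as downloads finish, in any order
        return [f.result() for f in as_completed(futures)]
```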