fishcharlie / AirportStatusBot

This is a social media bot that will post delay information for airports in the United States.
MIT License

Consider reply posts/threads when a hub airport is involved #11

Open stucka opened 11 months ago

stucka commented 11 months ago

Wikipedia maintains a list of hub airports in the U.S. here: https://en.wikipedia.org/wiki/List_of_hub_airports#United_States

If it'd help and there's interest I could maybe build a scraper for that and linked subpages to periodically update a list of airline names and ICAO codes, e.g.:

Inbound aircraft to Los Angeles / Tom Bradley International Airport (LAX) are currently being delayed at their origin airport due to wind. Delays are currently averaging 1 hour and 14 minutes and are up to 2 hours and 21 minutes. AirportStatusBot

Perhaps followed by something like:

Delays at LAX may affect hub operations of as many as seven airlines: Allegiant Air (AAY) Alaska Airlines (ASA) American Airlines (AAL) Delta Air Lines (DAL) JetBlue Airways (JBU) Southwest Airlines (SWA) United Airlines (UAL)

Alternately:

LAX delays may affect hub operations for as many as seven airlines: AAY ASA AAL DAL JBU SWA UAL

That could have some value in searches, with someone looking for "lax delay american airlines" or "dal 1234 delay".

A lower lift might be to simply write words like "Large airport delay!" where the CSV's "type" field is "large_airport" ... ?
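
A minimal sketch of that lower-lift option, assuming the airport CSV has a "type" column (mentioned above) plus an "iata_code" column. The sample rows, the "iata_code" column name, and both function names are hypothetical and only illustrate the idea:

```python
import csv
import io

# Hypothetical sample data. Only the "type" field and the "large_airport"
# value come from the discussion; the other columns are assumptions.
SAMPLE_CSV = """ident,type,name,iata_code
KLAX,large_airport,Los Angeles International Airport,LAX
KSBA,medium_airport,Santa Barbara Municipal Airport,SBA
"""


def load_large_airports(csv_text: str) -> set[str]:
    """Return the IATA codes of every row whose type is "large_airport"."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["iata_code"] for row in reader if row["type"] == "large_airport"}


def with_large_airport_prefix(post: str, iata_code: str, large_airports: set[str]) -> str:
    """Prefix the post text when the delayed airport is a large airport."""
    if iata_code in large_airports:
        return f"Large airport delay! {post}"
    return post
```

Loading the set once at startup and checking membership per post keeps the change small, since the FAA-driven post text itself is untouched.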

fishcharlie commented 11 months ago

Very interesting idea. Love the concept. It would help with searching for delays on these social networks.

I have two primary concerns:

  1. Using Wikipedia as a data source isn't the best practice. It's not bad, but it's not the best either. This would be much better suited for Wikidata in my opinion due to its structured data responses.
    • After some quick digging it looks like United Airlines has data for airline hubs. However, Denver International Airport doesn't have any information about what airlines have hub operations there. We need an inverse of the airline data there.
    • I'm pretty sure Wikidata supports having that inverse nature. But it might not be set up for that property. I'm not quite sure how we can adjust that on the Wikidata side to give us the inverse.
    • The other option might be to manually create a list of airlines and their Wikidata identifiers, then query Wikidata to create an inverse list. It would only work for the airlines in that list though, and wouldn't necessarily be a complete list. But I think that should be acceptable for this use case.
  2. I'm cautious about providing positive value here. Airline hub airports don't change very frequently (at least much less frequently than airport delays). So these posts will be inherently static. Which means, for every LAX delay, we will have a bunch of identical messages about hub airport airlines. It's mixing very dynamic data from the FAA with very static data about airline hub operations.
    • This would provide great SEO and search value. But it would also create a lot of identical posts (which could be interpreted by some people as spammy).
    • One alternative would be to actually augment this with flight tracking data. Data like the FlightAware Cancellation Stats and FlightAware MiseryMap provide some interesting data in this area. Using that to determine which airlines & airports are actually most impacted would provide some really tangible data here.
      • The downside to this is that I really don't wanna use proprietary data for this project. And I'm unaware of a free/open data source for flight data (a project I've considered building before, but it just has a lot of expenses & complexity associated with it).
    • Another idea I just thought of while writing this would be to extend this project to have a web element, where the post has a link to a page that has some of this supplemental information.
      • The downside to this of course is we might not get the SEO benefits within the social network itself.
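
The manual-list option under concern 1 amounts to inverting an airline-to-hubs mapping into an airport-to-airlines mapping. A minimal sketch, assuming the per-airline hub lists have already been collected. The `AIRLINE_HUBS` sample and the `invert_hubs` name are hypothetical, and the sample is deliberately incomplete:

```python
from collections import defaultdict

# Hypothetical hand-curated input: airline ICAO code -> hub airports (IATA codes).
# Illustrative and incomplete; real data would come from the per-airline lookups.
AIRLINE_HUBS = {
    "UAL": ["DEN", "LAX"],
    "DAL": ["ATL", "LAX"],
}


def invert_hubs(airline_hubs: dict[str, list[str]]) -> dict[str, list[str]]:
    """Turn airline -> hubs into airport -> airlines (sorted for stable output)."""
    airport_airlines: defaultdict[str, set[str]] = defaultdict(set)
    for airline, hubs in airline_hubs.items():
        for airport in hubs:
            airport_airlines[airport].add(airline)
    return {airport: sorted(airlines) for airport, airlines in airport_airlines.items()}
```

As noted above, the result only covers airlines present in the input list, so an airport missing from the output does not mean it has no hub operations.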

So sorry for the long message. Mainly just thinking out loud here.

In short, I really love this idea. I'm just trying to think through all the options and such to ensure it's the best it can possibly be.

What are your thoughts @stucka?

stucka commented 11 months ago

Thoughts:

That's still 96 characters. Could be even shorter, like 7 hubs may be at risk: AAY ASA AAL DAL JBU SWA UAL.
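
One way to handle the length trade-off is to try the wordings proposed in this thread from longest to shortest and keep the first one that fits. This is only a sketch: `format_hub_reply` is a hypothetical helper, the count is written as a digit rather than a word, and the `limit` default is an assumption that would need to match each network's actual post limit:

```python
def format_hub_reply(airport: str, airlines: dict[str, str], limit: int = 300) -> str:
    """Pick the most descriptive wording that fits within `limit` characters.

    `airlines` maps ICAO airline code -> airline name. The `limit` default
    is an assumption and should be set per social network.
    """
    codes = " ".join(airlines)
    named = " ".join(f"{name} ({code})" for code, name in airlines.items())
    count = len(airlines)
    candidates = [
        f"Delays at {airport} may affect hub operations of as many as {count} airlines: {named}",
        f"{airport} delays may affect hub operations for as many as {count} airlines: {codes}",
        f"{count} hubs may be at risk: {codes}",
    ]
    for text in candidates:
        if len(text) <= limit:
            return text
    # Nothing fits: fall back to the shortest wording and let the caller decide.
    return candidates[-1]
```

Note that `len()` counts Unicode code points, which may differ from how a given network counts characters.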

fishcharlie commented 10 months ago

LAX delays may affect hub operations for as many as seven airlines: AAY ASA AAL DAL JBU SWA UAL

My question here is how many users actually understand those abbreviations. On some level it's not about the number of characters, it's about the value of the message. And I think for some users this would be harder to understand and extract value from.

I don't know what the data quality is on the Wikipedia page. Could certainly scrape it and turn it into something for Wikidata, no worries. I am not finding a more open data source than Wikipedia.

Actually, a lot of Wikidata sources are derived from Wikipedia. It's just easier to work with since it has an API, so you don't have to build a scraper.


One other consideration I've thought about is researching whether Bluesky & Mastodon support creating a reply post that doesn't show up on users' timelines, but does show up on the post detail & search pages.

AFAIK that doesn't exist today. But might be worth looking into a bit more.


I think at the very least, this data would be useful, regardless of how we decide to integrate it into the bot itself.