ECMWFCode4Earth / challenges_2019

Have a look at the challenges proposed for the 2019 edition of ECMWF's Summer of Weather Code.

Challenge #8 - Obtaining online aircraft metadata #8

Closed: jwagemann closed this issue 3 years ago

jwagemann commented 5 years ago

Challenge 8

Obtaining online aircraft metadata

Develop software or an API that collects aircraft data from different sources and merges it with position information from the WMO AMDAR Observing System.

Goal: Development of a software / API / tool that sets up a database with aircraft information from different sources, e.g. flightradar24 or flightaware, merged with position information from reports of WMO's AMDAR Observing System.


Mentors: @BruceIngleby and @MohamedDahoui

Skills required:
- Python programming
- Extracting data using an API
- Data manipulation and merging
- Communication skills

Challenge description

What data/system(s) should be used?

We plan to use data from flightradar24, flightaware or other similar websites (free or with subscription) and merge this with position information from the AMDAR reports (either as ASCII, or we can provide a simple program to extract the data) in order to build up a database containing, for each AMDAR identifier, the aircraft type and airline. Any system should be capable of being run at ECMWF with minimum maintenance after ESoWC, to provide updates for new aircraft etc.
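The merge step could be sketched roughly as follows: for one AMDAR report, look for the closest tracker position that falls within a time tolerance and a distance tolerance. This is only an illustration under stated assumptions. The record fields (`amdar_id`, `time`, `lat`, `lon`, `aircraft_type`, `airline`), the function names and the tolerance values are all hypothetical, and the tracker positions are hand-written records rather than output from any real flightradar24 or flightaware API.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def match_report(amdar, tracks, max_seconds=120, max_km=20.0):
    """Return the closest tracker position within both tolerances, or None.

    The tolerances are placeholders; suitable values would need experimentation.
    """
    best, best_km = None, None
    for track in tracks:
        # Skip tracker positions that are too far away in time.
        if abs(track["time"] - amdar["time"]) > max_seconds:
            continue
        km = haversine_km(amdar["lat"], amdar["lon"], track["lat"], track["lon"])
        if km <= max_km and (best_km is None or km < best_km):
            best, best_km = track, km
    return best


# Hypothetical records: times are seconds since an arbitrary reference.
amdar_report = {"amdar_id": "EU0001", "time": 1000, "lat": 51.50, "lon": -0.50}
tracker_positions = [
    {"time": 1030, "lat": 51.52, "lon": -0.48, "aircraft_type": "A320", "airline": "Example Air"},
    {"time": 1010, "lat": 48.00, "lon": 2.00, "aircraft_type": "B738", "airline": "Sample Jet"},
]
match = match_report(amdar_report, tracker_positions)
```

Here `match` is the first tracker record, since it is the only one close enough in both time and space. A linear scan like this is fine for a sketch; with hundreds of thousands of reports some spatial or temporal indexing would probably be needed.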

What is the current problem/limitation?

Most aircraft data used at ECMWF comes in ‘AMDAR’ format. There is an aircraft specific identifier only used in AMDAR. AMDAR does not give the aircraft type, but we want to know this for monitoring and quality control purposes.

Other output (desirable):

A list of aircraft flight routes with departure and destination airports and the proportion of different aircraft types used on that route.
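As a rough sketch of this output, the proportions can be computed by counting aircraft types per (departure, destination) pair. The field names and the sample flights below are made up for illustration; real input would come from the matched database.

```python
from collections import Counter, defaultdict


def route_type_proportions(flights):
    """Map each (departure, destination) route to {aircraft_type: share of flights}."""
    counts = defaultdict(Counter)
    for flight in flights:
        route = (flight["departure"], flight["destination"])
        counts[route][flight["aircraft_type"]] += 1

    proportions = {}
    for route, type_counts in counts.items():
        total = sum(type_counts.values())
        proportions[route] = {t: n / total for t, n in type_counts.items()}
    return proportions


# Hypothetical flights, one record per matched flight.
flights = [
    {"departure": "EGLL", "destination": "LFPG", "aircraft_type": "A320"},
    {"departure": "EGLL", "destination": "LFPG", "aircraft_type": "A320"},
    {"departure": "EGLL", "destination": "LFPG", "aircraft_type": "A319"},
    {"departure": "EDDF", "destination": "EGLL", "aircraft_type": "B738"},
]
proportions = route_type_proportions(flights)
```

In this example the EGLL to LFPG route comes out as two thirds A320 and one third A319.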

michiboo commented 5 years ago

Hi, I am interested in this challenge. Could you provide a link to the position information from reports of WMO's AMDAR Observing System?

BruceIngleby commented 5 years ago

Hello, thanks for your interest.
I attach a file (AMDW.egs.txt) giving examples of the information available (some aircraft report an encrypted version of their flight number and some report departure and destination airport). For this case there were 128322 reports in a six-hour period (if necessary we could thin them).

michiboo commented 5 years ago

Hi, I would also like to know which database system you currently use or would prefer to use. Knowing this, I can add more detail to my proposal. Thanks!

BruceIngleby commented 5 years ago

Hi, we use MySQL, but something simpler like SQLite would do.
At the time we wrote the proposal we hadn't realised that an encrypted flight number is available for some flights. The mapping between that and the flight number would also be a desirable (not essential) output. Thanks!
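A minimal sketch of what such a database might look like with Python's built-in `sqlite3` module is given below. The table and column names, the `upsert_aircraft` helper and the sample values are assumptions for illustration only, not an agreed schema.

```python
import sqlite3

# In-memory database for the example; a real run would use a file path.
conn = sqlite3.connect(":memory:")

# One row per AMDAR identifier (hypothetical schema).
conn.execute(
    """
    CREATE TABLE aircraft (
        amdar_id      TEXT PRIMARY KEY,
        aircraft_type TEXT,
        airline       TEXT,
        last_updated  TEXT
    )
    """
)

# Optional table for the encrypted flight number mapping (desirable, not essential).
conn.execute(
    """
    CREATE TABLE flight_number_map (
        encrypted_flight TEXT PRIMARY KEY,
        flight_number    TEXT
    )
    """
)


def upsert_aircraft(conn, amdar_id, aircraft_type, airline, updated):
    """Insert a new aircraft row, or replace the existing row for this AMDAR id."""
    conn.execute(
        "INSERT OR REPLACE INTO aircraft "
        "(amdar_id, aircraft_type, airline, last_updated) VALUES (?, ?, ?, ?)",
        (amdar_id, aircraft_type, airline, updated),
    )
    conn.commit()


# Hypothetical values: the second call overwrites the first for the same id.
upsert_aircraft(conn, "EU0001", "A320", "Example Air", "2019-06-01")
upsert_aircraft(conn, "EU0001", "A321", "Example Air", "2019-06-08")
row = conn.execute(
    "SELECT aircraft_type, last_updated FROM aircraft WHERE amdar_id = ?",
    ("EU0001",),
).fetchone()
```

Keying the table on the AMDAR identifier keeps a single current entry per aircraft; keeping a history of past assignments would need a different layout.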

michiboo commented 5 years ago

Hi, when updating the plane type for new AMDAR reports, do you want live updates, or would a weekly update be enough? From what I have found, updating the plane type weekly would be more efficient.

BruceIngleby commented 5 years ago

Hi, I think that updating the AMDAR plane type weekly is sensible. One possibility to be aware of is that one day you/we might be 90% sure that a particular AMDAR is a Boeing and the next day it appears as an A320 (depending partly on the reliability of the online information and partly on tolerances used for matchups) - or this may not happen at all.
We receive some AIREP reports with a flight identifier and for these it might be useful to have live information on which plane type is being used for that flight today - however this comes under possible future extensions.
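One way to handle that uncertainty, sketched here purely as an illustration, is to collect every matched aircraft type for an AMDAR identifier over the week and only treat the most common type as confident if it reaches a chosen share of the matches. The function name, the 0.9 threshold and the sample votes are assumptions.

```python
from collections import Counter


def summarise_type(votes, min_share=0.9):
    """Return (most common type, its share of votes, whether share >= min_share)."""
    counts = Counter(votes)
    if not counts:
        return None, 0.0, False
    best_type, n = counts.most_common(1)[0]
    share = n / sum(counts.values())
    return best_type, share, share >= min_share


# Hypothetical week of matchups for one AMDAR identifier.
week_votes = ["B738"] * 9 + ["A320"]
best_type, share, confident = summarise_type(week_votes)
```

With nine of ten matchups agreeing, the type is accepted at the 0.9 threshold; a split such as two against one would be flagged as not confident and could be left for the following week.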

michiboo commented 5 years ago

Hi, I have completed a draft proposal. Is it possible for you to review it?

BruceIngleby commented 5 years ago

Hi, we have checked with one of the coordinators who replied: "We want to give every applicant the same chances. Thus, we do not recommend to evaluate the entire proposal beforehand. However, we encourage the participant to ask any questions he/she is unsure about via Github and you are able to respond. The applications are still open until 21 April."
Many thanks for your interest, we look forward to reviewing your proposal after 21 April. If you have specific questions before then please ask.

michiboo commented 5 years ago

Thank you for the information!

I have the following questions: How old is the data that needs plane type information, and how much of it is there? Would you like the API hosted on a server, or is it just for internal use? Personally, I think it is better to run it as a script, since live updates are not required.

thanks!

BruceIngleby commented 5 years ago

We agree that a stand-alone script is what we need at the moment and would be easier to test.
Re age of data: I don't envisage going back in time, but starting this summer we could provide AMDAR position data in, say, 6-hour chunks (we will have to check/discuss how close to real-time we can provide the data). After two or three weeks I would hope that we have aircraft_type for almost all AMDARs (but new ones will come along from time to time). The maximum number of AMDAR reports in a six-hour window is about 500K, but we can thin to some extent if necessary. Does this answer the questions? Thanks.

michiboo commented 5 years ago

Thanks for the answer! Is it possible to thin the data on your side so that it only includes the first and last record for each unique AMDAR ID?

jwagemann commented 5 years ago

REMINDER: The deadline to register and submit your proposal is this coming Sunday, 21 April at 23:59 GMT!

Application process is a 2-step process:

Applications without a submitted proposal will not be taken under consideration! We are looking forward to your proposal!

BruceIngleby commented 5 years ago

Sorry for the delay in answering. For each unique AMDAR ID do you mean a) first and last report in a six hour period (seems a bit sparse to me) or b) first and last report of a particular flight? We should be able to do either, or perhaps one report per hour. We could do a bit of experimentation at the start to find a good balance between keeping data volumes down but providing some information on each flight.

michiboo commented 5 years ago

I mean b) the first and last report of a particular flight. Some experimentation would be useful as well. I notice that some flight IDs are missing in the AMDAR sample you provided. Is this common for AMDAR data in general?
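To make option b) concrete, here is a rough sketch of thinning to the first and last report of each flight. Since flight IDs can be missing, it assumes that a new flight starts whenever the gap between consecutive reports from the same AMDAR ID exceeds a threshold. That rule, the one-hour threshold, the field names and the sample reports are all assumptions for illustration.

```python
from collections import defaultdict


def thin_first_last(reports, max_gap_seconds=3600):
    """Keep only the first and last report of each flight per AMDAR id.

    Flights are separated by a time gap larger than max_gap_seconds
    (an assumed heuristic, not part of the AMDAR format).
    """
    by_id = defaultdict(list)
    for report in reports:
        by_id[report["amdar_id"]].append(report)

    kept = []
    for rows in by_id.values():
        rows.sort(key=lambda r: r["time"])
        start = 0  # index of the first report of the current flight
        for i in range(1, len(rows) + 1):
            end_of_flight = (
                i == len(rows)
                or rows[i]["time"] - rows[i - 1]["time"] > max_gap_seconds
            )
            if end_of_flight:
                kept.append(rows[start])
                if i - 1 != start:  # avoid duplicating a single-report flight
                    kept.append(rows[i - 1])
                start = i
    return kept


# Hypothetical reports: times are seconds since an arbitrary reference.
reports = [
    {"amdar_id": "EU0001", "time": 0},
    {"amdar_id": "EU0001", "time": 600},
    {"amdar_id": "EU0001", "time": 1200},
    {"amdar_id": "EU0001", "time": 9000},
    {"amdar_id": "EU0001", "time": 9600},
    {"amdar_id": "EU0002", "time": 300},
]
thinned = thin_first_last(reports)
```

In this example the three-report flight loses its middle report, while the two-report flight and the single report are kept unchanged.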

BruceIngleby commented 5 years ago

Some data providers include a flight identifier and others don't (and the European E-AMDAR system includes a flight identifier that is almost the same as the aircraft identifier EU..., so that doesn't help). I am about to go on holiday for a week. I look forward to seeing your proposal.

michiboo commented 5 years ago

Thank you for being so helpful with my questions!