EricDiao opened this issue 5 years ago
The asynchronous dataset crawler is implemented in `/atc.py` and `/data_sources/flightradar24Crawler.py`.

This module consists of two parts. The first one is called `data_provider`, implemented in `/data_sources/flightradar24Crawler.py` (the function `crawlFR24MultiprocessingWrapper`). It does only one thing: fetch data from FlightRadar24 and put it into a `multiprocessing.Queue` object that is also accessible to `data_consumer`, which is discussed below.
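As a rough sketch of that provider loop (the real fetch logic lives in `crawlFR24`; the `fetch` callable and the `max_snapshots` parameter below are assumptions for illustration only):

```python
import time
from multiprocessing import Queue

def data_provider(data_queue, fetch, interval=10.0, max_snapshots=None):
    """Push {timestamp: flight_data} snapshots onto the shared queue.

    `fetch` stands in for the real FlightRadar24 call (crawlFR24 in
    /data_sources/flightradar24Crawler.py); `max_snapshots` exists only
    so this sketch can terminate -- the real provider loops forever.
    """
    produced = 0
    while max_snapshots is None or produced < max_snapshots:
        flights = fetch()                       # list of flights seen right now
        data_queue.put({time.time(): flights})  # one snapshot per queue item
        produced += 1
        time.sleep(interval)

if __name__ == "__main__":
    q = Queue()
    data_provider(q, fetch=lambda: [], interval=0.0, max_snapshots=2)
    print(q.get(), q.get())  # two {timestamp: []} snapshots
```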
The second part is `data_consumer`. `data_consumer` shall be implemented by the learning-algorithm part. It takes at least one parameter, `data_queue`. An example is implemented as `data_consumer` in `atc.py`.
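A minimal skeleton of that interface could look like the following; the `None` sentinel used to stop the loop is an assumption of this sketch, not part of the actual interface:

```python
from multiprocessing import Queue

def data_consumer(data_queue):
    """Skeleton consumer: drain {timestamp: flight_data} snapshots.

    The real data_consumer is supplied by the learning-algorithm side;
    here we just collect what the provider sends until a None sentinel
    (an assumption for this sketch) arrives.
    """
    seen = []
    while True:
        snapshot = data_queue.get()       # blocks until the provider puts data
        if snapshot is None:              # assumed sentinel: provider is done
            return seen
        (timestamp, flight_data), = snapshot.items()
        seen.append((timestamp, flight_data))  # a real consumer trains here
```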
The main invoker of our program is `atc.py`. It is also where `data_provider` and `data_consumer` are called.
@Hang14 See if this implementation is feasible.
The schema of the data that `data_provider` provides is described below:

```
{
    timestamp: flight_data,
}
```

This is basically a Python `dict` object, where `timestamp` is a UNIX timestamp in float format and `flight_data` is a list of all flights we get at this time point, in the format described in `crawlFR24` in `/data_sources/flightradar24Crawler.py`.
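A concrete snapshot under this schema might look like the following (the per-flight fields are invented for illustration; the real format is whatever `crawlFR24` returns):

```python
# One snapshot as handed over by data_provider: a single-key dict
# mapping a UNIX timestamp (float) to the flights seen at that instant.
# The per-flight fields below are made up for illustration.
snapshot = {
    1587000000.0: [
        {"callsign": "CES5123", "lat": 31.1, "lon": 121.3},
        {"callsign": "CSN301", "lat": 23.4, "lon": 113.3},
    ],
}

# Unpack the single timestamp/flight_data pair.
(timestamp, flight_data), = snapshot.items()
assert isinstance(timestamp, float)
assert isinstance(flight_data, list)
```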
The learning-algorithm part needs a dataset within a fixed time period, so a "cache" is needed. The plan for solving that is:

- Use `timestamp` as the label;
- Use `multiprocessing` to create two processes: one for the actual training, one for fetching data from FR24;
- Use `multiprocessing.Queue` for inter-process data transfer.

Add this issue to #4.