Team-Earth-Quake-Detector / team_1_earth_quake_detector

Hello World! We are three coding-rookies from Germany. We want to create an earthquake detector that visualizes a world map with earthquakes all over the world in real time. Enjoy!

1. Goals

2. Description

In the following, we are going to describe our understanding of the business question or problem, analytical question and/or technical problem associated with the use case.

What are the challenges that need to be mastered?

3. Methodological approach - Team Data Science Process

A methodological approach is chosen to process and implement the use case. For analytical use cases, it is wise to follow a structured approach such as Microsoft's Team Data Science Process (TDSP), which can be seen as a successor to the CRISP-DM methodology. TDSP is an agile, iterative data science methodology that aims to improve collaboration and team learning. It is supported by a lifecycle definition, a standard project structure, artifact templates, and tools for productive data science.

Key Components of the TDSP:

  1. Data Science lifecycle definition

    • Business Understanding
    • Data Acquisition and Understanding (Data Source, Pipeline, Exploration and Cleaning)
    • Modeling (Feature Engineering: Feature selection, Transforming and Binding)
    • Deployment (Performance, Monitoring)
  2. Standardized project structure

    • Template for folder structure
    • This is the general project directory structure for Team Data Science Process developed by Microsoft.
    • It also contains templates for various documents that are recommended as part of executing a data science project when using TDSP.

    img.png

  3. Infrastructure and resources recommended for the project

TDSP provides recommendations for managing shared analytics and storage infrastructure.

The storage infrastructure, where raw and processed datasets are stored, may be in the cloud or on-premises. This infrastructure enables reproducible analysis. It also avoids duplication, which may lead to inconsistencies and unnecessary infrastructure costs. Tools are in place to provide the shared resources, track them, and allow each team member to connect to those resources securely. It is also good practice to have project members create a consistent compute environment, so that different team members can replicate and validate experiments.

  4. Tools and utilities recommended for project execution

Introducing processes is challenging in most organizations. Tools to implement the data science process and lifecycle help lower the barriers to and increase the consistency of their adoption. TDSP provides an initial set of tools and scripts to jump-start adoption of TDSP within a team. It also helps to automate common tasks in the data science lifecycle such as data exploration and baseline modeling. There is a well-defined structure provided for individuals to contribute shared tools and utilities into their team's shared code repository. These resources can then be leveraged by other projects within the team or the organization.


4. Details of the approach

As mentioned before, we followed the Team Data Science Process (TDSP). We therefore created a project management board to ensure a structured approach to processing and implementing our Earthquake-Detector project:

static/images/ProjectBoard_EarthquakeMonitor.png static/images/Legend_ProjectBord.png

To ensure the overall success of our project, we set up a kanban board in our GitHub repository and created a new issue for every task in the working process. This allowed us to keep track of the to-dos, the work in progress, and the tasks we had already completed. Additionally, we created a backlog category called "place for ideas". In our weekly meetings, we discussed the suggestions and brainstorming ideas collected in this category and decided as a team whether each idea should become a to-do. The structure of our kanban board looked as follows:

static/images/kanban_board.png

5. Details of the work

5.1 Processing of real-time data

5.2 Geo data calculations

5.3 Geo data visualization with OpenStreetMaps

5.4 Web frontend development & Web-service backend development

5.5 Searchbar for different user defined configurations

6. Class Definition

6.1 DataCollector

| Function | Input parameters | Description | Return |
| --- | --- | --- | --- |
| init | lat (default = 0), long (default = 0) | Uses the user’s current location for further data preparation if no latitude and longitude values are provided. | - |
| load_data | - | Accesses USGS earthquake data and stores the response in the earthquakes variable. | - |
| prep_data | - | Loads data with the load_data function. Extracts the relevant earthquake features (id, longitude, latitude, place, time, magnitude) into one dictionary per earthquake and appends each dictionary to the earthquake_data list. | earthquake_data |
| filter_radius | location (default = None), user_provided_radius (default = 250) | Prepares data with the prep_data function. Uses the user’s current location if no specific location is provided. Calculates the distance between this location and every earthquake in the earthquake_data list and adds the distance (in km) to the earthquake’s dictionary. If the calculated distance is smaller than the radius provided by the user, the earthquake is added to the earthquake_data_clean list. | earthquake_data_clean |
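
The steps in the table above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual implementation: the functions are written as free functions rather than methods, the URL is the public USGS all-day GeoJSON summary feed (the description only says "USGS earthquake data"), and the haversine formula is assumed for the distance calculation, which the description does not specify.

```python
import json
import math
from urllib.request import urlopen

# Assumption: the public USGS GeoJSON summary feed covering the past day.
USGS_FEED_URL = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson"
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def load_data(url=USGS_FEED_URL):
    """Download the feed and return the parsed GeoJSON document."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def prep_data(geojson):
    """Flatten each GeoJSON feature into one dictionary per earthquake."""
    earthquake_data = []
    for feature in geojson["features"]:
        # GeoJSON stores coordinates as [longitude, latitude, depth].
        longitude, latitude = feature["geometry"]["coordinates"][:2]
        properties = feature["properties"]
        earthquake_data.append({
            "id": feature["id"],
            "longitude": longitude,
            "latitude": latitude,
            "place": properties["place"],
            "time": properties["time"],  # milliseconds since the Unix epoch
            "magnitude": properties["mag"],
        })
    return earthquake_data


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def filter_radius(earthquake_data, lat, lon, radius_km=250):
    """Add a distance to every earthquake and keep those inside the radius."""
    earthquake_data_clean = []
    for quake in earthquake_data:
        quake["distance"] = haversine_km(lat, lon, quake["latitude"], quake["longitude"])
        if quake["distance"] < radius_km:
            earthquake_data_clean.append(quake)
    return earthquake_data_clean
```

With network access, `filter_radius(prep_data(load_data()), 52.52, 13.405)` would return the earthquakes of the last day within 250 km of the given coordinates.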

6.2 Map

| Function | Input parameters | Description | Return |
| --- | --- | --- | --- |
| init | lat (default = 0), long (default = 0) | Uses the user’s current location for further data preparation if no latitude and longitude values are provided. | - |
| set_up_map | location (default = None) | Sets up a folium map that is centered at the user provided location, which is highlighted by a marker. By default, the map is centered at the user’s current location. The zoom factor adjusts dynamically depending on the user provided radius (default = 250 km). OpenStreetMap was chosen as the default layout. | - |
| save_map | file_name | Saves the map with the provided file name. | - |
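
The description does not say how the zoom factor is derived from the radius. One plausible approach, sketched below, uses the standard Web Mercator ground resolution (about 156543.03 m per pixel at zoom 0 on the equator, halving with every zoom level). The function names mirror the table, but `zoom_for_radius`, the assumed map width of 800 px, and the zoom limits are illustrative choices, not taken from the project. The folium import sits inside `set_up_map` so that the zoom helper can be used without folium installed.

```python
import math


def zoom_for_radius(radius_km, latitude=0.0, map_width_px=800, min_zoom=1, max_zoom=18):
    """Largest zoom level at which a circle of radius_km still fits the map width."""
    if radius_km <= 0:
        raise ValueError("radius_km must be positive")
    # Web Mercator is only defined up to roughly +/-85 degrees latitude.
    latitude = max(-85.0, min(85.0, latitude))
    metres_per_px_zoom0 = 156543.03 * math.cos(math.radians(latitude))
    diameter_m = 2 * radius_km * 1000
    zoom = math.floor(math.log2(metres_per_px_zoom0 * map_width_px / diameter_m))
    return max(min_zoom, min(max_zoom, zoom))


def set_up_map(lat, lon, radius_km=250):
    """Build a folium map centered on the location and mark it."""
    import folium  # third-party dependency, imported lazily

    fmap = folium.Map(
        location=[lat, lon],
        zoom_start=zoom_for_radius(radius_km, lat),
        tiles="OpenStreetMap",
    )
    folium.Marker(location=[lat, lon], tooltip="Selected location").add_to(fmap)
    return fmap
```

A map built this way can be written to disk with folium's `save` method, for example `set_up_map(52.52, 13.405).save("map.html")`.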

6.3 Overlay

| Function | Input parameters | Description | Return |
| --- | --- | --- | --- |
| init | lat (default = 0), long (default = 0) | Uses the user’s current location for further data preparation if no latitude and longitude values are provided. | - |

| Function | Input parameters | Description | Return |
| --- | --- | --- | --- |
| init | earthquake_data_clean | Inherits the init configuration from the superclass Overlay and initializes the earthquake_data_clean variable. | - |
| apply_circle_markers | map | Adds a circle marker for every earthquake in earthquake_data_clean to the map. The circle markers are added as a feature group, which allows the user to show or hide them via layer control. By default, the circle markers are shown on the map. Each marker’s size and color are adjusted dynamically according to the earthquake’s magnitude. Hovering over a circle marker shows the earthquake’s time and magnitude. | - |
| apply_magnitude_markers | map | Adds a magnitude marker for every earthquake in earthquake_data_clean to the map. The magnitude markers are added as a feature group, which allows the user to show or hide them via layer control. By default, the magnitude markers are shown on the map. | - |
| apply_connective_lines | map, location (default = None) | Adds a connective line between every earthquake in earthquake_data_clean and the user provided location to the map. The connective lines are added as a feature group, which allows the user to show or hide them via layer control. By default, the connective lines are shown on the map. | - |
| apply_heatmap | map | Adds a heatmap of the earthquake occurrences of the last 24 hours within the user provided radius of the user provided location. The heatmap can be shown or hidden via layer control. By default, the heatmap is not shown on the map. | - |

| Function | Input parameters | Description | Return |
| --- | --- | --- | --- |
| apply_overlay | map | Adds the tectonic plates to the map. The tectonic plates can be shown or hidden via layer control. By default, the tectonic plates are shown on the map. | - |
| add_to_layer_control | map | Adds layer control functionality in the top right corner of the map. | - |
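
As a rough illustration of the circle-marker overlay and layer control described above, the sketch below uses folium's `FeatureGroup`, `CircleMarker`, and `LayerControl`. The size formula, the color thresholds, and the helper names `circle_style` and `format_tooltip` are assumptions made for this example; the description only states that size and color depend on the magnitude. The earthquake dictionaries are assumed to carry `latitude`, `longitude`, `time` (milliseconds since the Unix epoch, as in the USGS feed), and `magnitude`.

```python
from datetime import datetime, timezone


def circle_style(magnitude):
    """Return (radius in pixels, color) for a magnitude; thresholds are illustrative."""
    if magnitude <= 2.5:
        color = "green"
    elif magnitude <= 6.0:
        color = "orange"
    else:
        color = "red"
    radius = 4 + 3 * max(magnitude, 0)
    return radius, color


def format_tooltip(quake):
    """Hover text with the earthquake's time (UTC) and magnitude."""
    when = datetime.fromtimestamp(quake["time"] / 1000, tz=timezone.utc)
    return f"{when:%Y-%m-%d %H:%M} UTC, magnitude {quake['magnitude']}"


def apply_circle_markers(fmap, earthquake_data_clean):
    """Add one circle marker per earthquake as a toggleable feature group."""
    import folium  # third-party dependency, imported lazily

    group = folium.FeatureGroup(name="Circle markers", show=True)
    for quake in earthquake_data_clean:
        if quake["magnitude"] is None:  # some feed entries carry no magnitude
            continue
        radius, color = circle_style(quake["magnitude"])
        folium.CircleMarker(
            location=[quake["latitude"], quake["longitude"]],
            radius=radius,
            color=color,
            fill=True,
            tooltip=format_tooltip(quake),
        ).add_to(group)
    group.add_to(fmap)
    return group


def add_to_layer_control(fmap):
    """Add the layer switcher in the top right corner of the map."""
    import folium

    folium.LayerControl(position="topright").add_to(fmap)
```

The layer control should be added after all feature groups, so that every group appears in the switcher.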

6.4 EarthquakeAnalytics

| Function | Input parameters | Description | Return |
| --- | --- | --- | --- |
| init | earthquake_data, earthquake_data_clean | Initializes the earthquake_data and earthquake_data_clean variables. | - |
| get_total_filtered_earthquakes | location (default = None), radius (default = 250) | Calculates the total number of earthquakes of the last 24 hours within the user provided radius of the user provided location. | total_filtered_earthquakes |
| get_filtered_minor_earthquakes | location (default = None), radius (default = 250) | Calculates the number of minor earthquakes of the last 24 hours within the user provided radius of the user provided location. In our case, a minor earthquake is defined as an earthquake with a magnitude of 2.5 or below. | filtered_minor_earthquakes |
| get_filtered_moderate_earthquakes | location (default = None), radius (default = 250) | Calculates the number of moderate earthquakes of the last 24 hours within the user provided radius of the user provided location. In our case, a moderate earthquake is defined as an earthquake with a magnitude above 2.5 and up to 6.0. | filtered_moderate_earthquakes |
| get_filtered_strong_earthquakes | location (default = None), radius (default = 250) | Calculates the number of strong earthquakes of the last 24 hours within the user provided radius of the user provided location. In our case, a strong earthquake is defined as an earthquake with a magnitude above 6.0. | filtered_strong_earthquakes |
| get_closest_filtered_earthquake | location (default = None) | Calculates the distance (in km) between the user's current or provided location and the closest earthquake of the last 24 hours. | closest_filtered_earthquake |
| get_place_of_closest_filtered_earthquake | location (default = None) | Gets the place of the closest earthquake of the last 24 hours. The place information is included in the USGS API. | place_of_closest_filtered_earthquake |
| get_strongest_filtered_earthquake | location (default = None), radius (default = 250) | Gets the highest magnitude of all earthquakes of the last 24 hours within the user provided radius of the user provided location. | strongest_filtered_earthquake |
| get_total_earthquakes_worldwide | - | Calculates the total number of earthquakes worldwide of the last 24 hours. | total_earthquakes_worldwide |
| get_minor_earthquakes_worldwide | - | Calculates the number of minor earthquakes worldwide of the last 24 hours. In our case, a minor earthquake is defined as an earthquake with a magnitude of 2.5 or below. | minor_earthquakes_worldwide |
| get_moderate_earthquakes_worldwide | - | Calculates the number of moderate earthquakes worldwide of the last 24 hours. In our case, a moderate earthquake is defined as an earthquake with a magnitude above 2.5 and up to 6.0. | moderate_earthquakes_worldwide |
| get_strong_earthquakes_worldwide | - | Calculates the number of strong earthquakes worldwide of the last 24 hours. In our case, a strong earthquake is defined as an earthquake with a magnitude above 6.0. | strong_earthquakes_worldwide |
| get_strongest_earthquake_worldwide | - | Gets the highest magnitude of all earthquakes of the last 24 hours worldwide. | strongest_earthquake_worldwide |
| get_place_of_strongest_earthquake_worldwide | - | Gets the place of the strongest earthquake of the last 24 hours worldwide. The place information is included in the USGS API. | place_of_strongest_earthquake_worldwide |
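
The magnitude classes and KPIs listed above can be computed with a few lines of Python. The sketch below is an assumption about how this might look, not the project's code: `classify`, `summarize`, and `closest_earthquake` are hypothetical helper names, the moderate class is taken to cover magnitudes above 2.5 and up to 6.0, and each earthquake is assumed to be a dictionary with `magnitude`, `place`, and `distance` keys. The same `summarize` call works for the filtered list and for the worldwide list.

```python
from collections import Counter


def classify(magnitude):
    """Assign a magnitude to one of the three classes used in this project."""
    if magnitude <= 2.5:
        return "minor"
    if magnitude <= 6.0:
        return "moderate"
    return "strong"


def summarize(earthquakes):
    """Count earthquakes per class and find the strongest one."""
    # Entries without a magnitude count toward the total but not toward a class.
    rated = [q for q in earthquakes if q["magnitude"] is not None]
    counts = Counter(classify(q["magnitude"]) for q in rated)
    strongest = max(rated, key=lambda q: q["magnitude"], default=None)
    return {
        "total": len(earthquakes),
        "minor": counts["minor"],
        "moderate": counts["moderate"],
        "strong": counts["strong"],
        "strongest_magnitude": strongest["magnitude"] if strongest else None,
        "place_of_strongest": strongest["place"] if strongest else None,
    }


def closest_earthquake(earthquakes):
    """Return the earthquake with the smallest distance, or None if the list is empty."""
    return min(earthquakes, key=lambda q: q["distance"], default=None)
```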

6.5 Location

6.6 LocationResolver

| Function | Input parameters | Description | Return |
| --- | --- | --- | --- |
| init | address (default = empty string) | Takes the user provided address as an argument. If no address is provided, the user’s current location is used instead. | - |
| get_current_location | - | Looks up the user’s current IP address and derives longitude and latitude values from it. These longitude and latitude values are then transformed into address names. | current_location |

6.7 Monitor

| Function | Input parameters | Description | Return |
| --- | --- | --- | --- |
| collect_default_data | - | Sets up a DataCollector object and applies the filter_radius function. | - |
| relocate | location (default = None), coordinates (default = None), radius (default = 250) | Applies the filter_radius function according to a user provided address. If a string with a location is provided, this location is transformed into longitude and latitude values before the filter_radius function is applied. If coordinates are provided, the filter_radius function is applied with these coordinates. | earthquake_data_clean |
| build_map | location (default = None), coordinates (default = None), radius (default = None) | Builds a map to be displayed in the web application. If neither a location nor coordinates are provided (the default when the web server is started), the map is built around the user’s current location. If either a location or coordinates are provided (entered manually by the user in the search field), the map is built around this address. In both cases, a base map is built and features (circle markers, magnitude markers, connective lines, a heatmap, tectonic plates and layer control) are added to the map. | map |
| perform_earthquake_analytics | location (default = None), radius (default = None) | Sets up a DataCollector object and performs all functions of the EarthquakeAnalytics class. | total_filtered, minor_filtered, moderate_filtered, strong_filtered, closest_filtered, place_of_closest_filtered, strongest_filtered, total_worldwide, minor_worldwide, moderate_worldwide, strong_worldwide, strongest_worldwide, place_of_strongest_worldwide |

6.8 App

6.9 Class Architecture

The following graph shows how our classes are connected to one another.

static/images/class_architecture.png


7. Summary

This section summarizes whether the targets have been achieved and, where a target was not achieved, the reason why.

7.1 Achievements

Our primary goal was to build a real-time detection and visualization of earthquake occurrences within the last 24 hours in a predefined region. To this end, we developed an intuitive and user-friendly web tool for the visualization of earthquakes all over the world.

7.1.1 Required achievements

We managed to reach all the required achievements of our project:

7.1.2 Additional achievements

But we also managed additional achievements by implementing further helpful features:

To sum things up, all team members are proud and happy to have finished the project. All our requirements and the project's success criteria are met. Furthermore, we had enough time and sufficient skills to implement additional features beyond the requirements. All in all, the project was a huge success with regard to our development and coding knowledge as well as our experience of group work on IT projects. Through working with GitHub, the Team Data Science Process, and the kanban board, we gained valuable knowledge and skills for our future jobs. There were also some difficulties and challenges during the project, but we always stayed on track and found a solution.


8. Future Development

Although we fully meet the requirements and success criteria of the business case, there is always a way to improve and optimize our tool.

What are next steps that could be done in order to keep progress in the project?

8.1 Forecast

With further data analytics and the help of statistical and mathematical modeling such as regression or classification, we could develop an earthquake forecasting tool. This tool could warn our users whenever an earthquake is likely to occur in the defined area and thereby help to make our world a little safer.

8.2 Create mobile app

Earthquake information to go: it would be convenient to check all recent earthquakes on a smartphone, without needing a laptop or desktop computer. We could therefore develop an earthquake app that shows all earthquakes anywhere at any time.

8.3 Dashboard upgrade

The current dashboard shows six KPIs (see above). To make the dashboard more informative, all KPIs could be calculated both on a global scale (worldwide) and on a local scale (within the user provided radius around the user provided location). The respective functions are already implemented in our EarthquakeAnalytics class. Concerning layout and frontend development, it would be advisable to look for a layout that offers one tab for the global KPIs and one tab for the local KPIs, so that users can choose the view they prefer.