Big Data Analytics - Course Overview and Resources
Welcome to the GitHub repository for the Big Data Analytics course at King Mongkut's University of Technology North Bangkok (KMUTNB). This repository contains materials, code samples, and resources associated with the course, providing a comprehensive overview of Big Data concepts and tools.
Course Structure
Week 1: Introduction to Big Data
- Introduction to Big Data: Understand the key concepts and significance of Big Data in today's technology landscape.
- Big Data Analytics Life Cycle: Learn about the phases of the Big Data analytics process, from data collection to actionable insights.
- Problems Discovery: Identify common challenges and issues faced in Big Data analytics.
- Apache Hadoop: Introduction to Hadoop, its architecture, and its role in Big Data processing.
- MapReduce Implementation (Part 1): Begin learning about the MapReduce programming model and its implementation basics.
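
To make the MapReduce programming model mentioned above concrete, here is a minimal single-machine sketch in plain Python. It is not Hadoop code and not necessarily what the course uses; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions. It only shows the three conceptual stages: map emits key-value pairs, shuffle groups them by key, and reduce combines each group.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a MapReduce framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three stages in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

In a real cluster the map and reduce calls run in parallel on different nodes and the shuffle moves data over the network; this sketch keeps everything in one process so the data flow is easy to follow.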
Week 2: Advanced MapReduce and Apache Spark
- MapReduce Implementation (Part 2): Continue with advanced topics in MapReduce, focusing on optimization and real-world use cases.
- Apache Spark: Introduction to Apache Spark, including its architecture and advantages over Hadoop MapReduce.
- Apache Spark Implementation: Hands-on implementation of Spark, covering its core components and use cases.
Week 3: NoSQL Databases
- NoSQL Big Data Storage: Learn about different types of NoSQL databases and their suitability for various Big Data storage needs.
- NoSQL for Big Data Processing: Explore how NoSQL databases are used for processing large-scale data efficiently.
Week 4: Apache Spark Data Processing
- Batch Processing: Study the concepts and techniques for processing large datasets in batch mode using Apache Spark.
- Low-Level Apache Spark Data Processing: Dive into low-level operations and transformations in Spark, including RDDs and basic transformations.
- High-Level Apache Spark Data Processing: Explore high-level APIs like DataFrames and Datasets for more abstract and efficient data processing.
- High-Level Apache Spark Implementation: Practical implementation of high-level Spark features for complex data processing tasks.
Week 5: Data Aggregations and Joins
- Data Aggregations and Joins for High-Level Apache Spark: Learn techniques for data aggregation and performing joins in Spark using high-level APIs.
- Data Aggregations and Joins for High-Level Apache Spark Implementation: Implement aggregation and join operations with practical examples in Spark.
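
As a rough illustration of what an aggregation and a join compute, the sketch below uses plain Python dictionaries rather than the Spark API. The helper names (`group_sum`, `inner_join`) and the sample column names are assumptions made for this example. Conceptually, a grouped aggregation folds rows that share a key, and an inner join keeps only the row pairs whose keys match.

```python
from collections import defaultdict


def group_sum(rows, key, value):
    """Sum the `value` field of all rows that share the same `key` field."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


def inner_join(left, right, key):
    """Hash join: index the right side by key, then probe it with each left row."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [
        {**left_row, **right_row}
        for left_row in left
        if left_row[key] in index
        for right_row in index[left_row[key]]
    ]


# Hypothetical sample data for illustration only.
orders = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 2, "amount": 5.0},
    {"customer_id": 1, "amount": 2.5},
    {"customer_id": 3, "amount": 7.0},
]
customers = [
    {"customer_id": 1, "name": "Ann"},
    {"customer_id": 2, "name": "Bob"},
]

totals = group_sum(orders, "customer_id", "amount")
joined = inner_join(orders, customers, "customer_id")
```

Here the order for customer 3 is absent from `joined` because no matching customer row exists, which is the defining behavior of an inner join.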
Week 6: Introduction to Streaming
- Introduction to Apache Spark Streaming: Understand the fundamentals of Spark Streaming and its role in real-time data processing.
- Apache Spark Structured Streaming: Explore Structured Streaming, Spark's DataFrame-based streaming API, which treats a data stream as a continuously growing table.
- Apache Spark Structured Streaming Implementation: Implement Structured Streaming solutions for real-time data processing tasks.
Week 7: Streaming Sources and Sinks
- Streaming Source and Sink: Study various sources and sinks used in streaming data pipelines, such as Kafka and file systems.
- Streaming Source and Sink Implementation: Practical implementation of data sources and sinks in Spark Streaming applications.
Week 8: Advanced Streaming Operations
- Streaming Event-Time Window Operation and Watermarking: Learn about event-time window operations and watermarking techniques for managing out-of-order data.
- Streaming Event-Time Window Operation and Watermarking Implementation: Implement event-time windows and watermarking in a streaming context.
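
The idea behind event-time windows and watermarking can be sketched in plain Python as follows. This is a simplified model, not Spark's implementation: it advances the watermark after every event (Spark updates it per micro-batch), and the function and parameter names (`windowed_counts`, `window_size`, `allowed_lateness`) are assumptions for this example. The watermark is the largest event time seen so far minus the allowed lateness; an event is dropped when its window has already ended at or before the watermark.

```python
from collections import defaultdict


def windowed_counts(event_times, window_size, allowed_lateness):
    """Count events per tumbling event-time window, dropping events that are too late.

    event_times: event timestamps (e.g. seconds) listed in ARRIVAL order,
    which may differ from event-time order.
    Returns (counts keyed by window start, list of dropped event times).
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = float("-inf")
    for event_time in event_times:
        # The watermark trails the latest event time seen so far.
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - allowed_lateness
        window_start = (event_time // window_size) * window_size
        window_end = window_start + window_size
        if window_end <= watermark:
            # The window is considered final; this event arrived too late.
            dropped.append(event_time)
            continue
        counts[window_start] += 1
    return dict(counts), dropped


# Events 3 and 8 arrive out of order; only 8 is late enough to be dropped.
counts, dropped = windowed_counts([1, 4, 12, 3, 25, 8], window_size=10, allowed_lateness=5)
```

The event at time 3 arrives after the event at time 12 but is still counted, because the watermark (7) has not yet passed the end of window [0, 10). The event at time 8 arrives after time 25, when the watermark is 20, so it is discarded.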
Week 9: Midterm Examination
- Review and Consolidation: Focus on reviewing and consolidating knowledge from the first half of the course. Prepare for the midterm examination to assess understanding and progress.
Week 10: Visualization and Clustering
- Overview of Big Data Analytics Visualization: Learn about visualization techniques and tools for presenting Big Data insights.
- Big Data Analytics Visualization Design: Explore design principles and best practices for effective data visualization.
- Big Data Clustering: Understand clustering algorithms and techniques used in Big Data to group similar data points.
- Big Data Clustering Implementation and Visualization: Implement clustering algorithms and visualize the results to gain insights from the clustered data.
Week 11: Classification and Regression
- Big Data Classification: Study classification algorithms and their applications in Big Data scenarios.
- Big Data Classification Implementation and Visualization: Implement classification algorithms and visualize the outcomes.
- Big Data Regression: Learn about regression techniques and their use in predicting continuous outcomes from Big Data.
- Big Data Regression Implementation and Visualization: Implement regression models and visualize predictions to analyze their performance.
Week 12: Time Series and Recommendation Systems
- Big Data Time Series: Explore time series analysis techniques for handling and analyzing large-scale temporal data.
- Big Data Time Series Implementation and Visualization: Implement time series models and visualize trends and patterns over time.
- Big Data Recommendation System: Learn about recommendation systems and their application in providing personalized suggestions based on Big Data.
- Big Data Recommendation System Implementation and Visualization: Build and visualize a recommendation system to understand user preferences and behaviors.
Week 13: Association Rules
- Big Data Association Rules: Study association rule mining techniques to discover relationships and patterns within large datasets.
- Big Data Association Rules Implementation and Visualization: Implement association rule mining algorithms and visualize the discovered rules to interpret their significance.
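
Two standard measures used when mining association rules are support and confidence. The sketch below computes both in plain Python over a small, made-up list of transactions; the function names and sample data are assumptions for illustration, and a full mining algorithm such as Apriori or FP-Growth would additionally enumerate candidate itemsets. Support is the fraction of transactions that contain an itemset, and the confidence of a rule A → B is support(A ∪ B) divided by support(A).

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of `consequent` given `antecedent` appears."""
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        return 0.0  # Rule is undefined when the antecedent never occurs.
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / antecedent_support


# Hypothetical market-basket data for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

bread_milk_support = support(baskets, {"bread", "milk"})
bread_to_milk_confidence = confidence(baskets, {"bread"}, {"milk"})
```

With this data, bread and milk appear together in 2 of 4 baskets (support 0.5), and 2 of the 3 baskets containing bread also contain milk (confidence about 0.67).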
Week 14: Text Analysis
- Big Data Text Analysis: Learn techniques for analyzing and processing large volumes of text data to extract meaningful information.
- Big Data Text Analysis Implementation and Visualization: Implement text analysis methods and visualize results to gain insights from textual data.
Week 15: Graph Analysis and Deep Learning
- Big Data Graph Analysis: Explore graph analysis techniques for understanding relationships and connections within large datasets.
- Big Data Graph Analysis Implementation and Visualization: Implement graph algorithms and visualize graph data to analyze network structures.
- Big Data Deep Learning: Study deep learning techniques and their application in analyzing complex Big Data.
- Big Data Deep Learning Implementation and Visualization: Build and visualize deep learning models to solve advanced Big Data problems.
Week 16: Key Concepts Review
- Analyze and Summarize Key Concepts of Big Data Analytics: Review and consolidate key concepts and techniques covered throughout the course to prepare for the final examination.
Week 17: Final Examination
- Preparation and Assessment: Prepare for the final examination by revisiting all major topics and reviewing key concepts to demonstrate your understanding and skills.
Libraries and Tools
The course uses the following libraries and tools:
- Apache Hadoop
- Apache Spark (including Spark Streaming and Structured Streaming)
- NoSQL databases (MongoDB, Cassandra, HBase)
- Matplotlib, Seaborn (for visualization)
- scikit-learn, statsmodels (for machine learning and statistical analysis)
- TensorFlow, PyTorch (for deep learning)
- NLTK, spaCy (for text analysis)
- Apache Spark GraphX (for graph analysis)
- Apache Kafka (for streaming data)
Resources
- Lecture Slides: Provided in individual lecture folders.
- Code Samples: Available in the `code` directory.
- Additional Reading: Refer to the `docs` directory for supplementary materials.
Explore the repository and use the provided materials to deepen your understanding of Big Data Analytics. If you have questions or run into problems, please open an issue in this repository or contact the course administrator.
License
This project is licensed under the MIT License. Copyright is held by Dr. Sirintra Vaiwsri at King Mongkut's University of Technology North Bangkok (KMUTNB). See the LICENSE file for details.