Learn alongside me as I navigate the challenges of applying data science concepts to real-world data. This project highlights the importance of data preparation, modeling strategies, and the impact of data quality on analysis outcomes.
Time Series Analysis with Melbourne Housing Data #7
This pull request summarizes the work completed in the branch dedicated to Time Series Analysis using the Melbourne Housing Data, cleaned by the script 1_clean_melb_data.py. The goal was to understand the fundamentals of time series modelling and analyze market trends and seasonality in the Melbourne housing market.
Strategy and Learning Approach
Data Utilization: Started with the dataset from 1_clean_melb_data.py, focusing on price data in its original scale.
Outlier Management: Initially excluded outliers in land size for simplicity, with plans to reintroduce them later for comparative analysis.
Learning Focus: Emphasized building a strong foundation in time series concepts and methodologies.
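The outlier exclusion described above could be sketched as follows. This is only an illustration: the PR does not say which rule was used, so the 1.5 × IQR rule, the helper name drop_iqr_outliers, and the column name Landsize are all assumptions, and the data is a toy stand-in.

```python
import pandas as pd


def drop_iqr_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose `column` value lies within k * IQR of the quartiles.

    Hypothetical helper; the IQR rule is an assumed choice, not the PR's confirmed method.
    """
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return df[df[column].between(lower, upper)]


# Toy data for illustration only; not taken from the Melbourne dataset.
toy = pd.DataFrame({
    "Landsize": [200, 250, 300, 350, 400, 50000],
    "Price": [1.0, 1.1, 1.2, 1.3, 1.4, 9.9],
})
filtered = drop_iqr_outliers(toy, "Landsize")
```

Keeping the unfiltered frame around makes the later with-and-without-outliers comparison a matter of running the same model on both frames.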
Documentation and Evolution of Analysis
Progress Updates: Posted regular updates on learning progress, challenges, and insights.
Adaptive Strategy: Adapted the strategy based on initial findings, with plans to explore complex aspects like outlier impacts.
Comparative Exploration: Aimed to compare basic time series models with and without outliers in land size.
Initial Progress
Moving Average Implementation: Began with a moving average model, leading to a restructuring of the dataset to focus on 'Date' and 'Price'.
Simplification of Dataset: Created a new DataFrame date_avgprice for clearer analysis.
Strategy Revision: Planned to implement exponential smoothing next, then explore more advanced models such as ARIMA.
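A minimal sketch of the restructuring and moving average described above, assuming the cleaned data has one row per sale with 'Date' and 'Price' columns. The toy values, the per-date mean, and the window of 3 are assumptions for illustration; the PR does not state the window that was actually used.

```python
import pandas as pd

# Toy stand-in for the cleaned data; values are illustrative only.
sales = pd.DataFrame({
    "Date": pd.to_datetime([
        "2016-01-02", "2016-01-02", "2016-01-09",
        "2016-01-16", "2016-01-23", "2016-01-23",
    ]),
    "Price": [900_000, 1_100_000, 950_000, 1_050_000, 1_000_000, 1_200_000],
})

# One row per sale date: the mean price of all sales recorded on that date.
date_avgprice = (
    sales.groupby("Date", as_index=False)["Price"].mean().sort_values("Date")
)

# Trailing moving average over the last 3 sale dates (window size is an assumption).
# The first two rows are NaN because fewer than 3 observations are available.
date_avgprice["MA_3"] = date_avgprice["Price"].rolling(window=3).mean()
```

Note that a window counted in rows treats every sale date as equally spaced, which matters for the date irregularities discussed later in this PR.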
Further Analysis
Moving Average Insights: Conducted a detailed analysis of trends and volumes, correlating sales volume with average prices.
Market Dynamics: Explored legislative impacts and market mechanisms, including the influence of foreign investment and regulatory changes.
Seasonality Speculation: Investigated potential seasonal effects on market dynamics.
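One way the volume-versus-price comparison above might be set up is to count sales per date alongside the average price and compute their correlation. The column names avg_price and volume, the toy data, and the use of Pearson correlation are assumptions, not details taken from the branch.

```python
import pandas as pd

# Toy stand-in for the cleaned data; values are illustrative only.
sales = pd.DataFrame({
    "Date": pd.to_datetime([
        "2016-01-02", "2016-01-02", "2016-01-09",
        "2016-01-16", "2016-01-23", "2016-01-23",
    ]),
    "Price": [900_000, 1_100_000, 950_000, 1_050_000, 1_000_000, 1_200_000],
})

# Per sale date: average price and number of sales (the "volume").
per_date = sales.groupby("Date")["Price"].agg(avg_price="mean", volume="count")

# Pearson correlation between average price and sales volume.
corr = per_date["avg_price"].corr(per_date["volume"])
```

A correlation on its own says nothing about cause, so it is best read next to the market and legislative context discussed above.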
Transition to Exponential Smoothing
Data Frame Refinement: Focused solely on price data, addressing date inconsistencies using Pandas' resample function.
Challenges in Resampling: Overcame hurdles in date frequencies and missing data through bi-weekly resampling and forward filling.
Model Preparation: Emphasized the importance of data order in splitting for training and testing.
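The resampling and ordered split described above could look roughly like this. It assumes "bi-weekly" means one bin every two weeks (pandas frequency "2W") and an 80/20 split; both are assumptions, and the series is toy data.

```python
import pandas as pd

# Irregularly spaced toy series; dates and prices are illustrative only.
prices = pd.Series(
    [1_000_000.0, 950_000.0, 1_050_000.0, 1_100_000.0],
    index=pd.to_datetime(["2016-01-03", "2016-01-10", "2016-02-14", "2016-03-06"]),
    name="Price",
)

# Resample onto a regular two-week grid, averaging sales within each bin,
# then forward-fill bins that had no sales with the last observed value.
biweekly = prices.resample("2W").mean().ffill()

# Chronological split: no shuffling, so the test set is strictly later in time
# than the training set (the 80/20 ratio is an assumption).
split = int(len(biweekly) * 0.8)
train, test = biweekly.iloc[:split], biweekly.iloc[split:]
```

Forward filling invents observations for periods with no sales, so long gaps turn into flat stretches that a smoothing model will treat as real data.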
Optimizing Exponential Smoothing Parameters
Grid Search Implementation: Conducted a grid search to find optimal parameters, using TimeSeriesSplit for cross-validation.
Model Selection Challenges: Encountered issues in model convergence, leading to a consideration of simple exponential smoothing or ARIMA models.
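The grid search above can be illustrated with a deliberately small sketch. To keep it self-contained, a hand-written simple exponential smoothing forecast stands in for whatever library model the branch actually fitted, and only the smoothing level alpha is searched. The helper names, the alpha grid, the RMSE metric, and the toy series are all assumptions; TimeSeriesSplit is the scikit-learn splitter named in this PR.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit


def ses_forecast(train: np.ndarray, alpha: float, horizon: int) -> np.ndarray:
    """Simple exponential smoothing: a flat forecast equal to the final level.

    Hypothetical stand-in for a library model, written out for illustration.
    """
    level = train[0]
    for y in train[1:]:
        level = alpha * y + (1 - alpha) * level
    return np.full(horizon, level)


def grid_search_alpha(series, alphas, n_splits=3):
    """Return (best_alpha, scores), scoring each alpha by mean RMSE across folds."""
    series = np.asarray(series, dtype=float)
    tscv = TimeSeriesSplit(n_splits=n_splits)  # each test fold follows its training data
    scores = {}
    for alpha in alphas:
        fold_rmse = []
        for train_idx, test_idx in tscv.split(series):
            forecast = ses_forecast(series[train_idx], alpha, len(test_idx))
            fold_rmse.append(np.sqrt(np.mean((series[test_idx] - forecast) ** 2)))
        scores[alpha] = float(np.mean(fold_rmse))
    best = min(scores, key=scores.get)
    return best, scores


# Toy upward-trending series, illustrative only.
toy = np.linspace(100.0, 210.0, 12)
best_alpha, scores = grid_search_alpha(toy, alphas=[0.1, 0.5, 0.9])
```

Because every fold trains only on data that precedes its test window, the scores avoid the look-ahead leakage that a shuffled cross-validation would introduce.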
Conclusion of Time Series Analysis
Incompatibility with Traditional Models: Concluded that traditional time series models are not viable for this dataset because of inconsistencies in the sale dates (irregular frequencies and missing periods).
Insights from Analysis: Acknowledged the presence of trends and seasonal patterns from Moving Average analysis but found limitations in Exponential Smoothing.
Future Direction: The immediate focus is to create a concise dashboard summarizing the findings from this project. It will highlight key insights from the time series analysis and the challenges faced with traditional models, emphasizing the need for machine-learning approaches for this dataset.
Reflection and Next Steps
Finalizing Current Phase: The next step is to compile the learnings into a dashboard, marking the conclusion of this project's traditional data analysis phase.