McGill-MMA-EnterpriseAnalytics / hotel-cancellation-analysis

End-to-end project using predictive and causal analysis for hotel booking cancellation insights

Hotel Booking Cancellation Analysis

End-to-end project using Predictive and Causal analysis for Hotel Booking Cancellation

Group Members:

Problem Statement

In the fast-paced hospitality industry, effectively managing hotel cancellations is crucial for maintaining revenue and enhancing guest experiences. Industry leaders emphasize a multifaceted strategy that includes optimizing revenue through strategic overbooking, building customer loyalty with effective communication, and implementing flexible cancellation policies. Integrating these practices with machine learning tools allows hotels to reduce the financial impact of cancellations and gain a competitive edge by meeting customer needs and ensuring high satisfaction levels.

Hypothesis

Our project tackles booking cancellations in the hotel industry with a two-part modeling framework: a classification model to predict the likelihood of cancellation, and a causal inference model to estimate the impact of deposit policies on cancellation behavior. With these, we aim to refine reservation policies and adjust deposit requirements to decrease cancellations and boost revenue.

Overview of Dataset

Our dataset, sourced from Kaggle (https://www.kaggle.com/datasets/arezaei81/hotel-bookingcvs), comprises 36 columns with a mix of 20 numeric and 16 categorical variables. These columns fall into five main categories: customer details, hotel information, amenities offered, booking specifics, and the target variable of cancellation status.

Data Preparation

Data Cleaning:

Feature Engineering:

Outlier Detection:

Feature Selection:

Class Imbalance Solutions:

Modeling Process

We followed a rigorous multi-step modeling process to ensure the robustness of our predictive analysis:

  1. Benchmarking a Variety of Models: A comprehensive suite of models was benchmarked to establish baseline performances. The models tested included:

    • DummyClassifier (as a baseline)
    • LogisticRegression
    • KNeighborsClassifier (KNN)
    • RandomForestClassifier
    • CatBoostClassifier
    • AdaBoostClassifier
    • XGBClassifier (XGBoost)
    • LGBMClassifier (LightGBM)
    • A Stacked Model incorporating the above models

  2. Data Splitting: The dataset was split 70-30 into training and testing sets, ensuring adequate data for learning while also maintaining a substantial evaluation set.

  3. Preprocessing: The data was preprocessed to standardize features and address class imbalances. Techniques like Random Oversampling and SMOTE were employed to achieve a balanced dataset.

  4. Hyperparameter Tuning: Extensive hyperparameter tuning was performed using cross-validation and grid search methods, with a focus on improving the ROC-AUC score.

  5. Model Evaluation: The performance of the models was evaluated using a range of metrics, including accuracy, precision, recall, F1 score, and the ROC-AUC. ROC and Precision-Recall (PR) curves were plotted to visualize model performance.

  6. Model Selection: After evaluating all models, the one with the best performance metrics was selected for further analysis and deployment.
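The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it uses synthetic data from `make_classification` in place of the hotel dataset, benchmarks only three of the listed models, implements random oversampling with `sklearn.utils.resample` rather than a dedicated library, and uses a small illustrative parameter grid. All variable names are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample

# Synthetic stand-in for the booking data (assumption: ~30% cancellations).
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.7, 0.3], random_state=0)

# Step 2: stratified 70-30 train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

# Step 3: random oversampling of the minority class, on the training set
# only, so that no duplicated rows end up in the test set.
minority = y_train == 1
n_majority = int((~minority).sum())
X_min_up, y_min_up = resample(X_train[minority], y_train[minority],
                              replace=True, n_samples=n_majority,
                              random_state=0)
X_bal = np.vstack([X_train[~minority], X_min_up])
y_bal = np.concatenate([y_train[~minority], y_min_up])

# Steps 1 and 5: benchmark a few models on test-set ROC-AUC.
models = {
    "dummy": DummyClassifier(strategy="prior"),
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_bal, y_bal)
    scores[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# Step 4: grid search with cross-validation, optimizing ROC-AUC.
# Caveat: cross-validating on already-oversampled rows lets duplicates
# cross fold boundaries; resampling inside each fold is cleaner in practice.
grid = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    param_grid={"max_depth": [5, None], "min_samples_leaf": [1, 5]},
    scoring="roc_auc", cv=3)
grid.fit(X_bal, y_bal)
tuned_auc = roc_auc_score(y_test, grid.predict_proba(X_test)[:, 1])

# Step 6: compare scores and pick the best candidate.
best_name = max(scores, key=scores.get)
print(scores, grid.best_params_, round(tuned_auc, 3), best_name)
```

The split happens before any resampling so that the test set keeps the original class distribution and the reported metrics reflect performance on unseen, unbalanced data.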

Key Takeaways

Results and Interpretation

Our chosen model, a Random Forest, achieved the highest accuracy (0.880) and a ROC-AUC score of 0.865 after fine-tuning, suggesting strong predictive performance and a good ability to separate cancelled from non-cancelled bookings.
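For readers less familiar with these two metrics, the sketch below computes them by hand on a tiny made-up example (the labels and scores are illustrative, not taken from the project). Accuracy is the share of correct predictions at a 0.5 threshold, while ROC-AUC is the probability that a randomly chosen cancelled booking receives a higher score than a randomly chosen non-cancelled one.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative data: 1 = cancelled, 0 = not cancelled.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8, 0.3, 0.4]
y_pred = [int(s >= 0.5) for s in scores]  # threshold at 0.5

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.3f}, roc_auc={auc:.3f}")
```

Because ROC-AUC depends only on how bookings are ranked, not on the threshold, it is less sensitive to class imbalance than accuracy, which is one reason to report both.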

Random Forest Feature Importance

SHAP Analysis

Both analyses underscore the importance of considering a range of factors when predicting booking cancellations and tailoring hotel policies accordingly.

Causal Inference

Here, we propose a dynamic deposit policy, categorized by customer sensitivity to deposits, aiming to maximize successful reservations while minimizing cancellations.

Causal Inference Modeling

Results of Causal Inference

The causal inference analysis provided actionable insights with actual values indicating how different subgroups are affected by deposit policies:

These results suggest that a dynamic deposit policy could be beneficial, taking into account the different sensitivities of various customer segments to deposit requirements.
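To make the idea of segment-level deposit sensitivity concrete, here is a minimal sketch of a conditional average treatment effect (CATE) estimate. It is not the estimator used in this project: the segments, the cancellation rates, and the randomized deposit assignment are all hypothetical, chosen so that a simple per-segment difference in cancellation rates is a valid estimate. With real observational booking data, confounders would need to be adjusted for.

```python
import random
from collections import defaultdict

random.seed(0)

# Hypothetical ground truth: P(cancel) by (segment, deposit_required).
TRUE_RATES = {
    ("leisure", 0): 0.40, ("leisure", 1): 0.20,
    ("corporate", 0): 0.15, ("corporate", 1): 0.12,
}

# Simulate bookings with the deposit requirement assigned at random.
bookings = []
for _ in range(40000):
    segment = random.choice(["leisure", "corporate"])
    deposit = random.randint(0, 1)
    cancelled = int(random.random() < TRUE_RATES[(segment, deposit)])
    bookings.append((segment, deposit, cancelled))


def estimate_cate(rows):
    """Per-segment difference in cancellation rate: deposit minus no deposit."""
    totals = defaultdict(lambda: [0, 0])  # key -> [cancellations, count]
    for segment, deposit, cancelled in rows:
        totals[(segment, deposit)][0] += cancelled
        totals[(segment, deposit)][1] += 1
    rate = {key: c / n for key, (c, n) in totals.items()}
    segments = sorted({seg for seg, _ in totals})
    return {seg: rate[(seg, 1)] - rate[(seg, 0)] for seg in segments}


cate = estimate_cate(bookings)
for segment, effect in cate.items():
    print(f"{segment}: deposit changes cancellation rate by {effect:+.3f}")
```

In this toy setup the leisure segment reacts much more strongly to a deposit than the corporate segment, which is the kind of difference a dynamic deposit policy would exploit.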

Findings

Financial Implications from Causal Inference

In hospitality management, establishing a robust framework for strategic decision-making is crucial to navigate the complexities of guest booking dynamics effectively. The step-by-step framework, introduced in the Financial Implications report, offers a structured approach to assess the financial impact of deposit requirements on customer behavior.
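As a rough illustration of how an estimated effect on cancellations might translate into revenue, consider the sketch below. It is not the framework from the Financial Implications report: the function, the booking volume, the average booking value, the cancellation rates, and the retained deposit amount are all hypothetical, and the calculation ignores the possibility that a deposit requirement deters some customers from booking at all.

```python
def expected_revenue(n_bookings, avg_value, cancel_rate, deposit_retained=0.0):
    """Expected revenue from completed stays plus deposits kept on cancellations."""
    completed = n_bookings * (1 - cancel_rate) * avg_value
    kept_deposits = n_bookings * cancel_rate * deposit_retained
    return completed + kept_deposits


# Hypothetical segment: 1000 bookings worth 200 each, 35% cancel without a deposit.
baseline = expected_revenue(1000, 200.0, cancel_rate=0.35)

# Assume the deposit lowers the cancellation rate by 15 points (a CATE of -0.15)
# and that 50 of each cancelled booking's deposit is retained.
with_deposit = expected_revenue(1000, 200.0, cancel_rate=0.35 - 0.15,
                                deposit_retained=50.0)

uplift = with_deposit - baseline
print(f"baseline={baseline:.0f}, with_deposit={with_deposit:.0f}, uplift={uplift:.0f}")
```

Repeating this calculation per customer segment, each with its own estimated effect, gives a simple way to compare deposit policies on a common financial scale.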

Results

Conclusion and Next Steps

In conclusion, combining predictive modeling with causal inference, in particular conditional average treatment effect (CATE) estimation, gives decision-makers in the hospitality industry a practical framework for setting deposit policies. Our analysis illustrates how such data-driven strategies can improve profitability and operational efficiency. Future work could incorporate more granular data to refine and personalize deposit strategies, further strengthening hotels' financial resilience against booking cancellations.

Next Steps


How to Run this Project

Contributing

We encourage contributions to this project. If you have suggestions for improving the models or the analysis, please fork this repository, make your changes, and submit a pull request. For significant changes, please open an issue first to discuss what you would like to change.