DSCI-310-2024 / data-analysis-review-2024


Submission: Group 8: Predict and Classify Online Shopper Intention #8

Open ttimbers opened 6 months ago

ttimbers commented 6 months ago

Submitting authors: Calvin Choi, Nour Abdelfattah, Sai Pusuluri, Sana Shams

Repository: https://github.com/DSCI-310-2024/DSCI-310_Group-Project_Group8_Purchasing-Intent-Analysis/releases/tag/v2.0.0

Abstract/executive summary:

Given the surge of online shopping, online retailers may get a lot of site traffic, but what ultimately matters is whether or not users finalize their purchases. Marketing and User Experience teams are tasked with optimizing a site's interface and content in order to improve customer retention and the site's revenue. As such, understanding customer browsing behaviour and web page features is crucial not only for improving the user's experience, but also for maximizing the retailer's revenue.

This project aims to analyze various features of online shoppers' sessions on a site to predict whether the customer makes a purchase. We will use the Online Shoppers Purchasing Intention dataset from the UCI Machine Learning Repository.

Editor: @ttimbers

Reviewers: Yunxuan Zhang, Dia Zavery, Hanlin Zhao, Olivia Lam

diaazavery commented 5 months ago

Data analysis review checklist

Reviewer:

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 2

Review Comments:

Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.

You've put together an impressive analysis, with clear explanations of the models under consideration and a thorough discussion of the trade-offs involved. Your attention to detail in these areas is commendable.

However, I noticed that the PDF contains placeholders for table and figure references, such as "Table ??" and "Figure ??". To maintain the quality and clarity of your report, it would be beneficial to make sure these references are correctly linked to the actual tables and figures they correspond to.

The data section of your project includes a lot of datasets, which is really thorough. It might help to sort them into 'raw' and 'processed' folders for easier navigation. Also, it's not immediately clear why the raw data is split into features and targets; adding a quick explanation for this could clear up any confusion.

In your README, there's a small discrepancy in the instructions for setting the kernel environment: it mentions ProjectMilestone1_env, but the correct environment name is project_env. Updating this detail will help avoid confusion for anyone replicating the analysis.

Overall, I enjoyed reading through the analysis; it made everything super clear. You guys did a great job on the project!

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

x99i9 commented 5 months ago

Data analysis review checklist

Reviewer:

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5

Review Comments:

Guys from Group 8, I think you've done a great job. Here are just some of my tips and comments.

  1. Addressing class imbalance: I noticed in your analysis that the class imbalance in the 'Revenue' feature was acknowledged, which is great! It's really important to understand how such imbalances can skew a model's performance, especially towards the majority class. I'm curious whether you considered implementing any specific strategies to tackle this imbalance, such as SMOTE (Synthetic Minority Over-sampling Technique) or undersampling? It might be beneficial to explore these options to enhance the model's ability to predict minority-class instances more accurately (see the first sketch after this list).

  2. Expanding model evaluation metrics: Your choice to use Precision as an evaluation metric is insightful, particularly for focusing on the model's performance in predicting positive instances. Given the class imbalance, incorporating additional metrics like Recall, F1 Score, and the AUC-ROC curve could provide a more rounded view of the model's effectiveness across different scenarios. These metrics can help highlight whether the model disproportionately favors one class over another, and could be a great addition to your analysis (see the second sketch below).

  3. Detailed preprocessing and feature engineering steps: I found the sections on data preprocessing and feature engineering really interesting! It's clear that a lot of thought went into preparing the data for modeling. However, it would be even more helpful if you could dive deeper into some of the choices made during this process. For instance, explaining why certain scaling methods were chosen, or detailing the rationale behind dropping or transforming specific features, could give readers a clearer understanding of your methodology. It could also offer insights into how different preprocessing steps impact overall model performance (the third sketch below shows one lightweight way to record that rationale).

  4. Test files: One small thing I noticed is that the tests seem to be missing comments or docstrings explaining what each one does. I totally get how tests might seem self-explanatory, especially when you're deep into writing them. But adding a short comment or docstring can be super helpful, not just for others trying to understand what's being tested (like me 😅), but also for our future selves when we come back to the code after a while. It's sort of like leaving breadcrumbs for anyone who follows, making it easier to grasp the purpose and logic behind each test (see the last sketch below). Just a thought! Keep up the awesome work!
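In case it's useful, here's a minimal sketch of what I mean in point 1, assuming a Python/scikit-learn workflow with the imbalanced-learn package (the data below is synthetic, just to illustrate the idea — swap in your real train split):

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Synthetic stand-in for the imbalanced Revenue target:
# roughly 85% "no purchase", 15% "purchase".
X, y = make_classification(n_samples=1000, weights=[0.85, 0.15], random_state=123)
print("before:", Counter(y))

# Oversample the minority class on the training data only,
# never on the test set, to avoid leakage.
X_res, y_res = SMOTE(random_state=123).fit_resample(X, y)
print("after:", Counter(y_res))
```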
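For point 2, the extra metrics are all one import away in scikit-learn; y_test, y_pred, and y_prob below are tiny placeholders for your real test labels, hard predictions, and predicted probabilities:

```python
from sklearn.metrics import classification_report, roc_auc_score

y_test = [0, 0, 0, 0, 1, 1, 0, 1]
y_pred = [0, 0, 1, 0, 1, 0, 0, 1]
y_prob = [0.1, 0.2, 0.6, 0.3, 0.9, 0.4, 0.2, 0.8]  # predicted P(purchase)

# Precision, recall, and F1 for each class in one call.
print(classification_report(y_test, y_pred))

# AUC-ROC needs probabilities (or scores), not hard labels.
print("AUC-ROC:", roc_auc_score(y_test, y_prob))
```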
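For point 3, one lightweight way to record that rationale is to keep it as comments right beside the preprocessing code. This is only a sketch; the column choices are my guesses at the UCI dataset's features, not what your pipeline actually does:

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

preprocessor = ColumnTransformer(
    transformers=[
        # StandardScaler because durations and counts sit on very
        # different scales, and distance-based models care about that.
        ("numeric", StandardScaler(), ["ProductRelated_Duration", "PageValues"]),
        # One-hot because Month and VisitorType are nominal categories.
        ("categorical", OneHotEncoder(handle_unknown="ignore"), ["Month", "VisitorType"]),
    ],
    remainder="drop",  # say explicitly which columns are excluded, and why
)
```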
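And for point 4, here's the kind of docstring I mean; the helper being tested is a made-up stand-in so the snippet runs on its own under pytest:

```python
import pandas as pd


def drop_duplicates_keep_first(df):
    """Made-up stand-in for a project cleaning helper."""
    return df.drop_duplicates(keep="first")


def test_drop_duplicates_keeps_first_occurrence():
    """Duplicated rows should be removed, keeping the first occurrence,
    so downstream class counts are not inflated."""
    df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 3, 4]})
    result = drop_duplicates_keep_first(df)
    assert len(result) == 2
    assert result.iloc[0].tolist() == [1, 3]
```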

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

Choudoufuhezi commented 5 months ago

Data analysis review checklist

Reviewer:

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing:

Review Comments:

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

lam-oli commented 5 months ago

Data analysis review checklist

Reviewer: ole-lam

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1 hour 15 mins

Review Comments:

  1. This was a really great analysis with a really applicable problem to explore! You guys were very thorough in explaining each step of your analysis. I thought it was really smart to use a confusion matrix to support your conclusion, since you started your analysis knowing that your data had a class imbalance.

  2. I noticed a few formatting errors in your PDF, but they're very easy to fix! Your in-text citations didn't have the right syntax, your figures were referenced as ?? or [INSERT LATER], and your pie charts could have been made more readable by using an alternative chart type like a stacked bar chart or waffle chart (see the first sketch after this list).

  3. I also noticed that in your preprocessing you immediately remove all NA values and duplicate rows. Although I agree both of these could contribute to more class imbalance, especially the duplicates, I think it is important to explore those duplicate and NA rows before dropping them, in case you lose a substantial amount of data (the second sketch below shows a quick way to size this up).

  4. I also noticed that your PDF left out a lot of information from your initial analysis, like recall and F1 score as well as feature engineering. Perhaps it was a time-constraint thing (I totally understand), but these aspects of your original analysis could have been elaborated on further to enhance your final submission, especially an explanation of why you thought page_values had such an impact. As written, it felt like feature engineering was added just to do it, rather than to make an argument for why it was necessary (the last sketch below is one way to make that case).
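On point 2, if it helps, here's a rough sketch of the stacked-bar alternative I had in mind (assuming pandas and matplotlib; the proportions are made up, just to show the shape of the plot):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up proportions for illustration only.
counts = pd.DataFrame(
    {"No purchase": [0.85], "Purchase": [0.15]},
    index=["All sessions"],
)
counts.plot(kind="barh", stacked=True, figsize=(6, 1.5))
plt.xlabel("Proportion of sessions")
plt.tight_layout()
plt.show()
```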
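On point 3, a quick pre-drop audit could be as simple as this sketch (the file path is hypothetical; Revenue is the target column from your analysis):

```python
import pandas as pd

# Hypothetical path -- point this at your actual raw data file.
df = pd.read_csv("data/raw/online_shoppers_intention.csv")

print("rows:", len(df))
print("rows with any NA:", df.isna().any(axis=1).sum())
print("duplicated rows:", df.duplicated().sum())

# If the losses look material, check whether they cluster in one class
# before deciding to drop them.
print(df.loc[df.duplicated(keep=False), "Revenue"].value_counts())
```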
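And on point 4, one way to actually make the argument for page_values would be something like permutation importance. This is just a sketch with synthetic data standing in for your fitted model and test split, not what your pipeline does:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the project's data, model, and test split.
X, y = make_classification(n_samples=500, n_features=5, random_state=123)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=123)
model = RandomForestClassifier(random_state=123).fit(X_train, y_train)

# How much does held-out accuracy drop when each feature is shuffled?
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=123)
for i, imp in enumerate(result.importances_mean):
    print(f"feature {i}: {imp:.3f}")
```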

Overall such a cool project, I was really looking for little things to improve!

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.