Closed karan-khubdikar closed 4 months ago
This was derived from the JOSE review checklist and the ROpenSci review checklist.
I could not get Docker Compose to work, even after waiting ~40 minutes. I am not sure whether this issue is specific to my machine; if it is not, a note about the estimated build time would be nice. I ended up creating an environment on my machine from the environment.yaml file and ran your notebook (src/analysis_titanic_survival.ipynb) in JupyterLab. The notebook ran beautifully! Additional testing of running the analysis with Docker might be needed. Machine details: macOS, Processor: 2.6 GHz 6-Core Intel Core i7.
More details about the project should be included in the About/Summary sections. For example, the About section in Tiff's repo (https://github.com/ttimbers/breast_cancer_predictor_py) briefly mentions the research objective, conclusions, and methods, which gives me as a reader a quick overview of the project: what it is about, what was found, and how it was found. Including these points in your own section would be very helpful.
Overall, you all did a great project. Excellent work! Reviewing your project was a great learning experience for me and will benefit our group very much!
You guys did a really good job! The topic you have chosen is not only fascinating but also has important implications for preventing similar disasters in the real world. Overall, I am highly impressed with your work.
The report effectively explains the use of Logistic Regression, but it would be beneficial to include a more detailed rationale for choosing this specific method over others. For example, why was Logistic Regression chosen instead of other models like SVC or Random Forest? In my opinion, for this specific task, Random Forest may perform better but be less interpretable.
The report mentions that the model was never evaluated on any test data, which prevents us from assessing its performance. For example, if the accuracy of this model were only 50%, interpreting the fitted coefficients would not make much sense. I think it would be better to choose a suitable metric (e.g., accuracy, F1 score) and report the model's performance on a reserved test set. The website you obtained your data from provides a separate test set that you have not used in your analysis, so testing your model on it won't violate the ML golden rule.
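As a rough illustration of the metrics suggested above, here is a minimal, dependency-free sketch of accuracy and F1 computed on held-out labels. The label vectors are made-up placeholders, not results from your model, and the function names are hypothetical; in practice scikit-learn's `accuracy_score` and `f1_score` would do the same job.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of held-out passengers whose outcome was predicted correctly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 score for the positive class (here: 1 = survived)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denominator = 2 * tp + fp + fn
    # Convention: return 0.0 when there are no positives at all.
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical held-out labels and predictions (placeholders, not real results).
y_test = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_hat = [1, 0, 0, 0, 0, 0, 1, 1, 0, 0]

test_accuracy = accuracy(y_test, y_hat)
test_f1 = f1(y_test, y_hat)
print(f"accuracy={test_accuracy:.3f}, f1={test_f1:.3f}")
```

Reporting both numbers is useful here because survival is an imbalanced outcome, so accuracy alone can look good even when few survivors are identified.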
During data pre-processing, you guys chose to drop the variable Ticket ID. But I took a closer look at the raw data and found that the ticket column is not just a random ticket number. There are some special patterns, such as S.C./A.4. 23567. I'm not sure what the alphanumeric combination at the beginning means; maybe it carries some hidden information. It would be better to explore this column in more depth rather than simply ignore it.
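One possible way to start exploring the ticket column is to split each value into a normalized prefix and a trailing number, then count how often each prefix occurs. This is only a sketch: `split_ticket` is a hypothetical helper, and apart from S.C./A.4. 23567 the ticket strings below are illustrative values rather than rows taken from your data.

```python
import re
from collections import Counter
from typing import Optional, Tuple


def split_ticket(ticket: str) -> Tuple[str, Optional[int]]:
    """Split a raw ticket string into (normalized prefix, ticket number).

    The prefix is upper-cased with punctuation and spaces removed, so that
    variants such as "S.C./A.4." and "SC/A4" collapse to the same key.
    Tickets without a prefix get the placeholder "NONE".
    """
    parts = ticket.strip().split()
    number = None
    # Treat a purely numeric final token as the ticket number.
    if parts and parts[-1].isdigit():
        number = int(parts.pop())
    prefix = re.sub(r"[^A-Za-z0-9]", "", " ".join(parts)).upper()
    return (prefix or "NONE", number)


# Illustrative ticket strings (assumed examples, not read from the data set).
tickets = ["S.C./A.4. 23567", "113803", "PC 17599", "PC 17601", "LINE"]

prefix_counts = Counter(split_ticket(t)[0] for t in tickets)
print(prefix_counts.most_common())
```

If some prefixes turn out to be common, the prefix could be tried as a categorical feature to see whether it is associated with survival.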
In the Results & Discussion of Logistic Regressio section of your final report, the table number for the logistic regression results table is missing. Maybe it can be added in the next milestone.
Submitting authors: @karan-khubdikar @Sampsonyu @alanpow @fohy24
Repository: https://github.com/UBC-MDS/What-Effects-One-Chance-of-Survival-on-the-Titanic-A-Logistic-Regression-Analysis
Report link: https://ubc-mds.github.io/What-Effects-One-Chance-of-Survival-on-the-Titanic-A-Logistic-Regression-Analysis/analysis_titanic_survival.html
Abstract/executive summary: This project analyzes the Titanic passenger data. We delve into the factors that influenced passenger survival on this historic voyage. Leveraging advanced data analytics, we explore various elements such as passenger class, age, gender, and embarkation point to unravel patterns and insights that shaped the likelihood of survival.
The analysis leverages the Titanic Passenger Survival Data Set, which is a compilation of passenger data from RMS Titanic. The analysis will be conducted using R and Python.
Editor: @ttimbers
Reviewer: Prabhjit Thind (@Prabh95), Yan zeng (@Owl64901), Hina Bandukwala (@hbandukw), Wenyu Nie (@wenyunie)