gloriadazevedo / ORIE4741_Project


Peer Review from Adam Wang (yw287) #5

Open adamwhat opened 8 years ago

adamwhat commented 8 years ago

I found the speed dating problem really intriguing. After examining the data, I am convinced that it is sufficiently messy (different scales, mixed data types, and missing values).

3 Things I like:

  1. You hypothesized that certain factors automatically influence compatibility. It would be interesting to see which factors truly matter in a speed date.
  2. You proposed ways to pre-process the data, such as rounding to the nearest 5. Pre-processing is important for normalizing results and minimizing noise.
  3. You pointed out that the data might not be randomly selected. I think that's a valid point: this data was not collected from a fully representative sample.

3 Things I noticed:

  1. I was wondering whether the features in the data are comprehensive enough to support a valid prediction. There are objective factors, but people can also behave irrationally in relationships.
  2. How would you pre-process and normalize the categorical data? Some values are not very consistent (e.g. NYC vs New York).
  3. How will you evaluate your pre-processing design choices (e.g. is rounding to the nearest 5 better than rounding to the nearest 10)?
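To make points 2 and 3 concrete, here is a minimal sketch in Python. The alias table `CITY_ALIASES` and the helper names are hypothetical, not from the project. It maps inconsistent spellings to one canonical label and rounds ratings to a chosen step. The mean rounding error is only a crude proxy for information lost; a more convincing comparison of 5 vs 10 would check downstream model accuracy on held-out data.

```python
import math

# Hypothetical alias table (an assumption): raw lowercase spellings -> canonical label.
CITY_ALIASES = {
    "nyc": "New York",
    "new york": "New York",
    "new york city": "New York",
}

def normalize_city(raw: str) -> str:
    """Return the canonical city name, or the stripped input if no alias is known."""
    key = raw.strip().lower()
    return CITY_ALIASES.get(key, raw.strip())

def round_to_nearest(value: float, step: int) -> int:
    """Round to the nearest multiple of `step`, with ties rounded up (half-up)."""
    return int(math.floor(value / step + 0.5)) * step

def mean_rounding_error(values, step):
    """Average absolute distortion introduced by rounding (a crude proxy only)."""
    return sum(abs(v - round_to_nearest(v, step)) for v in values) / len(values)

ratings = [12, 17, 23, 38, 50]  # made-up example values, not project data
print(mean_rounding_error(ratings, 5))   # 1.6
print(mean_rounding_error(ratings, 10))  # 2.0
```

Coarser rounding always distorts individual values at least as much in the worst case, so the real question is whether the extra noise reduction improves predictions enough to justify it.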

Overall, I really liked this problem. I'm very excited to see the results!

gloriadazevedo commented 8 years ago

Thank you for your feedback! We are definitely facing some data consistency issues. We're working to correct them on a new copy of the data, and we're documenting our changes and the reasoning behind them.