Peer Review from Adam Wang (yw287)

I found the speed dating problem really intriguing. After examining the data, I am convinced that it is sufficiently messy (different scales, mixed types, and missing values).

3 Things I like:

You made the hypothesis that certain factors will automatically influence the compatibility. It would be interesting to see which factors are truly important for a speed date.
You proposed how to pre-process the data such as rounding to the nearest 5. It's important to pre-process the data to normalize results and minimize noise.
You brought up the fact that the data might not be randomly selected. I think it's a valid point that this data was not collected from a fully representative sample.

3 Things I noticed:

I was wondering whether the metrics in the data are comprehensive enough to support a valid prediction. There are objective factors, but people can also be irrational about relationships.
How would you pre-process and normalize the categorical data? Some of it does not seem very consistent (e.g., NYC vs. New York).
How would you evaluate the design choices for pre-processing (e.g., would rounding to the nearest 5 be better than rounding to the nearest 10)?
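
To make the last two questions concrete, here is a minimal sketch of what I had in mind, not a claim about how your pipeline works. The alias table, the function names, and the sample values are all hypothetical; the real spellings would have to come from inspecting the dataset.

```python
import math

# Hypothetical alias table: maps variant spellings to one canonical label.
# The real entries would need to be collected from the dataset itself.
CITY_ALIASES = {
    "nyc": "new york",
    "new york city": "new york",
    "ny": "new york",
}


def normalize_label(raw, aliases=CITY_ALIASES):
    """Lowercase, trim, and collapse whitespace, then apply the alias table."""
    cleaned = " ".join(raw.strip().lower().split())
    return aliases.get(cleaned, cleaned)


def round_to_nearest(value, base):
    """Round value to the nearest multiple of base, with ties rounding up."""
    # math.floor(x + 0.5) avoids the round-half-to-even behaviour of round().
    return math.floor(value / base + 0.5) * base


if __name__ == "__main__":
    print(normalize_label("  NYC "))    # same canonical label as "New York"
    print(round_to_nearest(13, 5))      # bucket width 5
    print(round_to_nearest(13, 10))     # bucket width 10
```

With the bucket width exposed as a parameter, one way to compare 5 against 10 would be to fit the same model on each version of the data and compare held-out error, although that is only one possible evaluation.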

Overall, I really liked this problem. I'm very excited to see the results!

