Positive notes:
The project seems to be making good progress. Although the model is far from perfect, you have already started considering how to iterate on the base model. It is good that you realized linear regression would not be the best tool for the job, since the model output is an ordinal rating from 1 to 5. Hopefully this progression results in better accuracy. It was also great that you used cross-validation to choose the parameter for logistic regression and, in general, used good validation techniques. In terms of the actual report, the graphs were a nice addition.
Improvements or Considerations:
You say you implemented multinomial logistic regression, but I believe what you are looking for might be Ordinal Logistic Regression (https://en.wikipedia.org/wiki/Ordered_logit), since the 5 categories you are predicting are ordered (i.e., 5 is better than 4, which in turn is better than 3). Multinomial logistic regression treats the categories as unordered, so it cannot use that information.
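To make the difference concrete, here is a rough sketch of how an ordered logit (proportional odds) model turns a single linear score into probabilities for the 5 ratings. This is only an illustration and not your implementation: the function names, the cutpoints, and the scores are all values I made up, and in practice both the weights and the cutpoints would be fit to your data.

```python
import math


def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_probs(score, cutpoints):
    """Return [P(y=1), ..., P(y=K)] under a proportional odds model.

    score     -- the linear predictor w . x for one (user, restaurant) pair
    cutpoints -- increasing thresholds theta_1 < ... < theta_{K-1}
    """
    # Cumulative probabilities: P(y <= k) = sigmoid(theta_k - score),
    # with P(y <= K) = 1 for the top category.
    cumulative = [sigmoid(theta - score) for theta in cutpoints] + [1.0]

    # Each class probability is the difference of adjacent cumulative ones.
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Hypothetical cutpoints for a 5-star scale (4 thresholds, 5 classes).
cutpoints = [-2.0, -1.0, 0.5, 2.0]

low = ordinal_probs(-1.5, cutpoints)   # a low score favors low ratings
high = ordinal_probs(2.5, cutpoints)   # a high score favors high ratings
```

Note that there is one weight vector shared by all ratings and only the thresholds differ, whereas multinomial logistic regression learns a separate weight vector per class. That shared score is what encodes the ordering.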
I was also confused about how you implemented the proportions of categories for each restaurant. Is it your intention to calculate this "preference value" as a proportion that is unique to the user and the restaurant that the rating will be predicted for? I am interested in seeing the result of this. You may want to watch out for this edge case: consider a user who prefers Italian restaurants (and thus visits many) but will only try a different type if a friend suggests a highly rated one. This user will most likely rate that restaurant highly as well. If this is true for enough users, your model may learn that a lower "preference value" corresponds to a higher rating, because users only visit a restaurant outside of their preferences if it is one of the best of its type.
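In case it helps, this is how I was picturing the "preference value". It is a guess at your intent, not a description of what you did: the function names, the averaging rule, and the toy visit history are all my own assumptions.

```python
from collections import Counter


def category_proportions(visited_categories):
    """Share of each category among the tags of a user's past visits.

    visited_categories -- one list of category tags per visited restaurant
    """
    counts = Counter(cat for cats in visited_categories for cat in cats)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: n / total for cat, n in counts.items()}


def preference_value(user_proportions, restaurant_categories):
    """Average of the user's proportions over the restaurant's categories.

    Categories the user has never visited contribute 0.
    """
    if not restaurant_categories:
        return 0.0
    total = sum(user_proportions.get(cat, 0.0) for cat in restaurant_categories)
    return total / len(restaurant_categories)


# Hypothetical history for one user who mostly visits Italian restaurants.
history = [["Italian"], ["Italian", "Pizza"], ["Italian"], ["Thai"]]
proportions = category_proportions(history)

italian_value = preference_value(proportions, ["Italian"])  # familiar type
thai_value = preference_value(proportions, ["Thai"])        # rarely visited
mexican_value = preference_value(proportions, ["Mexican"])  # never visited
```

With a feature like this, one way to check for the effect I describe above would be to compare the average rating on held-out data for low and high preference values, and see whether the low group is rated higher.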
In terms of the actual report, maybe take a little more time to proofread. I was able to find multiple errors such as "after clearning the data", "this happened for some features do not happens frequently", and "the average score the customers gives to a restaurant".