PANDASANG1231 / 522_Ramen

Project work for 552
MIT License

Milestone 1 Review #36

Open mohamad-amin opened 2 years ago

mohamad-amin commented 2 years ago

Good job! Here is my feedback on the Milestone 1 assessment.

  1. Project proposal: reasoning. You might need to pay more attention to these parts:

    • "Clearly state the research question and any natural sub-questions you need to address, and their type." In your proposal, have you analyzed the different situations that might arise when working with textual data? Why do you use logistic regression if you are facing a classification problem? If you are not doing regression and are doing classification, why do you have an AUC score? Moreover, some of these details (like the AUC score) are not very accessible to a non-technical reader.
    • What about data visualization? What specifically are you going to do?
    • For these algorithms, which packages will you use? Have you considered wrapper methods (e.g., the Boruta algorithm) for feature selection?
  2. Exploratory data analysis in a literate code document: VIZ. Have you looked at the HTML report file you provided? It does not open on GitHub. First, your report should be viewable on GitHub so that everyone can see it. Second, you don't need to convert it to HTML; that's why it breaks. Please do not convert your notebooks to HTML files again.

  3. Exploratory data analysis in a literate code document: QUALITY

    • It's nice that you used the pandas-profiling tool, but where is the motivation for what you have done? How do you want to handle the missing values? What did you infer from your analysis? Just plotting results without stating any conclusions seems a bit pointless.
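On the point that AUC is hard for a non-technical reader: AUC has a plain-language reading, namely the probability that the model scores a randomly chosen positive example above a randomly chosen negative one. It is defined for binary classifiers (or any ranking of two classes). A minimal pure-Python sketch of that pairwise definition; the function name `roc_auc` and the toy labels and scores are illustrative, not from the project:

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs in which the positive
    example gets the higher score; tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Two positives, two negatives: 3 of the 4 pairs are ranked correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

Phrased for a general audience, an AUC of 0.75 means "in 3 out of 4 random good/bad pairs, the model ranks the good one higher." For real use, `sklearn.metrics.roc_auc_score` computes the same quantity more efficiently.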
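For the feature-selection suggestion, the core idea behind Boruta can be sketched as follows: add a shuffled "shadow" copy of every feature, fit a random forest, and count how often each real feature's importance beats the best shadow feature. This is a simplified sketch under stated assumptions (the helper `shadow_feature_screen`, its `n_rounds` parameter, and the synthetic data are hypothetical). Full Boruta also applies a statistical test to the hit counts and iteratively drops rejected features; the `BorutaPy` package implements the complete algorithm.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def shadow_feature_screen(X: np.ndarray, y: np.ndarray, n_rounds: int = 20,
                          random_state: int = 0) -> np.ndarray:
    """Simplified Boruta-style screen (hypothetical helper).

    Returns, for each real feature, the number of rounds in which its
    random-forest importance exceeded the best shadow feature's importance.
    """
    rng = np.random.default_rng(random_state)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for i in range(n_rounds):
        # Shadow features: shuffle each column independently, which keeps its
        # distribution but destroys any relationship with y.
        shadow = np.apply_along_axis(rng.permutation, 0, X)
        X_aug = np.hstack([X, shadow])
        model = RandomForestClassifier(n_estimators=100, random_state=random_state + i)
        model.fit(X_aug, y)
        importances = model.feature_importances_
        # A real feature scores a "hit" if it beats the most important shadow.
        best_shadow = importances[n_features:].max()
        hits += importances[:n_features] > best_shadow
    return hits


# Synthetic example: only column 0 drives the label; columns 1 and 2 are noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 3))
y = (X[:, 0] > 0).astype(int)
hits = shadow_feature_screen(X, y)
print(hits)  # column 0 should score a hit in (nearly) every round
```

Because it refits the model many times, a wrapper method like this is slower than filter methods (e.g. correlation ranking), but it judges features by their usefulness to the actual model.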
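On handling missing values, one way to make the choices explicit in the notebook is to tabulate missingness per column and then apply a stated rule per column. A minimal pandas sketch, assuming hypothetical `Style` and `Stars` columns and illustrative rules (drop rows without a target rating, fill a missing categorical with "Unknown"); the right rule depends on what the EDA actually shows:

```python
import pandas as pd


def missing_value_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, worst first."""
    report = pd.DataFrame({
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(1),
    })
    return report.sort_values("n_missing", ascending=False)


# Toy frame with hypothetical columns standing in for the real dataset.
df = pd.DataFrame({
    "Style": ["Cup", None, "Pack", "Bowl"],
    "Stars": [3.5, 4.0, None, 5.0],
})
print(missing_value_report(df))

# Illustrative, documented rules (assumptions, not the project's decisions):
# rows without a target rating cannot be used for training, so drop them;
# a missing categorical value becomes its own explicit "Unknown" level.
cleaned = df.dropna(subset=["Stars"]).fillna({"Style": "Unknown"})
print(cleaned)
```

Writing one sentence under each rule explaining why it was chosen covers the "motivation" and "what did you infer" points at the same time.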
PANDASANG1231 commented 2 years ago

@mohamad-amin Hey, thank you for the feedback. It is really helpful.

I think your points are very clear. Just one question: I am not sure I understand this sentence: "Why do you use logistic regression if you are facing a classification problem?" Although logistic regression has a name ending in 'regression', it is actually a classification algorithm: it passes a linear score through a sigmoid (logistic) function, which outputs a probability that is then thresholded into one of two classes. So do you mean we should try other classification algorithms besides LR, or that you don't think LR is a good algorithm for classification? Thanks
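A minimal sketch of why logistic regression is a classifier: it computes a linear score, just as linear regression would, but passes it through the sigmoid to obtain a probability, which a threshold then maps to a class label. With two classes, a softmax over the scores `[0, z]` reduces to the sigmoid of `z`. The weights, bias, and 0.5 threshold below are hypothetical values for illustration only.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(scores: Sequence[float]) -> List[float]:
    """Softmax over class scores (shifted by the max for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    # Same linear score as linear regression...
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    # ...squashed into (0, 1) and read as P(y = 1 | x).
    return sigmoid(z)


def predict_label(weights: Sequence[float], bias: float, x: Sequence[float],
                  threshold: float = 0.5) -> int:
    # Thresholding the probability is what makes the model a binary classifier.
    return int(predict_proba(weights, bias, x) >= threshold)


# Hypothetical weights and input, for illustration only.
w, b, x = [2.0, -1.0], -0.5, [1.0, 0.5]
print(predict_proba(w, b, x))  # score z = 1.0, probability ≈ 0.731
print(predict_label(w, b, x))  # 1
# With two classes, softmax over the scores [0, z] equals sigmoid(z).
print(softmax([0.0, 1.0])[1], sigmoid(1.0))
```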

mohamad-amin commented 2 years ago

Hey, sorry, isn't your problem inherently a regression problem (predicting a ramen's rating)? I assumed the rating would be numerical; am I wrong?

PANDASANG1231 commented 2 years ago

Yes, in the end we turned it into a binary classification problem. We can state that more clearly in the summary.
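Turning the numeric rating into a binary target can be sketched as below. The 4.0 cut-off is a hypothetical choice (the thread doesn't say which threshold was used), and missing ratings are kept as `None` rather than silently counted as negatives. Checking the positive rate afterwards shows how balanced the resulting classes are, which matters when reading metrics such as accuracy.

```python
from typing import List, Optional, Sequence


def binarize_ratings(ratings: Sequence[Optional[float]],
                     threshold: float = 4.0) -> List[Optional[int]]:
    """1 if rating >= threshold (hypothetical cut-off), 0 otherwise; missing stays None."""
    return [None if r is None else int(r >= threshold) for r in ratings]


def positive_rate(labels: Sequence[Optional[int]]) -> float:
    """Share of positive labels among the non-missing ones."""
    known = [lab for lab in labels if lab is not None]
    return sum(known) / len(known)


labels = binarize_ratings([5.0, 3.75, None, 4.0])
print(labels)                 # [1, 0, None, 1]
print(positive_rate(labels))  # two positives out of three known labels
```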

datallurgy commented 2 years ago

Hi @mohamad-amin!

Re: Comment 5: The pandas-profiling report does not render in the .ipynb file; it only exports to HTML and JSON (see the pandas-profiling `to_file` documentation). I understand this isn't ideal, since the interactive HTML does not render on GitHub, but the file is easy to download and open in a browser. We also considered uploading a PDF of the EDA, but the report doesn't print nicely to PDF either.

What would be your recommendation for rendering pandas-profiling reports?
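One possible option, offered as an assumption rather than a prescribed workflow: keep the interactive HTML as a downloadable artifact, but also commit a short static summary in Markdown, which GitHub renders natively, with the key findings written next to it. A minimal pandas sketch (the helper name `static_summary_markdown` and the toy columns are hypothetical):

```python
import pandas as pd


def static_summary_markdown(df: pd.DataFrame) -> str:
    """Render a per-column summary as a GitHub-flavored Markdown table."""
    lines = [
        "| column | dtype | n_missing | n_unique |",
        "|---|---|---|---|",
    ]
    for col in df.columns:
        s = df[col]
        # nunique() ignores missing values by default.
        lines.append(f"| {col} | {s.dtype} | {int(s.isna().sum())} | {int(s.nunique())} |")
    return "\n".join(lines)


df = pd.DataFrame({"Style": ["Cup", None, "Pack"], "Stars": [3.5, 4.0, None]})
print(static_summary_markdown(df))
```

Saving the output with `pathlib.Path("eda_summary.md").write_text(...)` and linking it from the README would give reviewers something viewable directly on GitHub. `DataFrame.to_markdown()` can also render tables, but it requires the optional `tabulate` dependency.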