Closed JackyXu-Cool closed 1 year ago
Thanks @JackyXu-Cool for doing this! I have added my comments below:
Question about dataset needed:
KMeans: We need just one training dataset and one Y-label dataset (Y-label is optional). This is correct. If the user wants to calculate accuracy (and perhaps some other metrics), they will need to upload a y-label dataset. Otherwise, clustering can be done with just the X dataset.
Hierarchical clustering: Similar to KMeans.
Naive Bayes: training dataset and y-label (required). Yes, this is correct. Technically we need all four datasets (xtrain, xtest, ytrain, ytest), but I took a look at @lhyelinn's code and realized that she has already performed data splitting in her code (namely https://github.com/JackyXu-Cool/Team-2130-Machine-Learning-Roulette/blob/master/mlr_backend/naivebayes/naivebayes.py#L74). This allows the user to upload just two datasets: X and y-label. I think this is a pretty neat feature.
Decision Tree: one training dataset, one testing dataset, one training Y-label dataset, and one testing y-label dataset. As of right now, we need all four datasets. However, after taking a look at Hyelin's code (sparks of inspiration, yay!), I think we can actually do something similar with data splitting. There are two approaches that we can go with: **1. Keep everything as is. Ask the users to upload four datasets if they want to use dtree**
@lhyelinn @hloneal @honeal3 Can you double-check that what I said about your algorithms is correct? Also, I want to cc @mmmmartyzhao @Timiport to get their input on the idea of adding a new parameter for data splitting.
Feel free to ping me anytime regarding this :smiling_face_with_three_hearts:
I think adding a parameter for data splitting would be a better idea. This will allow a uniform front-end layout for every ML algorithm.
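For reference, here is a minimal sketch of what a data-splitting parameter could look like on the backend. The function name `split_dataset` and the parameters `test_ratio` and `seed` are hypothetical and are not taken from the repo; the existing split in `naivebayes.py` may work differently. It uses only the standard library and assumes X and y arrive as row-aligned lists.

```python
import random


def split_dataset(X, y, test_ratio=0.2, seed=0):
    """Shuffle rows and split X and y into train and test parts.

    Hypothetical helper: `test_ratio` is the data-splitting parameter
    discussed in this thread, not an existing option in the codebase.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be strictly between 0 and 1")

    # Shuffle row indices with a fixed seed so the split is reproducible.
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)

    # Hold out at least one row for testing.
    n_test = max(1, round(len(X) * test_ratio))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    x_train = [X[i] for i in train_idx]
    x_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return x_train, x_test, y_train, y_test
```

With something like this, every algorithm could accept the same two uploads (X and y-label) plus one split ratio, which is what keeps the front-end layout uniform. If the backend already depends on scikit-learn, its `train_test_split` would likely be the more idiomatic choice.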
In this PR, I put "Select ML model" as the first step of the upload process and "Upload dataset" as the second step. However, the number of datasets needed does not yet adjust based on the ML model selected. I will focus on that in the next PR.
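One possible way to drive that adjustment is a per-model lookup of which uploads are required. This is only a sketch: the table `REQUIRED_UPLOADS`, the model keys, and the helper `missing_uploads` are hypothetical names, and the entries simply mirror the requirements described in this thread (decision tree is shown with its current four-dataset requirement).

```python
# Hypothetical lookup table; keys and dataset names are illustrative only.
REQUIRED_UPLOADS = {
    "kmeans": {"required": ["x"], "optional": ["y"]},
    "hierarchical": {"required": ["x"], "optional": ["y"]},
    "naive_bayes": {"required": ["x", "y"], "optional": []},
    "decision_tree": {
        "required": ["x_train", "x_test", "y_train", "y_test"],
        "optional": [],
    },
}


def missing_uploads(model, uploaded):
    """Return the required dataset names that have not been uploaded yet."""
    if model not in REQUIRED_UPLOADS:
        raise KeyError(f"unknown model: {model}")
    uploaded = set(uploaded)
    # Preserve the order of the requirement list in the result.
    return [name for name in REQUIRED_UPLOADS[model]["required"]
            if name not in uploaded]
```

The upload step could then show one input per required name for the selected model and block submission while `missing_uploads` returns a non-empty list.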
Question about dataset needed:
@ruokun-niu Do I understand the logic right? I will start to work on this if this looks good to you.