K-means clusters using raw python

SSahas commented 2 years ago

Hello, I am Sahas. I want to create a Python problem that implements the k-means clustering algorithm using raw Python. Please tell me whether this is worth considering, too much for an exercise, or in need of any changes.

K-means algorithm: The k-means clustering algorithm computes centroids, assigns each point to its nearest centroid, and repeats until the centroids stop changing. The number of clusters is assumed to be known in advance. It is also known as a flat clustering algorithm. The 'K' in k-means denotes the number of clusters the method finds in the data.

[Image: clusters GIF]

The code is available here: SSahas/exercism-problem

The code mainly uses:

Dictionaries
Nested for loops
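
For illustration only (this is not the code from the linked repository), the steps described above could be sketched with just the standard library, a dictionary of clusters, and nested for loops. The function names, the tuple-based point format, and the `max_iterations` parameter are assumptions made for this sketch, and `math.dist` requires Python 3.8 or later.

```python
import math


def assign_clusters(points, centroids):
    """Map each centroid index to the list of points nearest to it."""
    clusters = {index: [] for index in range(len(centroids))}
    for point in points:
        nearest_index = 0
        nearest_distance = math.inf
        for index, centroid in enumerate(centroids):
            distance = math.dist(point, centroid)
            if distance < nearest_distance:
                nearest_index = index
                nearest_distance = distance
        clusters[nearest_index].append(point)
    return clusters


def update_centroids(clusters, old_centroids):
    """Return each cluster's mean; an empty cluster keeps its old centroid."""
    new_centroids = []
    for index, members in clusters.items():
        if not members:
            new_centroids.append(old_centroids[index])
            continue
        dimensions = len(members[0])
        mean = tuple(
            sum(member[axis] for member in members) / len(members)
            for axis in range(dimensions)
        )
        new_centroids.append(mean)
    return new_centroids


def k_means(points, initial_centroids, max_iterations=100):
    """Repeat assign/update until the centroids stop moving or the cap is hit."""
    centroids = [tuple(centroid) for centroid in initial_centroids]
    for _ in range(max_iterations):
        clusters = assign_clusters(points, centroids)
        new_centroids = update_centroids(clusters, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Recompute so the returned clusters always match the returned centroids.
    return centroids, assign_clusters(points, centroids)


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, [(0, 0), (10, 10)])
# clusters[0] holds the three points near (1, 1); clusters[1] the three near (8, 8)
print(centroids)
print(clusters)
```

Passing the starting centroids in explicitly, rather than picking them at random, keeps the result repeatable for a given input.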

github-actions[bot] commented 2 years ago

🤖 🤖

Hi! 👋🏽 👋 Welcome to the Exercism Python Repo!

Thank you for opening an issue! 🐍 🌈 ✨

If you are requesting support, we will be along shortly to help. (generally within 72 hours, often more quickly).
Found a problem with tests, exercises or something else?? 🎉
◦ We'll take a look as soon as we can & identify what work is needed to fix it. (generally within 72 hours).

◦ If you'd also like to make a PR to fix the issue, please have a quick look at the Pull Requests doc.
We 💙 PRs that follow our Exercism & Track contributing guidelines!

Here because of an obvious (and small set of) spelling, grammar, or punctuation issues with one exercise,
concept, or Python document?? 🌟 Please feel free to submit a PR, linking to this issue. 🎉

‼️ Please Do Not ‼️

❗ Run checks on the whole repo & submit a bunch of PRs. This creates longer review cycles & exhausts reviewers' energy & time. It may also conflict with ongoing changes from other contributors.
❗ Insert only blank lines, make a closing bracket drop to the next line, change a word to a synonym without obvious reason, or add trailing space that's not an EOL for the very end of text files.
❗ Introduce arbitrary changes "just to change things".

...These sorts of things are not considered helpful, and will likely be closed by reviewers.

For anything complicated or ambiguous, let's discuss things -- we will likely welcome a PR from you.
Here to suggest a feature or new exercise?? Hooray! Please keep in mind Chesterton's Fence.
Thoughtful suggestions will likely result in faster & more enthusiastic responses from maintainers.

💛 💙 While you are here... If you decide to help out with other open issues, you have our gratitude 🙌 🙌🏽.
Anything tagged with [help wanted] and without [Claimed] is up for grabs.
Comment on the issue and we will reserve it for you. 🌈 ✨

BethanyG commented 2 years ago

Hi @SSahas 👋🏽

Thanks for filing this issue, and for stepping forward to (possibly) design an exercise for Exercism!

TL;DR: Specifications for practice exercises and specifications for concept exercises. Additionally, we use pytest as a runner for the track, so all tests would need to use unittest syntax, and be runnable via pytest. For additional considerations, see the Python Contributing Docs.

While having an algorithm implementation like this might be interesting, I do have some concerns:

This is an implementation of the algorithm with sample data, but to be meaningful to students we've found that solving a specific problem is more engaging and leads to better learning. K-means can be used for spam filtering, fraud detection, audience segmentation, signal processing, image segmentation, and recommendations - among other things. What problem would you center this on, and what would the data and problem statement for it look like?
This isn't "pure Python" or "raw Python" in a "traditional" sense -- your implementation uses NumPy, pandas, Jupyter and Matplotlib. That isn't bad -- but it does mean the use of libraries beyond the Python standard lib. Since current Exercism tooling for the website only supports core Python, we'd need to do some work to support the loading of external libraries such as NumPy and pandas. And even with that work, we wouldn't support the use of Jupyter Notebooks or JupyterLab (they include a whole web stack and other complex considerations), and might not be able to support Matplotlib in a very effective way, due to its visual nature.
In addition to website tooling, we have the issue of walking students through what they would need to set up to work on the problem via the CLI. There are certainly ways to do this, but it is additional work.
Running the steps of K-means repeatedly to reach optimum partitioning may not fit within the performance needs of our current platform. We'd need code and tests to execute in a maximum of ~10s before timing out. There are also the cases where a k-means implementation never reaches optimum, so we need to be careful of that in the construction of the data set.
K-means partitioning is not deterministic. Outcomes vary depending on the number and position of the starting centroids and the number of iterations the algorithm goes through. That presents some challenges for student verification, testing, and feedback. I'd want to see what tests looked like for this problem, and run them over multiple solutions before we released anything on the platform/to students.
As it stands now, this isn't a programming or algorithm challenge as much as it is one of deciding how to apply or tune k-means. It also feels as though we'd have to point students at a lot of "prep" documentation, or have a lot of explanation as setup for this coding challenge. While I am not opposed to an ML or Data Science branch for the Python track, I don't know that I would start with k-means as a first problem, so I'd want some background from you on where you see this problem fitting into the current Python track, and what the supporting documentation/explanation for it would look like.
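
As a rough illustration of the timing and determinism points above, here is a minimal standard-library sketch. The `k_means` signature, the `seed` and `max_iterations` parameters, and the sample data are all hypothetical, chosen only for this example. Seeding a private `random.Random` makes the starting centroids repeatable, the iteration cap bounds the run time, and the tests use unittest syntax that pytest can collect.

```python
import math
import random
import unittest


def k_means(points, k, seed, max_iterations=50):
    """Hypothetical signature: cluster `points` (tuples) into `k` groups.

    `seed` fixes the random choice of starting centroids, and
    `max_iterations` bounds the run time if the centroids keep moving.
    """
    rng = random.Random(seed)          # private generator, repeatable per seed
    centroids = rng.sample(points, k)  # k starting centroids drawn from the data
    for _ in range(max_iterations):
        clusters = {index: [] for index in range(k)}
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Mean of each cluster; an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[index]
            for index, members in clusters.items()
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sorted(centroids)


class KMeansTest(unittest.TestCase):
    POINTS = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
              (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]

    def test_same_seed_gives_same_centroids(self):
        first = k_means(self.POINTS, 2, seed=7)
        second = k_means(self.POINTS, 2, seed=7)
        self.assertEqual(first, second)

    def test_iteration_cap_still_returns_k_centroids(self):
        result = k_means(self.POINTS, 2, seed=7, max_iterations=1)
        self.assertEqual(len(result), 2)
```

These tests only check repeatability and shape. Asserting exact centroid values would additionally require fixing the starting centroids in the exercise's test data.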

I also think that the R, Julia, C, JS, Ruby and Go languages (among others) have some pretty powerful tools for both ML and data science, so limiting this problem to a Python-only implementation feels wrong to me. So I think the best strategy would be to discuss this as a more generic practice exercise, rather than a Python concept exercise.

So - I am not saying no outright, but I would like some more details. Looking forward to hearing them. 😄

SSahas commented 2 years ago

Hello @BethanyG,

Thanks for your response 😄. I am glad you are showing interest. I am not so sure, but I will try to resolve these issues as soon as possible. However, I cannot implement this algorithm in R, Julia, Ruby, C, JS, Go, etc., as I have not learned these languages. So maybe it should be a practice exercise.

SSahas commented 2 years ago

Hey @BethanyG, I think this is hard, and it will take time 😅. So I think we should stop this.

BethanyG commented 2 years ago

@SSahas - I'll close for now. But feel free to re-open, should you want to work on this!

exercism / python

K-means clusters using raw python #2999