Closed SSahas closed 2 years ago
Hi! 👋🏽 👋 Welcome to the Exercism Python Repo!
Thank you for opening an issue! 🐍 🌈 ✨
◦ If you'd also like to make a PR to fix the issue, please have a quick look at the Pull Requests doc.
We 💙 PRs that follow our Exercism & Track contributing guidelines!
Please feel free to submit a PR, linking to this issue.
🎉
‼️ Please Do Not ‼️
❗ Run checks on the whole repo & submit a bunch of PRs. This creates longer review cycles & exhausts reviewers' energy & time. It may also conflict with ongoing changes from other contributors.
❗ Insert only blank lines, make a closing bracket drop to the next line, change a word to a synonym without obvious reason, or add trailing space that's not an EOL for the very end of text files.
❗ Introduce arbitrary changes "just to change things".
...These sorts of things are not considered helpful, and will likely be closed by reviewers.
💛 💙 While you are here... If you decide to help out with other open issues, you have our gratitude 🙌 🙌🏽.
Anything tagged with [help wanted] and without [Claimed] is up for grabs.
Comment on the issue and we will reserve it for you. 🌈 ✨
Hi @SSahas 👋🏽
Thanks for filing this issue, and for stepping forward to (possibly) design an exercise for Exercism!
TL;DR: See the specifications for practice exercises and the specifications for concept exercises. Additionally, we use pytest as a runner for the track, so all tests would need to use unittest syntax, and be runnable via pytest. For additional considerations, see the Python Contributing Docs.
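As a rough illustration of the testing requirement above, a test file in unittest syntax that pytest can also collect might look like the sketch below. The helper `mean_point` and the class name `MeanPointTest` are hypothetical, made up for this example; they are not taken from the proposed exercise or from the track's tooling.

```python
import unittest


def mean_point(points):
    """Hypothetical helper: coordinate-wise mean of a non-empty list of points."""
    dimensions = len(points[0])
    return tuple(
        sum(point[d] for point in points) / len(points) for d in range(dimensions)
    )


class MeanPointTest(unittest.TestCase):
    """Plain unittest.TestCase; pytest discovers and runs these methods too."""

    def test_mean_of_two_points(self):
        self.assertEqual(mean_point([(0.0, 0.0), (2.0, 4.0)]), (1.0, 2.0))

    def test_single_point_is_its_own_mean(self):
        self.assertEqual(mean_point([(3.0, 5.0)]), (3.0, 5.0))
```

Saved as a file whose name matches pytest's discovery pattern (for example `mean_point_test.py` or `test_mean_point.py`), this runs with either `python -m unittest` or `pytest`.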
While having an algorithm implementation like this might be interesting, I do have some concerns:
This is an implementation of the algorithm with sample data, but to be meaningful to students we've found that solving a specific problem is more engaging and leads to better learning. K-means can be used for spam filtering, fraud detection, audience segmentation, signal processing, image segmentation, and recommendations - among other things. What problem would you center this on, and what would the data and problem statement for it look like?
This isn't "pure Python" or "raw Python" in a "traditional" sense -- your implementation uses Numpy, Pandas, Jupyter and Matplotlib. That isn't bad -- but it does mean the use of libraries beyond the Python standard lib. Since current Exercism tooling for the website only supports core Python, we'd need to do some work to support the loading of external libraries such as numpy and pandas. And even with that work, we wouldn't support the use of Jupyter Notebooks or JupyterLab (they include a whole web stack and other complex considerations), and might not be able to support matplotlib in a very effective way, due to its visual nature.
In addition to website tooling, we have the issue of walking students through what they would need to set up to work on the problem via the CLI. There are certainly ways to do this, but it is additional work.
Running the steps of K-means repeatedly to reach optimum partitioning may not fit within the performance needs of our current platform. We'd need code and tests to execute in a maximum of ~10s before timing out. There are also the cases where a k-means implementation never reaches optimum, so we need to be careful of that in the construction of the data set.
K-means partitioning is not deterministic. Outcomes vary depending on the number and position of the starting centroids and the number of iterations the algorithm goes through. That presents some challenges for student verification, testing, and feedback. I'd want to see what tests looked like for this problem, and run them over multiple solutions before we released anything on the platform/to students.
As it stands now, this isn't a programming or algorithm challenge as much as it is one of deciding how to apply or tune k-means. It also feels as though we'd have to point students at a lot of "prep" documentation, or have a lot of explanation as a setup to this coding challenge. While I am not opposed to an ML or Data Science branch for the Python track, I don't know that I would start with k-means as a first problem, so I'd want some background from you on where you see this problem fitting into the current Python track, and what the supporting documentation/explanation for it would look like.
I also think that the R, Julia, C, JS, Ruby and Go languages (among others) have some pretty powerful tools for both ML and data science, so limiting this problem to a Python-only implementation feels wrong to me. So I think the best strategy would be to discuss this as a more generic practice exercise, rather than a Python concept exercise.
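To make the testing concern about starting centroids concrete, here is a minimal sketch (not the submitted implementation) that runs k-means from explicitly supplied starting centroids and caps the number of iterations. The names `run_k_means`, `as_partition`, and `POINTS` are hypothetical and exist only for this illustration. On the same four points, one starting position converges to a left/right split and another settles into a top/bottom split, which is the kind of outcome a test suite would have to pin down.

```python
import math
import unittest


def run_k_means(points, initial_centroids, max_iterations=50):
    """Run k-means from fixed starting centroids; return the list of clusters."""
    centroids = list(initial_centroids)
    clusters = []
    for _ in range(max_iterations):  # hard cap so a run always terminates
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Recompute each centroid as the mean of its cluster (keep it if empty).
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:  # assignments are stable: converged
            break
        centroids = updated
    return clusters


def as_partition(clusters):
    """Order-independent view of a clustering, so cluster numbering is ignored."""
    return {frozenset(cluster) for cluster in clusters}


POINTS = [(0, 0), (0, 2), (100, 0), (100, 2)]


class StartingCentroidTest(unittest.TestCase):
    def test_one_start_per_side_gives_left_right_split(self):
        clusters = run_k_means(POINTS, [(0, 0), (100, 2)])
        expected = {frozenset({(0, 0), (0, 2)}), frozenset({(100, 0), (100, 2)})}
        self.assertEqual(as_partition(clusters), expected)

    def test_both_starts_on_left_settle_into_top_bottom_split(self):
        clusters = run_k_means(POINTS, [(0, 0), (0, 2)])
        expected = {frozenset({(0, 0), (100, 0)}), frozenset({(0, 2), (100, 2)})}
        self.assertEqual(as_partition(clusters), expected)
```

Passing the starting centroids in (or fixing a random seed) is one way tests could stay reproducible; comparing partitions as sets avoids depending on which cluster happens to be numbered first.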
So - I am not saying no outright, but I would like some more details. Looking forward to hearing them. 😄
Hello @BethanyG,
Thanks for your response 😄, I am glad you are showing interest. I am not so sure, but I will try to resolve these issues as soon as possible. However, I cannot implement this algorithm in R, Julia, Ruby, C, JS, Go, etc., as I have not learned these languages. So maybe it should be a practice exercise.
Hey @BethanyG, I think this is hard, and it will take time 😅. So I think we should stop this.
@SSahas - I'll close for now. But feel free to re-open, should you want to work on this!
Hello, I am Sahas. I want to create a Python problem that implements the k-means clustering algorithm using raw Python. Please tell me whether this is feasible, too much for an exercise, or requires any changes.
K-means algorithm: The K-means clustering algorithm assigns each point to its nearest centroid, recomputes the centroids, and repeats until the centroids stop changing. The number of clusters is assumed to be known in advance. It is also known as the flat clustering algorithm. The number of clusters found from data by the method is denoted by the letter 'K' in K-means.
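For readers who want to see the loop described above, here is a minimal standard-library sketch. It is not the code from the linked repository; the function name `k_means` and the `max_iterations` and `seed` parameters are assumptions made for this illustration. Like any basic k-means, it stops at a stable set of centroids, which is not guaranteed to be the best possible clustering.

```python
import math
import random


def k_means(points, k, max_iterations=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into `k` groups.

    Returns (centroids, clusters). `seed` fixes the random choice of the
    starting centroids so that repeated runs give the same result.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as starting centroids
    clusters = [[] for _ in range(k)]
    for _ in range(max_iterations):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # nothing moved: converged
            break
        centroids = new_centroids
    return centroids, clusters


# Example call on six 2-D points, asking for two clusters.
sample = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = k_means(sample, k=2)
```

If the iteration cap is reached before convergence, the returned clusters reflect the assignment made just before the last centroid update.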
The code is available here: SSahas/exercism-problem
The code Mainly uses :