intouchkun closed this pull request 2 years ago
Hi!
This is great! I'll look into it as soon as possible!
Best Regards, Gyuri Kovács
On Sun, 7 Mar 2021 at 17:08, Intouch Kunakorntum notifications@github.com wrote:
Hello György Kovács, I have added a new oversampling technique called 'A Synthetic Minority Based on Probabilistic Distribution (SyMProD)', which I implemented and published at https://ieeexplore.ieee.org/document/9119990. Could you please review it? If you find any errors or have suggestions, please let me know or comment on this PR. Thank you.
You can view, comment on, or merge this pull request online at:
https://github.com/analyticalmindsltd/smote_variants/pull/38
Commit Summary
- add_new_oversampling_technique_symprod
File Changes
- M smote_variants/_smote_variants.py https://github.com/analyticalmindsltd/smote_variants/pull/38/files#diff-9d74ebb6d82378d2ae824544d9f38ded62b17f5559027b17d10aeda42103470f (204)
Patch Links:
- https://github.com/analyticalmindsltd/smote_variants/pull/38.patch
- https://github.com/analyticalmindsltd/smote_variants/pull/38.diff
As far as I see in the CI logs (https://travis-ci.com/github/analyticalmindsltd/smote_variants/jobs/488846311), your implementation fails on some edge cases.
To ensure that the implemented oversamplers do not break existing machine learning pipelines when integrated into them, the techniques are tested against a range of edge cases, such as highly imbalanced datasets and datasets with only a couple of vectors. You can find all the tests at https://github.com/analyticalmindsltd/smote_variants/blob/master/tests/tests.py, and you can also run them yourself.
You should think about issues like this one: the code tries to determine the 5 closest neighbors, but the data consists of only 3 vectors altogether.
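One way to guard against this case is to cap the number of requested neighbors at the number of other vectors available. The snippet below is only a minimal sketch using plain NumPy; `safe_k_neighbors` is a hypothetical helper for illustration, not a function from smote_variants and not the code in this PR.

```python
import numpy as np


def safe_k_neighbors(X, k=5):
    """Return indices of up to k nearest neighbors for each row of X.

    Hypothetical helper (not part of smote_variants): k is capped at
    len(X) - 1 so we never ask for more neighbors than exist.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    k_eff = min(k, n - 1)
    if k_eff < 1:
        # Fewer than two vectors: there are no neighbors to return.
        return np.empty((n, 0), dtype=int)
    # Pairwise squared Euclidean distances between all rows.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    # Exclude each point from its own neighbor list.
    np.fill_diagonal(dist, np.inf)
    return np.argsort(dist, axis=1)[:, :k_eff]


# Edge case from the comment above: 5 neighbors requested, 3 vectors available.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
neighbors = safe_k_neighbors(X_min, k=5)
print(neighbors.shape)  # (3, 2)
```

With 3 vectors, each point gets at most 2 neighbors instead of the call failing. Whether an oversampler should proceed with fewer neighbors or return the data unchanged in such cases is a design decision this sketch does not settle.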
Let me know if you need further help in identifying and fixing the implementation.
Thank you for the information. I'll solve the problems and push again.
Merging #38 (d8b2bcf) into master (dedbc3d) will not change coverage. The diff coverage is 0.00%.
@@ Coverage Diff @@
## master #38 +/- ##
=======================================
Coverage 0.00% 0.00%
=======================================
Files 3 3
Lines 7413 7522 +109
=======================================
- Misses 7413 7522 +109
Impacted Files | Coverage Δ | |
---|---|---|
smote_variants/_smote_variants.py | 0.00% <0.00%> (ø) |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dedbc3d...d8b2bcf.
Thank you! I can see there are still problems with some tests, but these seem to be related to updates in the packages I use. Let me take it over from this point: I will look into it as soon as I can and update you if there is anything else to do on your end!
Thank you for your contribution so far!
Hi @intouchkun, I have added your method SYMPROD to the package (in a separate PR). I made some changes to clean up the implementation. I also think the inverse transformation of the standard scaling was missing as the last step, which is needed to make the new samples comparable to the original ones. It would be great if you could check whether my changes are correct and work.
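To illustrate the scaling point, here is a minimal NumPy sketch of the round trip: scale, generate samples in the scaled space, then map them back. It is an assumption-laden illustration, not the SYMPROD code in the package. The random interpolation step is only a placeholder for the actual sample generation, and the manual mean/std scaling stands in for whatever scaler the implementation uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy minority-class data with features on very different scales.
X_min = np.array([[100.0, 0.1], [110.0, 0.2], [120.0, 0.3]])

# Standard scaling: zero mean and unit variance per feature.
mean = X_min.mean(axis=0)
std = X_min.std(axis=0)
std[std == 0] = 1.0  # avoid division by zero for constant features
X_scaled = (X_min - mean) / std

# Placeholder for sample generation (NOT the SyMProD procedure):
# interpolate between randomly chosen pairs of scaled points.
n_new = 4
i = rng.integers(0, len(X_scaled), size=n_new)
j = rng.integers(0, len(X_scaled), size=n_new)
t = rng.random((n_new, 1))
new_scaled = X_scaled[i] + t * (X_scaled[j] - X_scaled[i])

# Last step: invert the scaling so the new samples live in the
# original feature space and are comparable to X_min.
new_samples = new_scaled * std + mean
```

Without the final line, the generated samples would stay in the standardized space (values around 0) while the original data keeps its raw scale, so concatenating the two would mix incompatible units.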