JuliaAI / Imbalance.jl

A Julia toolbox with resampling methods to correct for class imbalance.
https://juliaai.github.io/Imbalance.jl/dev/
MIT License

Trying to replicate the quick start example (JOSS review 2) #89

Closed: ArneTillmann closed this issue 6 months ago.

ArneTillmann commented 6 months ago

When I tried to replicate the example from the introduction, the Google Colab instructions were very helpful, as I had never used Julia before. However, I could not replicate the example. The code threw two errors:

[Screenshot of the first error]

After commenting out line 3:

[Screenshot of the error after this change]

If you haven't done so already, I suggest rerunning the examples, including those in the tutorials, on a fresh machine to check whether they work as presented. I will double-check once you give the go-ahead.

EssamWisam commented 6 months ago

Thank you for reporting this issue.

I'm quite certain that I ran all the tutorials and the introduction before publishing them in the documentation; that is also why they are shown together with their output.

Colab support for Julia is a little iffy (it is not officially supported, and running it requires a hacky workaround), but I would be happy to look at the specific notebook from which you took the screenshots. Can you share it in view-only mode? I will make a copy and see what could have gone wrong.

EssamWisam commented 6 months ago

Now I understand: you were trying to replicate the introduction using the TableTransforms interface instead. You can't load multiple interfaces at once (e.g., both MLJ and TableTransforms) without specifying which one the model comes from, because the same model name is used in each.
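To illustrate the kind of clash described above, here is a minimal, self-contained sketch of general Julia behavior. `InterfaceA` and `InterfaceB` are hypothetical stand-ins for two interfaces that export the same name `SMOTE`; they are not Imbalance.jl's actual modules. When both are loaded with `using`, the bare name is ambiguous, and qualifying it with its module tells Julia which one is meant.

```julia
# Hypothetical stand-ins for two interfaces that export the same model name.
module InterfaceA
export SMOTE
SMOTE(x) = "A: $x"
end

module InterfaceB
export SMOTE
SMOTE(x) = "B: $x"
end

# Load both, so each module's exported `SMOTE` competes for the bare name.
using .InterfaceA, .InterfaceB

# An unqualified call is ambiguous, so Julia refuses to resolve it and throws.
clashed = try
    SMOTE(1)
    false
catch
    true
end

# Qualifying the name with its module removes the ambiguity.
a = InterfaceA.SMOTE(1)   # "A: 1"
b = InterfaceB.SMOTE(1)   # "B: 1"
println(clashed, " ", a, " ", b)
```

This is the same idea as writing `Imbalance.TableTransforms.SMOTE` instead of a bare `SMOTE` when more than one interface is loaded.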

ArneTillmann commented 6 months ago

Here is the notebook.

EssamWisam commented 6 months ago

Okay, as I expected. Your last cell needs to be:

```julia
using Imbalance
using TableTransforms

# Generate imbalanced data
num_rows = 200
num_features = 5
y_ind = 3
Xy, _ = generate_imbalanced_data(num_rows, num_features;
                                 class_probs=[0.5, 0.2, 0.3], insert_y=y_ind, rng=42)

# Initiate SMOTE model; the fully qualified name avoids a clash with the identically named MLJ model
oversampler = Imbalance.TableTransforms.SMOTE(y_ind; k=5, ratios=Dict(0=>1.0, 1=>0.9, 2=>0.8), rng=42)
Xyover = Xy |> oversampler       # can chain with other table transforms
# equivalently, via TableTransforms' `apply`, which also returns a cache
Xyover, cache = TableTransforms.apply(oversampler, Xy)
```

This is needed because you are also using MLJ, and the TableTransforms interface uses the same model names for consistency. The assumption is that the two interfaces serve different audiences; it would be quite odd for someone using MLJ in their project to prefer the TableTransforms interface over the MLJ one. On the unlikely occasion that someone uses both, the error should signal a name clash, and I will also add a hint about this to the README and the main page of the documentation.
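For a project that does use both interfaces, another option besides writing the full module path at every call is to bind each clashing name under a distinct alias with `import ... as` (Julia 1.6 and later). The sketch below again uses hypothetical stand-in modules rather than Imbalance.jl's real ones, and it is one possible approach, not something the package documentation prescribes.

```julia
# Hypothetical stand-ins for two interfaces that export the same model name.
module InterfaceA
export SMOTE
SMOTE(x) = "A: $x"
end

module InterfaceB
export SMOTE
SMOTE(x) = "B: $x"
end

# Bind each clashing name under its own alias (requires Julia 1.6+).
import .InterfaceA: SMOTE as SMOTE_A
import .InterfaceB: SMOTE as SMOTE_B

# Both versions are now callable without qualification and without ambiguity.
println(SMOTE_A(1), " ", SMOTE_B(1))
```

Full qualification keeps the origin of each model visible at every call site, while aliasing is shorter when a name is used many times.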

jbytecode commented 6 months ago

Related to the JOSS submission:

https://github.com/openjournals/joss-reviews/issues/6310

ArneTillmann commented 6 months ago

Thank you for clarifying.

EssamWisam commented 6 months ago

I should add that it's best to try the algorithms offline. Since Colab for Julia is, after all, a hacky approach (and too slow), I will likely drop it from the docs to avoid confusing new users, and will bring it back should Julia support ever become official.