Toloka / crowd-kit

Control the quality of your labeled data with the Python tools you already know.
https://crowd-kit.readthedocs.io/

[BUG] MajorityVote() doesn't return what is expected #108

Closed: LydiaMak closed this issue 4 months ago

LydiaMak commented 4 months ago

Observed behavior

MajorityVote() doesn't return the label chosen by the majority of workers.

Expected behavior

MajorityVote() should return the label chosen by the majority of workers.

Python Version

3.11

Crowd-Kit Version

1.2.1

Other Packages Versions

No response

Example code

from crowdkit.aggregation import MajorityVote
mv = MajorityVote()
resultmv = mv.fit_predict(df_crowd)  # df_crowd: one row per (task, worker, label)
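When checking whether MajorityVote "returns what is expected", it helps to compute the majority by hand on a tiny input. The sketch below uses plain pandas only. It assumes the long format with `task`, `worker` and `label` columns, which is what crowd-kit 1.x expects to the best of my knowledge. The toy `df_crowd` and the `majority_vote` helper are hypothetical and are not crowd-kit's implementation. Its output can be compared with `MajorityVote().fit_predict(df_crowd)` on the same data.

```python
import pandas as pd

# Hypothetical toy data in long format: one row per annotation.
df_crowd = pd.DataFrame(
    {
        "task": ["t1", "t1", "t1", "t2", "t2", "t2"],
        "worker": ["w1", "w2", "w3", "w1", "w2", "w3"],
        "label": ["cat", "cat", "dog", "dog", "dog", "cat"],
    }
)


def majority_vote(df: pd.DataFrame) -> pd.Series:
    """Plain-pandas sketch of majority voting (not crowd-kit's code)."""
    # Count how many workers gave each label to each task:
    # rows are tasks, columns are labels (missing pairs become 0).
    counts = df.groupby(["task", "label"]).size().unstack(fill_value=0)
    # Normalise counts into per-task label shares.
    proba = counts.div(counts.sum(axis="columns"), axis="index")
    # Pick the label with the highest share for each task.
    return proba.idxmax(axis="columns")


print(majority_vote(df_crowd))  # t1 -> cat, t2 -> dog
```

If this hand-computed result and MajorityVote disagree on such a small input, the discrepancy is most likely in how `df_crowd` was built (column names, duplicated rows), as turned out to be the case in this thread.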

Relevant log output

No response

pilot7747 commented 4 months ago

@LydiaMak Could you provide an example please? I mean an actual df_crowd where this bug reproduces.

LydiaMak commented 4 months ago

OK, it is solved. It was a bug in my code. However, I have a question: when there is a tie between workers, how is the final result selected?

pilot7747 commented 4 months ago

Great to hear!

In case of a tie, MajorityVote returns proba.idxmax(axis="columns"). So the result is deterministic: it is the label of the first column that holds the maximum in the DataFrame returned by predict_proba. I'm not sure we guarantee anything about the order of columns in that DataFrame.
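Because `idxmax` returns the first column holding the maximum, a tie is decided purely by column order. The sketch below illustrates this with made-up probabilities shaped like a tasks × labels table. It is not real `predict_proba` output. The helper `alphabetical_tie_break` is hypothetical and not part of crowd-kit. It sorts the label columns before calling `idxmax`, so ties resolve to the alphabetically first label no matter how the columns were ordered.

```python
import pandas as pd

# Hypothetical per-task label probabilities with a tie on t1.
proba = pd.DataFrame(
    {"dog": [0.5, 0.2], "cat": [0.5, 0.8]},
    index=pd.Index(["t1", "t2"], name="task"),
)

# idxmax picks the FIRST column holding the maximum, so with
# columns ordered (dog, cat) the tie on t1 goes to "dog".
print(proba.idxmax(axis="columns")["t1"])  # dog

# Reordering the columns flips the tie-break to "cat".
print(proba[["cat", "dog"]].idxmax(axis="columns")["t1"])  # cat


def alphabetical_tie_break(proba: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: make ties independent of column order."""
    # Sort label columns by name, then take the first maximum.
    return proba.sort_index(axis="columns").idxmax(axis="columns")


print(alphabetical_tie_break(proba)["t1"])  # cat
```

Sorting is only one possible convention. If a fixed order is undesirable, ties could instead be broken at random with a seeded generator. Either way, fixing the rule explicitly avoids depending on a column order that may not be guaranteed.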