Closed desilinguist closed 4 years ago
@mulhod @bndgyawali @aoifecahill ?
I like the latter option. What happens, though, if both learner.probability and class_labels are True? What should be returned/printed out in that scenario?
@mulhod I don't understand your question, that's the exact weird interaction I was talking about.
Maybe I am misunderstanding your proposal. I thought it was proposing a solution for making sense of the various possibilities.
The latter would mean:
1. If learner.probability is True, always write out probabilities else write out the class labels.
2. If class_labels is True, return class labels else return the class indices.
Above I see what should happen in each case where one thing is True. Is this supposed to resolve the conflict?
Yes, if you treat them independently as I am proposing, then the above code (where both are True) will work even though it's a bit weird. That weird case is the thing that's broken right now. So, it's not 1 or 2, it's 1 and 2.
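The "1 and 2" reading above can be sketched as a small truth table. This is only an illustration of the proposal, not SKLL's code: the helper name independent_outputs and the string tags it returns are hypothetical.

```python
from itertools import product


def independent_outputs(probability, class_labels):
    """Hypothetical helper: return (written to disk, returned in memory)."""
    # Rule 1: what is written out depends only on learner.probability.
    written = "probabilities" if probability else "class labels"
    # Rule 2: what is returned depends only on class_labels.
    returned = "class labels" if class_labels else "class indices"
    return written, returned


# Enumerate all four combinations of the two flags.
for probability, class_labels in product([True, False], repeat=2):
    print(probability, class_labels, independent_outputs(probability, class_labels))
```

With both flags True, this sketch writes probabilities but returns class labels, which is the in-memory vs. on-disk mismatch under discussion.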
Oh, I didn't see the return/write out distinction until now.
Yeah, I don't like the in-memory vs. written-out-to-disk representation difference. I think it should be disallowed.
I would also say fix the code so that this works correctly. We do not want two different things happening in memory and on disk.
Fair enough. So:
1. If class_labels is True, it will write out AND return class labels irrespective of learner.probability.
2. If class_labels is False and learner.probability is True, it will write out AND return probabilities.
3. If class_labels is False and learner.probability is False, it will write out AND return class indices.
This is not exactly right. We always want to write out class labels and not indices because indices are internal to SKLL. So, 3 is actually:
3. If class_labels is False and learner.probability is False, it will write out class labels AND return class indices.
As part of this, I will set the default value of the class_labels keyword argument for Learner.predict() to be True instead of False since it doesn't make sense to return class indices by default.
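The three cases agreed on in this thread, with the corrected case 3 and the new default, can be summarized in a short sketch. It is not the actual Learner.predict() implementation: the function name agreed_outputs and the string tags are hypothetical and only model the decision logic.

```python
def agreed_outputs(probability, class_labels=True):
    """Hypothetical helper: return (written to disk, returned in memory)."""
    if class_labels:
        # Case 1: labels everywhere, irrespective of learner.probability.
        return "class labels", "class labels"
    if probability:
        # Case 2: probabilities both on disk and in memory.
        return "probabilities", "probabilities"
    # Case 3: indices are internal to SKLL, so labels are written out
    # while class indices are returned.
    return "class labels", "class indices"


# With the new default, probabilities require class_labels=False.
print(agreed_outputs(probability=True))
print(agreed_outputs(probability=True, class_labels=False))
```

Only case 3 still differs between disk and memory, and only because indices are never written out.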
💥 This will be a breaking change since now, to get probabilities as outputs, you will explicitly need to set class_labels to be False. 💥
There's a bug in Learner.predict() that only surfaces via the API for probabilistic learners.
Now, if I want to get the most likely labels in memory but write out the class probabilities to disk, I can actually do it the way the API is written:
Doing so actually raises the following error because we try to write out the probabilities even though we end up only computing labels in the code (due to class_labels being True):
We should either explicitly disallow this case (it's a little weird to want labels in memory but probabilities on disk) or fix the code so that this works correctly.
The latter would mean:
1. If learner.probability is True, always write out probabilities else write out the class labels.
2. If class_labels is True, return class labels else return the class indices.
Thoughts?