tmbo closed this issue 4 years ago
Yes, don't we already print warnings to this effect when we call validate() on the TrainingData?
In that case it wouldn't even be too bad to just swallow this warning.
Yes, we do, but only if there is exactly one training sample for an intent. If you have 2 training samples, our warning will not be printed, but you might still get the sklearn warning. This can happen because the data gets split into multiple CV folds.
We could silence it...
I had a look and it is not too easy to replace that warning. Nevertheless, we already improved the error messages when validating training data, and there is now another check at the classifier level. This warning might still occur though.
Hi guys,
the warning still occurs and does confuse people. Could you reopen the issue and have another look at it?
There isn't anything we can do about this. If you know a way to suppress the warning (and only this one), I'd be happy to merge a fix.
I'm working on it. According to the Python 3.5 documentation on Temporarily Suppressing Warnings, one can suppress the warning using the catch_warnings context manager:
import warnings

def fxn():
    warnings.warn("deprecated", DeprecationWarning)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    fxn()
I am happy to reopen. Make sure though not to ignore all warnings but only this one.
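To illustrate the difference between ignoring everything and ignoring one category, here is a minimal sketch using only the standard library. `StandInMetricWarning`, `noisy_fit` and `fit_quietly` are hypothetical names made up for this example; the stand-in class takes the place of the sklearn warning so the snippet runs without sklearn installed.

```python
import warnings


class StandInMetricWarning(UserWarning):
    """Hypothetical stand-in for the sklearn warning discussed in this thread."""


def noisy_fit():
    # Emits the warning we want to hide plus an unrelated one we want to keep.
    warnings.warn("metric is ill-defined", StandInMetricWarning)
    warnings.warn("something else went wrong", RuntimeWarning)


def fit_quietly():
    """Run noisy_fit() and return the categories that were NOT suppressed."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record everything by default
        # Filters added later are checked first, so this one wins
        # for the single category we want to silence.
        warnings.filterwarnings("ignore", category=StandInMetricWarning)
        noisy_fit()
    return [w.category for w in caught]


remaining = fit_quietly()
```

Only the unrelated `RuntimeWarning` should survive here, whereas `simplefilter("ignore")` on its own would have dropped both.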
@antoinecomp take a look at my SO post and see if that gets you what you need to do a PR.
Okay, if that suits everyone, I would replace a piece of code in train() of the Trainer class in the sklearn_intent_classifier.py file with:
def train(self, training_data, cfg, **kwargs):
    # type: (TrainingData, RasaNLUModelConfig, **Any) -> None
    """Train the intent classifier on a data set."""

    num_threads = kwargs.get("num_threads", 1)
    labels = [e.get("intent")
              for e in training_data.intent_examples]

    if len(set(labels)) < 2:
        logger.warn("Can not train an intent classifier. "
                    "Need at least 2 different classes. "
                    "Skipping training of intent classifier.")
    else:
        occurences = collections.Counter(labels)
        if len(occurences) < 2:
            logger.warn("Can not train an intent classifier. "
                        "Need at least 2 different classes. "
                        "Skipping training of intent classifier.")
        else:
            lack_examples = [label for label, count in occurences.items() if count <= 2]
            if lack_examples:
                warnings.warn("low amounts of training data, can produce un-expected results")
            y = self.transform_labels_str2num(labels)
            X = np.stack([example.get("text_features")
                          for example in training_data.intent_examples])

            self.clf = self._create_classifier(num_threads, y)
            with warnings.catch_warnings():
                warnings.filterwarnings("ignore")
                self.clf.fit(X, y)
With the following dataset, which has only two examples for the laughing intent, it gave me:
(rasaenv) mike@mike-thinks:~/Programming/Rasa/myflaskapp$ python nlu_model.py
/home/mike/Programming/Rasa/github/rasa_nlu/rasa_nlu/classifiers/sklearn_intent_classifier.py:134: UserWarning: low amounts of training data, can produce un-expected results
warnings.warn("low amounts of training data, can produce un-expected results")
It needs to import collections and warnings.
If it's okay, I'll read up on how to make a proper push and then I'll push. Feel free to make recommendations; this is the first time I've pushed to a project other than my own!
Mhm, that won't really work, as your check doesn't cover all the cases. During cross-validation the training data is split into multiple parts, and sklearn will emit this warning whenever one of the parts doesn't have sufficient training samples. So even if you have 3 examples, you might get the warning if you get unlucky and all of these examples end up in one slice of the data and none in the others.
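The unlucky case described above can be sketched in a few lines. This is a simplification under stated assumptions: it uses plain contiguous folds rather than whatever splitting strategy sklearn actually applies, and `k_fold_slices` and `folds_missing_label` are hypothetical helpers written for this illustration.

```python
from collections import Counter


def k_fold_slices(n_samples, k):
    """Yield (start, stop) index ranges for k contiguous, near-equal folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for i in range(k):
        stop = start + base + (1 if i < extra else 0)
        yield start, stop
        start = stop


def folds_missing_label(labels, label, k):
    """Return indices of held-out folds that contain no example of `label`."""
    missing = []
    for i, (start, stop) in enumerate(k_fold_slices(len(labels), k)):
        held_out = Counter(labels[start:stop])
        if held_out[label] == 0:
            missing.append(i)
    return missing


# Three "laughing" examples that happen to sit next to each other.
labels = ["greet"] * 6 + ["laughing"] * 3 + ["goodbye"] * 6
missing = folds_missing_label(labels, "laughing", k=5)
```

With 15 samples and 5 folds, all three "laughing" examples land in the same fold, so the other four held-out parts have none of them, even though a "count <= 2" check on the full dataset would pass.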
To ignore just UndefinedMetricWarning, we can use the snippet below. It will not suppress other warnings.
from sklearn.exceptions import UndefinedMetricWarning
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=UndefinedMetricWarning)
    self.clf.fit(X, y)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed due to inactivity. Please create a new issue if you need more help.
The warning confuses a lot of users. We should catch it and replace it with a more useful message.
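One possible way to "catch and replace" is to record warnings during the call, drop the confusing category, and emit a single clearer message instead. The following is only a sketch using the standard library: `StandInMetricWarning`, `replace_warning`, `noisy_fit` and the replacement message text are all hypothetical and not part of rasa_nlu or sklearn.

```python
import warnings


class StandInMetricWarning(UserWarning):
    """Hypothetical placeholder for the library warning we want to replace."""


def replace_warning(func, category, message):
    """Call func(); swap any warning of `category` for one clearer message."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = func()
    replaced = False
    for w in caught:
        if issubclass(w.category, category):
            replaced = True
        else:
            # Re-emit unrelated warnings so they are not lost.
            warnings.warn_explicit(w.message, w.category, w.filename, w.lineno)
    if replaced:
        warnings.warn(message, UserWarning)
    return result


def noisy_fit():
    # Stand-in for a fit() call that triggers the confusing warning.
    warnings.warn("F-score is ill-defined", StandInMetricWarning)
    warnings.warn("unrelated", RuntimeWarning)
    return "fitted"


with warnings.catch_warnings(record=True) as seen:
    warnings.simplefilter("always")
    outcome = replace_warning(
        noisy_fit,
        StandInMetricWarning,
        "Some intents have too few training examples for cross-validation.",
    )
seen_categories = [w.category for w in seen]
```

The design choice here is to record first and decide afterwards, which keeps unrelated warnings visible while collapsing repeated occurrences of the confusing one into a single message.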