RasaHQ / rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
https://rasa.com/docs/rasa/
Apache License 2.0

Catch UndefinedMetric Warning from sklearn #289

Closed tmbo closed 4 years ago

tmbo commented 7 years ago

The warning

/sklearn/metrics/classification.py:1113: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples. 'precision', 'predicted', average, warn_for) 

confuses a lot of users. We should catch it and replace it with a more useful message.
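One way to "catch and replace" a warning is to record everything emitted during a call and then translate the entries of one category. The sketch below is only an illustration: `MetricWarning`, `evaluate`, and `run_with_friendly_warnings` are hypothetical names, with `MetricWarning` standing in for sklearn's `UndefinedMetricWarning` so the example runs without sklearn installed.

```python
import warnings


class MetricWarning(UserWarning):
    """Hypothetical stand-in for sklearn's UndefinedMetricWarning."""


def evaluate():
    # Placeholder for the evaluation call that triggers the warning.
    warnings.warn("F-score is ill-defined and being set to 0.0", MetricWarning)
    return 0.0


def run_with_friendly_warnings(func, category, friendly_message):
    """Run func and swap warnings of `category` for one friendlier message."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, no deduplication
        result = func()

    messages = []
    for item in caught:
        if issubclass(item.category, category):
            # Report the friendly text once, however often the warning fired.
            if friendly_message not in messages:
                messages.append(friendly_message)
        else:
            # Pass every other warning through unchanged.
            messages.append(str(item.message))
    return result, messages


score, messages = run_with_friendly_warnings(
    evaluate,
    MetricWarning,
    "Some intents have too few examples to be scored in every fold.",
)
```

The returned messages could then be handed to a logger. Whether this fits into the actual training code path is a separate question.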

amn41 commented 7 years ago

Yes. Don't we already print warnings to this effect when we call validate() on the TrainingData? In that case it wouldn't even be too bad to just swallow this warning.

tmbo commented 7 years ago

Yes we do, but only if there is only one training sample for an intent. If you have 2 training samples, our warning will not be printed, but you might still get the sklearn warning. This can happen because the data gets split into multiple CV folds.

We could silence it...

tmbo commented 7 years ago

I had a look and it is not too easy to replace that warning. Nevertheless, we have already improved the error messages when validating training data, and there is now another check at the classifier level. This warning might still occur though.

KaiKlasen commented 6 years ago

Hi guys,

the warning still occurs and does confuse people. Could you have another look into this issue by re-opening the issue?

tmbo commented 6 years ago

there isn't anything we can do about this. if you know a way to suppress the warning (and only this one) I'd be happy to merge a fix.

antoinecomp commented 6 years ago

I'm working on it. According to the Python 3.5 documentation on Temporarily Suppressing Warnings, one can suppress a warning using the catch_warnings context manager:

import warnings

def fxn():
    warnings.warn("deprecated", DeprecationWarning)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    fxn()

tmbo commented 6 years ago

I am happy to reopen. Make sure though not to ignore all warnings but only this one.
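A minimal sketch of ignoring one warning category while letting the rest through. `MetricWarning` and `noisy_fit` are hypothetical stand-ins (for sklearn's `UndefinedMetricWarning` and for the call that triggers it) so the example runs without sklearn. `record=True` is used here only to make the effect observable.

```python
import warnings


class MetricWarning(UserWarning):
    """Hypothetical stand-in for sklearn's UndefinedMetricWarning."""


def noisy_fit():
    # Placeholder for a call that emits the unwanted warning plus another one.
    warnings.warn("F-score is ill-defined", MetricWarning)
    warnings.warn("something else worth seeing", RuntimeWarning)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # baseline: record every warning
    # Added last, so it is checked first and wins for this category only.
    warnings.filterwarnings("ignore", category=MetricWarning)
    noisy_fit()

# Only the RuntimeWarning should have been recorded.
seen = [w.category.__name__ for w in caught]
```

The filters are restored when the `with` block exits, so the suppression does not leak into the rest of the program.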

wrathagom commented 6 years ago

@antoinecomp take a look at my SO post and see if that gets you what you need to do a PR.

https://stackoverflow.com/a/51738593/1408476

antoinecomp commented 6 years ago

Okay, if that suits everyone, I would replace a piece of code in train() of the Trainer class in the sklearn_intent_classifier.py file with:

    def train(self, training_data, cfg, **kwargs):
        # type: (TrainingData, RasaNLUModelConfig, **Any) -> None
        """Train the intent classifier on a data set."""

        num_threads = kwargs.get("num_threads", 1)

        labels = [e.get("intent")
                  for e in training_data.intent_examples]

        # Keys are the distinct intents, values are their example counts.
        occurrences = collections.Counter(labels)

        if len(occurrences) < 2:
            logger.warning("Can not train an intent classifier. "
                           "Need at least 2 different classes. "
                           "Skipping training of intent classifier.")
        else:
            # Intents with two or fewer training examples.
            lack_examples = [label for label, count in occurrences.items()
                             if count <= 2]
            if lack_examples:
                warnings.warn("low amounts of training data, can produce un-expected results")
            y = self.transform_labels_str2num(labels)
            X = np.stack([example.get("text_features")
                          for example in training_data.intent_examples])

            self.clf = self._create_classifier(num_threads, y)
            with warnings.catch_warnings():
                # Note: this ignores every warning raised during fit().
                warnings.filterwarnings("ignore")
                self.clf.fit(X, y)

With a dataset containing only two examples for the laughing intent, it gave me:

(rasaenv) mike@mike-thinks:~/Programming/Rasa/myflaskapp$ python nlu_model.py 
/home/mike/Programming/Rasa/github/rasa_nlu/rasa_nlu/classifiers/sklearn_intent_classifier.py:134: UserWarning: low amounts of training data, can produce un-expected results
  warnings.warn("low amounts of training data, can produce un-expected results")

It needs to import collections and warnings. If it's okay, I'll read up on how to make a nice push and then I'll push. Feel free to make recommendations, this is the first time I'm pushing to a project other than my own!

tmbo commented 6 years ago

Mhm, that won't really work, as your check doesn't cover all the cases. During cross-validation the training data is split into multiple parts, and sklearn will emit this warning whenever one of the parts doesn't have sufficient training samples. So even if you have 3 examples, you might get the warning if you got unlucky and all these examples ended up in one slice of the data and none in the others.
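To make the "unlucky split" concrete, here is a simplified sketch using plain consecutive slices. It is not sklearn's actual splitter, which may shuffle or stratify. `contiguous_folds` and the example labels are made up for illustration.

```python
from collections import Counter


def contiguous_folds(labels, n_folds):
    """Split labels into n_folds consecutive slices and count labels in each.

    Simplified illustration: no shuffling and no stratification.
    """
    size, extra = divmod(len(labels), n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        end = start + size + (1 if i < extra else 0)
        folds.append(Counter(labels[start:end]))
        start = end
    return folds


# Three examples per intent, but all "laugh" examples sit next to each other.
labels = ["greet"] * 3 + ["bye"] * 3 + ["laugh"] * 3
folds = contiguous_folds(labels, n_folds=3)

# Indices of the folds that contain at least one "laugh" example.
folds_with_laugh = [i for i, fold in enumerate(folds) if fold["laugh"] > 0]
```

Here every "laugh" example lands in the last slice. When that slice is held out for evaluation, the classifier has been trained without any "laugh" example and cannot predict it, which is the "labels with no predicted samples" situation the warning describes.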

vishwa94sai commented 5 years ago

For ignoring just UndefinedMetricWarning, we can use the below snippet. It will not suppress other warnings.

import warnings
from sklearn.exceptions import UndefinedMetricWarning

with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=UndefinedMetricWarning)

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 4 years ago

This issue has been automatically closed due to inactivity. Please create a new issue if you need more help.