PAIR-code / lit

The Learning Interpretability Tool: Interactively analyze ML models to understand their behavior in an extensible and framework agnostic interface.
https://pair-code.github.io/lit
Apache License 2.0
3.46k stars, 352 forks

error with custom model #43

Open dsvrsec opened 3 years ago

dsvrsec commented 3 years ago

Hi, I am trying to implement LIT on a sentiment model based on the IMDb dataset in a classification.py file. I am not getting any predictions when I run it:

```python
import os

from absl import app
from absl import flags
from absl import logging
from lit_nlp.api import dataset as lit_dataset
from lit_nlp.api import types as lit_types
import pandas as pd
from lit_nlp import dev_server
from lit_nlp import server_flags
from lit_nlp.components import word_replacer
from lit_nlp.examples.datasets import classification
from lit_nlp.examples.datasets import glue
from lit_nlp.examples.datasets import lm
from lit_nlp.examples.models import pretrained_lms
from typing import Dict, List, Tuple
from keras.datasets import imdb
from lit_nlp.api import model as lit_model
from lit_nlp.api import types as lit_types
from lit_nlp.lib import utils
from keras.preprocessing import sequence
import numpy as np
import tensorflow as tf
from keras.models import load_model

import numpy as np

FLAGS = flags.FLAGS
import string
punct = list(string.punctuation)
from nltk.tokenize import word_tokenize

flags.DEFINE_integer("top_k", 10, "Rank to which the output distribution is pruned.")

flags.DEFINE_integer(
    "max_examples", None,
    "Maximum number of examples to load from each evaluation set. Set to None to load the full set.")

flags.DEFINE_bool(
    "load_bwb", False,
    "If true, will load examples from the Billion Word Benchmark dataset. This may download a lot of data the first time you run it, so disable by default for the quick-start example.")

FLAGS.set_default("default_layout", "lm")


class IMDbModel(lit_model.Model):
    """Wrapper for a Natural Language Inference model."""

    LABELS = ["0", "1"]

    def __init__(self, model_path, **kw):
        # Load the model into memory so we're ready for interactive use.
        self._model = load_model(model_path, **kw)

    # LIT API implementations
    def preprocess_1(self, text):
        v = []
        if type(text) == str:
            words = word_tokenize(text)
            for w in words:
                if w not in punct:
                    w1 = w.split('.')
                    for w in w1:
                        w = w.replace('-', '')
                        w = w.replace('_', '')
                        w = w.replace('.', '')
                        w = w.replace(',', '')
                        w = ''.join([i for i in w if not i.isdigit()])
                        if w != '.':
                            v.append(w.lower())
        return ' '.join(v)

    def predict_minibatch(self, inputs):
        """Predict on a single minibatch of examples."""
        word_to_id = imdb.get_word_index()
        word_to_id = {k: (v + 3) for k, v in word_to_id.items()}
        word_to_id[""] = 0
        word_to_id[""] = 1
        word_to_id[""] = 2

        tmp = []

        dict = {}
        print(inputs)
        for ex in inputs:
            tmp = []
            for word in self.preprocess_1(ex['text']).split(" "):
                try:
                    tmp.append(word_to_id[word])
                except:
                    pass
            tmp_padded = sequence.pad_sequences([tmp], maxlen=500)
            output = self._model.predict_classes(np.array(tmp_padded[0]))[0][0]
            dict['text'] = output
        # examples = [self._model.convert_dict_input(d) for d in inputs]  # any custom preprocessing
        return dict  # returns a dict for each input

    def input_spec(self):
        """Describe the inputs to the model."""
        return {
            'text': lit_types.TextSegment(),
            'label': lit_types.CategoryLabel(vocab=self.LABELS),
        }

    def output_spec(self):
        """Describe the model outputs."""
        return {
            # The 'parent' keyword tells LIT where to look for gold labels when computing metrics.
            'probas': lit_types.MulticlassPreds(vocab=self.LABELS, parent='label'),
        }


def main(_):
    datasets = {
        "imdb_train": classification.IMDBData("test"),
        # Empty dataset, if you just want to type sentences into the UI.
        "blank": lm.PlaintextSents(""),
    }

    # NLIModel implements the Model API
    models = {
        'model_imdb': IMDbModel('model-path'),
    }

    lit_demo = dev_server.Server(models, datasets, **server_flags.get_flags())
    lit_demo.serve()


if __name__ == "__main__":
    app.run(main)
```

jameswex commented 3 years ago

Thanks for reaching out.

@kumarvrsec I can attempt to repro your issue if you are able to share your saved model with me.

What exact error are you seeing as well?

dsvrsec commented 3 years ago

Hi, please find the model and the error details here: https://www.dropbox.com/sh/zrjcf6ip2yfb31k/AABwesiswGUddZAFLI7XpX86a?dl=0. Also, could you suggest how to include an embeddings visualization (the embeddings can be obtained from the first-layer weights of the model)?

jameswex commented 3 years ago

Thanks for the link.

The first issue is that your predict_minibatch function is not returning a list of dictionaries (one per input example). It is only returning a single dictionary with a 'probas' field. Additionally, the 'probas' field should be a list of scores, one for each possible class, not a single int (so something like `[{'probas': [0.4, 0.6]}]` if predict_minibatch gets a list of only one example as input).
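As a minimal sketch of that format, assume the Keras model ends in a single sigmoid unit, so that calling it on a batch returns an array of shape `(n, 1)` holding P(label "1") per example (the `[0][0]` indexing in the original code suggests this, but it is an assumption). The batch of scores can then be turned into one dict per example, with the class order matching `LABELS = ["0", "1"]`. `to_lit_outputs` is a hypothetical helper name, not part of LIT.

```python
import numpy as np


def to_lit_outputs(sigmoid_scores):
    """Hypothetical helper: convert sigmoid scores of shape (n, 1) into LIT outputs.

    Assumes column 0 holds P(label "1"), so each 'probas' list is ordered
    like LABELS = ["0", "1"] in the model's output_spec.
    """
    p_pos = np.asarray(sigmoid_scores, dtype=np.float64).reshape(-1)
    outputs = []
    for p in p_pos:
        # 'probas' must hold one score per class, not a single predicted class id.
        outputs.append({"probas": [1.0 - float(p), float(p)]})
    return outputs


print(to_lit_outputs([[0.6], [0.9]]))  # two examples in, two dicts out
```

The returned list always has one entry per row of the score array, so it stays correct whether LIT passes a large minibatch or a single example.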

dsvrsec commented 3 years ago

Thanks for the input.

I am assuming that the input to the predict_minibatch function will be a list of dictionaries, each holding a sentence and its corresponding label, e.g. `[{'text': '....', 'label': '...'}]`, covering all sentences in the data corpus. So, should the output contain only the output probabilities, or also the text sentence? Could you please give one sample of what the output should look like? Thank you again.

jameswex commented 3 years ago

Correct, the input will be a list of dictionaries of the form `{'text': ..., 'label': ...}`. That list won't contain all examples in the data corpus. LIT splits the data into batches that are fed to the predict_minibatch function, and at other points in processing the list might be only a single datapoint long (for instance, if you create a new example in the tool, it will predict on just that new example).
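Because a minibatch can hold any number of examples, including just one, it helps to encode the whole list into a single 2-D array of shape `(len(inputs), maxlen)` and call the model once per batch. (The original code passes `np.array(tmp_padded[0])`, a 1-D array of length 500, which drops the batch dimension.) Below is a minimal NumPy sketch, assuming a `word_to_id` dict like the one built from `imdb.get_word_index()` and mirroring the Keras `pad_sequences` defaults (pad and truncate at the front, pad value 0). `encode_batch` is a hypothetical name, and it takes already-tokenized word lists rather than calling the original `preprocess_1`.

```python
import numpy as np


def encode_batch(token_lists, word_to_id, maxlen=500):
    """Hypothetical helper: encode tokenized texts into an int array of shape (n, maxlen).

    Unknown words are skipped, as in the original predict_minibatch. Sequences
    are padded and truncated at the front with 0, like Keras pad_sequences'
    defaults padding='pre', truncating='pre'.
    """
    batch = np.zeros((len(token_lists), maxlen), dtype=np.int32)
    for row, tokens in enumerate(token_lists):
        ids = [word_to_id[w] for w in tokens if w in word_to_id]
        ids = ids[-maxlen:]  # keep only the last maxlen ids (truncating='pre')
        if ids:
            batch[row, maxlen - len(ids):] = ids  # left-pad with zeros
    return batch


vocab = {"great": 5, "movie": 7, "bad": 9}
single = encode_batch([["great", "movie"]], vocab, maxlen=4)
print(single.shape)  # (1, 4): still a batch, even with one example
```

The model would then be called once on the whole array (for example with Keras' `model.predict(batch)`), which returns scores rather than class ids, and those scores are what the 'probas' field needs.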

The output should be a list of dicts of the same length, containing just the fields specified in your model's output_spec. So in your case, each dict will have just the 'probas' key, holding a list of class scores for that example.

For example, `[{'probas': [0.4, 0.6]}, {'probas': [0.1, 0.9]}]` would be an appropriately formatted return value when predict_minibatch is passed a list of 2 examples to predict.
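To catch format mistakes before LIT's interpreters hit them, a small check like the following could be run on the return value of predict_minibatch during development. It is a hypothetical helper, not part of LIT. It only verifies the rules stated above (one dict per input, only the output_spec keys) plus one score per label in 'probas'; `spec_keys` would be something like `list(self.output_spec())`.

```python
def check_outputs(inputs, outputs, spec_keys, num_labels, probas_key="probas"):
    """Hypothetical development helper: raise ValueError on malformed outputs.

    Checks that outputs is a list with one dict per input example, that each
    dict has exactly the keys declared in output_spec, and that the
    probas_key entry holds one score per label.
    """
    if not isinstance(outputs, list):
        raise ValueError(f"expected a list of dicts, got {type(outputs).__name__}")
    if len(outputs) != len(inputs):
        raise ValueError(f"expected {len(inputs)} outputs, got {len(outputs)}")
    for i, out in enumerate(outputs):
        if not isinstance(out, dict):
            raise ValueError(f"output {i} is a {type(out).__name__}, not a dict")
        if set(out) != set(spec_keys):
            raise ValueError(
                f"output {i} has keys {sorted(out)}, expected {sorted(spec_keys)}")
        if probas_key in out and len(out[probas_key]) != num_labels:
            raise ValueError(
                f"output {i}: {probas_key!r} has {len(out[probas_key])} scores, "
                f"expected {num_labels}")


batch = [{"text": "great movie", "label": "1"}, {"text": "bad plot", "label": "0"}]
# Passes silently for the format shown above.
check_outputs(batch, [{"probas": [0.4, 0.6]}, {"probas": [0.1, 0.9]}], ["probas"], 2)
```

A single dict such as `{'text': 1}`, which is what the original predict_minibatch returns, fails the first check immediately.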

dsvrsec commented 3 years ago

Thank you, I will try to implement your suggestions. Could you also describe the format for including embeddings (e.g. if word2vec or GloVe embeddings are used) and explanations, if possible? I looked at the examples, but they were all based on pretrained BERT models.
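As a hedged sketch of the approach mentioned earlier in the thread (taking word vectors from the first layer's weights), one simple per-example embedding is the mean of the embedding-matrix rows for that example's token ids. Assumptions: `embedding_matrix` stands for something like `model.layers[0].get_weights()[0]` of a Keras `Embedding` layer, with shape `(vocab_size, dim)`, and padding id 0 is excluded from the average. In the LIT versions we are aware of, such a vector would be returned under an extra output key declared as `lit_types.Embeddings()` in output_spec, so that the embeddings projector can plot it; please check the LIT documentation for your version. `mean_pooled_embeddings` is a hypothetical name.

```python
import numpy as np


def mean_pooled_embeddings(id_batch, embedding_matrix, pad_id=0):
    """Hypothetical helper: average each example's non-padding embedding rows.

    id_batch: int array of shape (n, maxlen).
    embedding_matrix: float array of shape (vocab_size, dim).
    Returns an array of shape (n, dim); rows that are all padding become zeros.
    """
    ids = np.asarray(id_batch)
    vectors = embedding_matrix[ids]                          # (n, maxlen, dim)
    mask = (ids != pad_id).astype(np.float64)[..., None]     # (n, maxlen, 1)
    counts = np.maximum(mask.sum(axis=1), 1.0)               # avoid division by zero
    return (vectors * mask).sum(axis=1) / counts             # (n, dim)


emb = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]])
print(mean_pooled_embeddings(np.array([[0, 1, 2]]), emb))  # mean of rows 1 and 2
```

Mean pooling is only one simple choice; word2vec or GloVe vectors could be used the same way by substituting their matrix for `embedding_matrix`, provided the token ids index into it consistently.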

dsvrsec commented 3 years ago

I have changed the function as you suggested, but now I get the following error and still can't see predictions:

```
/lit-main/lit_nlp/components/lemon_explainer.py", line 90, in
    output_probs = np.array([output[pred_key] for output in model_outputs])
KeyError: 'probas'

I0913 22:00:36. _internal.py:113] 127.0.0.1 - - [13/Sep/2020 22:00:36] "POST /get_interpretations?model=model_imdb&dataset_name=imdb_train&interpreter=counterfactual%20explainer HTTP/1.1" 500 -
```
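The traceback points at a line in lemon_explainer.py that collects the predicted scores from every model output by key. The sketch below mirrors only that one line outside LIT (everything around it is assumed) to show why any output dict lacking a 'probas' key, for example one still keyed 'text' as in the original predict_minibatch, raises exactly this KeyError. `collect_probs` is a hypothetical name.

```python
import numpy as np


def collect_probs(model_outputs, pred_key="probas"):
    # Same lookup as the failing line in the traceback: every output dict
    # must contain pred_key, otherwise this raises KeyError(pred_key).
    return np.array([output[pred_key] for output in model_outputs])


good = [{"probas": [0.4, 0.6]}, {"probas": [0.1, 0.9]}]
print(collect_probs(good).shape)  # (2, 2)

bad = [{"text": 1}]
try:
    collect_probs(bad)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'probas'
```

So the error goes away only once every dict returned by predict_minibatch, for every batch size, carries the 'probas' key declared in output_spec.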

jameswex commented 3 years ago

Can you provide your updated python file with your changes? Thanks.

pratikchhapolika commented 2 years ago

@dsvrsec can you please provide the working repo?