terraformmachine closed this issue 1 year ago
@terraformmachine thanks for the colab with the reproduction of the issue!
One issue I noticed is that when you load your dataset, the 'label' field for each example is an integer class ID, but LIT needs it to be a string from your label vocab. In the future, we plan to add dataset and model field validation so that LIT catches and reports these kinds of issues on first launch, rather than failing in unexpected ways later.
If you change your setting of 'label' in the dataset to str(row['label']), then LIT will correctly see the ground truth labels. The metrics and classification results modules will then display correctly, instead of showing the problems they had before.
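A minimal sketch of that label conversion, assuming the dataset is built from rows whose 'label' is an integer class ID. The rows, the 'text' field name, the six-class vocab, and the to_lit_examples helper are all hypothetical and made up for illustration; only the str(row['label']) call comes from the suggestion above.

```python
# Hypothetical rows as they might come out of the raw dataset: integer class IDs.
rows = [
    {"text": "i feel great today", "label": 1},
    {"text": "i am so scared", "label": 4},
]

# Assumed label vocab: the class IDs as strings (six classes is an assumption).
LABELS = [str(i) for i in range(6)]


def to_lit_examples(rows):
    """Convert each row so that 'label' is a string drawn from the vocab."""
    examples = []
    for row in rows:
        label = str(row["label"])  # integer class ID -> string
        if label not in LABELS:
            raise ValueError(f"label {label!r} is not in the vocab")
        examples.append({"text": row["text"], "label": label})
    return examples


examples = to_lit_examples(rows)
```

Checking the label against the vocab at load time is a stand-in for the kind of field validation mentioned above: a mismatch fails immediately instead of surfacing later in the UI.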
But I don't think that actually fixes your TCAV issue. To debug that, I am running the TCAV interpreter directly in the colab instead of through the UI, so I can see which line of code TCAV fails on. Here is the code I use for that:
from lit_nlp.components import tcav
from lit_nlp.lib import caching
from lit_nlp.api.dataset import IndexedDataset
# LIT under the covers wraps the dataset in IndexedDataset and the model in
# CachingModelWrapper, so doing that here in order to use them in the TCAV interpreter.
indexed_datasets = IndexedDataset.index_all(datasets, caching.input_hash)
cached_model = caching.CachingModelWrapper(models["distilbert-base-uncased-emotion"], "distilbert-base-uncased-emotion")
# Run TCAV
ids = [
    "ab7570637ea93bafe16a910cbb09b798",
    "f09151abc3763045e779602009fc1428",
    "a7796da2aca7a74702762877b9506e0b",
    "b76b3adaa5b65f2cf0072cdd15b38103",
    "e1c57d1a02d7f2e40225758bce7cfa9f",
    "be9b032d394de752b85481499168f8aa",
    "ded6753ba0e8021ba490a36b082e9971",
    "125065a3ebb6fa7cba26fbe048c53532",
    "a133362931f8837b9164da8cd5dde98b",
    "7e19494ccea7be91e1aa818192bbf763",
    "b5152500f9b9fe595e7c183605c21132",
]
config = {
    'class_to_explain': "0",
    'concept_set_ids': ids,
    'dataset_name': "emotion",
    'grad_layer': "cls_grad",
}
t = tcav.TCAV()
t.run_with_metadata(indexed_datasets["emotion"].indexed_examples, cached_model, indexed_datasets["emotion"], config=config)
I was able to get TCAV working for your model once I also updated the grad_class output from your predict method to return the class as a string instead of the integer index. I also uncommented the optional grad_class input in the model's input spec.
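To illustrate the grad_class change, here is a minimal sketch of setting grad_class to the class string rather than the integer index. The add_grad_class helper, the 'probas' key, and the six-class vocab are assumptions for illustration, not the code from the colab.

```python
# Assumed label vocab: class IDs as strings (six classes is an assumption).
LABELS = [str(i) for i in range(6)]


def add_grad_class(output, labels=LABELS):
    """Return a copy of one prediction with grad_class set to a class string."""
    probas = output["probas"]
    # Index of the highest-probability class.
    best = max(range(len(probas)), key=lambda i: probas[i])
    result = dict(output)
    result["grad_class"] = labels[best]  # a string such as "0", not the integer 0
    return result


out = add_grad_class({"probas": [0.7, 0.1, 0.05, 0.05, 0.05, 0.05]})
```

The point is only that the value comes from the same string vocab as the dataset labels, so it matches a 'class_to_explain' value such as "0".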
Please let me know if this works for you.
@jameswex that fixed it! thank you very much for your help :smiley:
I'm getting an unknown error when running TCAV on a multiclass dataset.
Steps to reproduce:
Unknown Error :scream:
Dataset class:
Model class: