Maybe there is a better name for "common pitfalls", but the idea is to explain that:
Many metrics require ground-truth labels for the dataset, e.g. "adversarial indices" (marking training samples that originate from a different dataset) or "noisy label indices". Users often don't have access to these, so we recommend either using one of our prepared benchmark "suites" or creating your own benchmark for explainer comparison (see the sketch below). Benchmarks allow you to create a controlled setting in which the ground truth is known by construction.
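As a rough illustration of that controlled setting, the snippet below (plain PyTorch, with a toy dataset standing in for real training data; it does not depend on any particular library API) corrupts a fixed fraction of labels and records the affected indices, so the "noisy label indices" are known by construction and can be handed to a metric later.

```python
# Minimal sketch: building your own "noisy label" benchmark so the
# ground-truth indices are known because we introduced the noise ourselves.

import torch
from torch.utils.data import TensorDataset

torch.manual_seed(0)

# Toy data standing in for a real training set.
inputs = torch.randn(100, 3, 32, 32)
labels = torch.randint(0, 10, (100,))

# Corrupt a fixed fraction of labels and remember exactly which indices changed.
noise_fraction = 0.1
noisy_indices = torch.randperm(len(labels))[: int(noise_fraction * len(labels))]
noisy_labels = labels.clone()
# Shift the selected labels by a random non-zero offset so each one is guaranteed to change.
noisy_labels[noisy_indices] = (
    labels[noisy_indices] + torch.randint(1, 10, (len(noisy_indices),))
) % 10

train_dataset = TensorDataset(inputs, noisy_labels)

# `noisy_indices` is the ground truth a mislabeling-detection metric would need.
print(sorted(noisy_indices.tolist()))
```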
The "cache_id" and "model_id" arguments passed to different class instances are used to saved intermediary results. It is important to make sure that when working with multiple models and datasets, each of the experiments get a unique combination of these arguments. Otherwise, the user is running the risk of wrong intermediary results being reused.
Some explainers (TracInCPRandomProj, for example) can cause out-of-memory errors when applied to datasets and/or models of significant size. In such cases we recommend reducing memory usage, e.g. by lowering the batch size or the projection dimension, or by evaluating on a subset of the data; a retry pattern is sketched below.
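One possible pattern, assuming the explainer call can be wrapped in a function that accepts a batch size (the `compute_attributions` callable below is a hypothetical stand-in, not part of the library API): retry with progressively smaller batch sizes when a CUDA out-of-memory error is raised.

```python
# Sketch: retry an attribution computation with smaller batch sizes on OOM.

import torch


def attribute_with_fallback(compute_attributions, batch_sizes=(256, 64, 16, 4)):
    """Try progressively smaller batch sizes until the computation fits in memory."""
    for batch_size in batch_sizes:
        try:
            return compute_attributions(batch_size=batch_size)
        except RuntimeError as err:  # torch.cuda.OutOfMemoryError subclasses RuntimeError
            if "out of memory" not in str(err).lower():
                raise  # unrelated error: re-raise instead of retrying
            if torch.cuda.is_available():
                torch.cuda.empty_cache()  # release cached GPU memory before retrying
    raise RuntimeError("Out of memory even at the smallest batch size in `batch_sizes`.")
```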