We don't yet have enough metrics to fill the original categories (faithfulness, localization, etc.), so for now I introduce the following simple and, in my opinion, intuitive separation, which is also reflected in the text of the paper:
heuristics
downstream task evaluators
ground truth
PLUS:
add tests for sample.py dataset and random.py explainer
remove unused files (they seemed unnecessary; can be reverted later if needed)