Added flag to re-generate harmfulness eval results. Fixed model order in plots.

This PR introduces the following:

Added a --use_existing_evaluation flag to eval_harmfulness/evaluate_outputs.py that re-generates the plots and re-calculates the flagged/all ratio from existing evaluation results. See the updated README for how to use it.
Fixed the order in which models are presented on the harmfulness eval plots. (Sorry @Adamliu1, I know it was your task, but it was a one-liner, so I added it.)
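For reviewers, here is a minimal sketch of how a flag like --use_existing_evaluation could work: reload saved results and recompute the flagged/all ratio without re-running the evaluator. It assumes the saved results are a JSON list of records, each with a boolean "flagged" field. The results path, the record schema, and the helper names (flagged_ratio, parse_args, main) are hypothetical and are not taken from evaluate_outputs.py.

```python
import argparse
import json
from pathlib import Path


def flagged_ratio(records):
    """Return flagged/all for a list of evaluation records.

    Hypothetical schema: each record is a dict with a boolean "flagged" field.
    An empty list yields 0.0 rather than dividing by zero.
    """
    if not records:
        return 0.0
    flagged = sum(1 for record in records if record.get("flagged"))
    return flagged / len(records)


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Harmfulness evaluation (sketch).")
    # store_true: the flag defaults to False and becomes True when passed.
    parser.add_argument(
        "--use_existing_evaluation",
        action="store_true",
        help="Reuse saved evaluation results to redo plots and the flagged/all ratio.",
    )
    # Hypothetical location of previously saved evaluation results.
    parser.add_argument("--results_file", type=Path, default=Path("evaluation.json"))
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    if not args.use_existing_evaluation:
        # Running the actual evaluator is outside the scope of this sketch.
        raise NotImplementedError("only the --use_existing_evaluation path is sketched")
    records = json.loads(args.results_file.read_text())
    ratio = flagged_ratio(records)
    print(f"flagged/all ratio: {ratio:.3f}")
    return ratio


if __name__ == "__main__":
    main()
```

In this sketch the expensive evaluation step is skipped entirely when the flag is set, so plots and ratios can be recomputed cheaply after a plotting fix such as the model-ordering change.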

Adamliu1 / SNLP_GCW