While reviewing the default checklist for gen AI use cases, a couple of quick wording tweaks stood out:
[ ] E.1 - This doesn't use the same binary (yes/no) checklist language as the rest. We should update it to something like "Do we have a clear plan to..."
E.1 Monitoring and evaluation: How are we planning to monitor the model and its impacts after it is deployed (e.g., performance monitoring, regular audit of sample predictions, human review of high-stakes decisions, reviewing downstream impacts of errors or low-confidence decisions, testing for concept drift)?
[ ] D.5 - I would suggest renaming this to something like "Communicate limitations". Bias is a major example, but other limitations, such as hallucination, don't clearly fit elsewhere.
D.5 Communicate bias: Have we communicated the shortcomings, limitations, and biases of the model to relevant stakeholders in ways that can be generally understood?