The task is there, but it fails to run because rouge is missing. I believe the metric is in Unitxt, just need to fix the pointer in the catalog.
Then we need to add an example that read ClapNQ from HF, samples k examples from it, creates temporary datasets of benchmark and documents, and runs RAG. There is the example that @yoavkatz shared in Slack - we can build on top of that. Or use ChromaDB as an in-memory vector-store.
I created a branch: and PR in draft state - https://github.com/IBM/unitxt/pull/953