WIP - Llama code interpreter and improved LLM as a judge

This is still a work in progress, but I want to push it now so that others can offer suggestions and reuse the code while we iron out the details.

The current implementation does not yet have a proper integration for LLM-as-a-judge. That requires a larger refactoring, which I will try to do next week.

It also breaks backward compatibility with openmathinstruct-1. That is probably fine: people can always switch back to the v0.1.1 tag, and we can't support that code forever as we move to the new format with Llama models.

Kipok / NeMo-Skills #89