Closed by Kipok 3 weeks ago
Example command
ns eval \
    --cluster=local \
    --server_type=openai \
    --model=gpt-4o \
    --server_address=https://api.openai.com/v1 \
    --benchmarks=answer-judge:0 \
    --output_dir=/workspace/NeMo-Skills/test-judgement
Currently the dataset is mostly empty, but let's keep populating it with the complicated examples we find, and eventually we will have a good benchmark.