It is ~6x cheaper to use GPT-4 Turbo (the newer model with a 128k context window) than gpt-4-32k, and it is also more performant.
Lowered the model temperature as well, since a high temperature does not make sense here: we do not need extra entropy or "creativity" in the responses for RAG apps.
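A minimal sketch of the change (the parameter names follow the OpenAI chat-completions API; the exact values and call site are assumptions, the real diff is in this PR):

```python
# Hypothetical request kwargs for the app's chat-completions call.
# Old call (assumed): the 32k-context model with the default sampling temperature.
old_kwargs = {"model": "gpt-4-32k", "temperature": 1.0}

# New call: GPT-4 Turbo (128k context, cheaper per token) with temperature 0,
# since RAG answers should stick to the retrieved context deterministically.
new_kwargs = {"model": "gpt-4-turbo", "temperature": 0.0}

# Used roughly as: client.chat.completions.create(messages=..., **new_kwargs)
```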
Description
closes #314