This sample works best on a system with a GPU, but it can be used on a system without one if necessary. As we work on setup instructions for this app, we should include details for getting it to work on systems without a GPU. The following are the steps I've had to take so far to get it working; I'll keep adding to this until I can use all the sample use cases:
Update the Ollama bootstrapping code so it does not use a GPU. In the AppHost project's Program.cs, change this code:
var chatCompletion = builder.AddOllama("chatcompletion").WithDataVolume();
to
var chatCompletion = builder.AddOllama("chatcompletion", enableGpu: false).WithDataVolume();
In the PythonInference project, change requirements.txt to use CPU-only builds of the Torch libraries (no CUDA), as sketched below.
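I haven't captured the exact package list here, so treat the following as a sketch: the general approach is to add PyTorch's CPU wheel index and pin the +cpu variants of whatever Torch packages the file already lists (the versions below are placeholders).

```
# Let pip find the CPU-only wheels (assumed approach; keep the rest of the file unchanged)
--extra-index-url https://download.pytorch.org/whl/cpu

# Pin the +cpu variants instead of the default CUDA-enabled wheels
# (placeholder versions -- match the versions already pinned in requirements.txt)
torch==2.2.1+cpu
torchvision==0.17.1+cpu
```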
When running without a GPU, responses from the models will be slower and the default timeout settings are not enough. I had to update the ServiceDefaults project's Extensions.cs file to increase the StandardResilience timeouts in AddServiceDefaults. The following worked for me, but on some systems a different timeout may be needed.
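The change is to pass options to AddStandardResilienceHandler instead of taking its defaults. A sketch, assuming Extensions.cs follows the standard Aspire ServiceDefaults template (the specific timeout values are just what worked on my machine):

```csharp
// In ServiceDefaults/Extensions.cs, inside AddServiceDefaults.
builder.Services.ConfigureHttpClientDefaults(http =>
{
    // Was: http.AddStandardResilienceHandler();
    http.AddStandardResilienceHandler(options =>
    {
        // Give slow CPU-only model responses time to complete (example values; tune as needed).
        options.AttemptTimeout.Timeout = TimeSpan.FromMinutes(3);
        options.TotalRequestTimeout.Timeout = TimeSpan.FromMinutes(10);

        // The circuit breaker's sampling duration must be at least twice the
        // attempt timeout, or the options validation rejects the configuration.
        options.CircuitBreaker.SamplingDuration = TimeSpan.FromMinutes(10);
    });

    http.AddServiceDiscovery();
});
```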