golemfactory/gamerhash-facade

Investigate problems with VLLM running on Windows #151

Closed nieznanysprawiciel closed 4 months ago

nieznanysprawiciel commented 4 months ago

Investigate potential problems and list possible approaches.

nieznanysprawiciel commented 4 months ago

Problems:

  1. The problem seems to boil down to the lack of a Windows build of the triton dependency; I don't see any indication of a fundamental blocker. The dev team closed the relevant PRs, claiming they have no capacity to maintain Windows builds (source). The code appears to have worked for the people engaged in that PR, at least a few months ago.
  2. There are additional checks in the code that need to be disabled (at least in vllm, but possibly in its dependencies as well). Example here, but there may be more; a sketch of the kind of guard meant follows this list.
  3. PyTorch is installed in its CPU-only version by default and must be manually replaced with the CUDA version (source); see the verification snippet below.
  4. It may turn out in the process that other dependencies need our attention as well. Other suspicious dependencies (claimed here):
  5. We can't be sure that we won't encounter runtime problems after creating custom builds.
  6. There seems to be a small performance penalty on WSL (source).
  7. Custom builds of triton can incur a performance penalty (source).
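
To illustrate point 2: the snippet below is a hypothetical sketch of the kind of platform guard that would have to be patched out or special-cased, not code taken from the actual vllm sources (the function name and message are invented).

```python
import sys

def check_platform_supported() -> None:
    # Hypothetical guard of the kind point 2 refers to: an explicit
    # refusal to run on Windows that a custom build would need to relax.
    if sys.platform == "win32":
        raise RuntimeError(
            "This package does not support Windows. Use Linux or WSL instead."
        )
```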

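For point 3, the installed build can be verified with PyTorch's public API; CPU-only wheels report a version suffix like +cpu, CUDA wheels one like +cu121:

```python
import torch

# CPU-only wheels report versions such as "2.3.1+cpu";
# CUDA wheels carry a suffix such as "+cu121".
print(torch.__version__)

# False on a CPU-only build, or when no NVIDIA GPU/driver is visible.
print(torch.cuda.is_available())
```
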
Options:

  1. Try one of the unofficial community builds of triton for Windows. There are a few options available:
  2. Maintain a fork of triton and build the packages ourselves. There were some successful attempts to do this:
  3. Delegate preparing vllm for Windows to an external company.
  4. Distribute vllm as an optional GamerHash package that requires WSL (the user will be asked and warned that the installation requires elevated privileges).
  5. Remove torch.compile from the vllm code. torch.compile appears to be a code optimization that makes models run faster on CUDA. It may be possible to omit this step and accept the resulting performance penalty; see the sketch after this list. (Note: this is not the same as running on CPU, just running unoptimized CUDA code.) @pwalski managed to run whisper (which depends on PyTorch) without any problems with triton or PyTorch. That could mean that triton is not really necessary for PyTorch.
  6. Give up integrating vllm :(
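
Regarding option 5, a minimal sketch (assuming vllm only reaches torch.compile through the public torch API, which would need to be verified) of neutralizing it without touching the vllm sources: monkeypatch it to an identity wrapper before vllm is imported, so models fall back to eager, uncompiled CUDA execution.

```python
import torch

def _identity_compile(model=None, **kwargs):
    # torch.compile is used both as a direct call and as a decorator
    # factory (torch.compile(model) vs @torch.compile(...)); cover both.
    if model is None:
        return lambda m: m
    return model

# Must run before any vllm import that triggers compilation.
torch.compile = _identity_compile
```

Recent PyTorch also exposes a disable=True keyword on torch.compile and, reportedly, a TORCH_COMPILE_DISABLE environment variable; the latter would be even less invasive if the installed version honors it.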