JetBrains-Research / kinference

Running ONNX models in vanilla Kotlin

Mixed backend? #157

Closed CaelumF closed 10 months ago

CaelumF commented 1 year ago

Hi,

I'm trying to run a CLIP model (https://models.attentionai.uk/69977135045d97639861ef8c1af7b751d86e0f20.onnx), which uses convolution.

It works great in the KI backend (I had to implement the new Softmax operator version locally; I think I'll make a PR soonish).

My project is multiplatform and includes a web target, and since TFJS is suggested for web targets for performance reasons, I am trying to use that backend.

However, the TFJS backend doesn't currently support Conv.

I am wondering whether it would be coherent and sensible to have a mixed backend that uses TFJS wherever possible and falls back to the KI backend for operations that aren't supported yet, while still running everything on the fast TFJS backend as operators gain support there. I can see that the KI backend lives in a commonMain module, so it doesn't seem straightforward to try this directly, but I'm curious about your thoughts.

cupertank commented 1 year ago

Hi! Thank you for the feedback. I think a fallback from TFJS to the KI backend is impossible right now. These backends have different internal data structures, and copying data from GPU to CPU is also very expensive (in this case we would actually need to copy twice: from GPU to CPU to run the KI operator, then back to GPU for the next TFJS one).

But I think we can implement the necessary operators in the TFJS backend for this model :) Could you provide an example of how to run this model in Python with ONNXRuntime?

CaelumF commented 1 year ago

Hey, that makes sense. I didn't realise that TFJS keeps the data on the GPU; that's pretty nice, actually.

Here's how you can run this model in Python with ONNXRuntime:

  1. https://clip-as-service.jina.ai/user-guides/server/#start-a-onnx-backed-server
  2. https://clip-as-service.jina.ai/user-guides/server/#use-custom-model-for-onnx
  3. replace the visual.onnx file with the one linked above (I am 90% sure any of the models from https://github.com/jina-ai/clip-as-service/blob/c7e84a49a585edfae5fa26b91d302c1ed793f725/server/clip_server/model/clip_onnx.py would work the same)

Then, I believe, with a config like the following:


onnx-flow.yml:
jtype: Flow
version: '1'
with:
  port: 51000
executors:
  - name: clip_o
    uses:
      jtype: CLIPEncoder
      metas:
        py_modules:
          - clip_server.executors.clip_onnx

you should be able to see a basic web interface on port 51000 and upload an image to get back an embedding.
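
If you just want to sanity-check the model without the server setup, here is a minimal sketch of running it directly with onnxruntime. The file names, the 224x224 input size, and the normalization constants are assumptions on my part (the standard values for CLIP image encoders); the real input name and shape can be read from session.get_inputs().

```python
import numpy as np
import onnxruntime as ort
from PIL import Image

# Load the exported CLIP visual tower (file name is a placeholder).
session = ort.InferenceSession("visual.onnx")
input_meta = session.get_inputs()[0]
print(input_meta.name, input_meta.shape)  # inspect the real input name/shape

# Assumed standard CLIP preprocessing: resize to 224x224, scale to [0, 1],
# then normalize with CLIP's usual per-channel mean/std.
image = Image.open("cat.jpg").convert("RGB").resize((224, 224))
pixels = np.asarray(image, dtype=np.float32) / 255.0
mean = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
std = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)
pixels = (pixels - mean) / std
batch = pixels.transpose(2, 0, 1)[np.newaxis, ...]  # HWC -> NCHW + batch dim

embedding = session.run(None, {input_meta.name: batch})[0]
print(embedding.shape)  # e.g. (1, 512) for a ViT-B/32 visual tower
```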

dmitriyb commented 10 months ago

@CaelumF hi! We recently added support for the Conv operator in the TFJS backend. It is already in the master branch.

CaelumF commented 10 months ago

Awesome, thanks for the update @dmitriyb! I will close this, since the issue that prompted it is solved and the mixed-backend idea itself seems infeasible / too much performance overhead to be worthwhile.