autowarefoundation / modelzoo

A collection of machine-learned models for use in autonomous driving applications.
https://www.autoware.org/
Apache License 2.0

CenterPoint Backbone preprocessing optimization #83

Open angry-crab opened 1 year ago

angry-crab commented 1 year ago

The current implementation of scatter has some limitations.

  1. The GPU implementation hard-codes the iterator (thread) bindings, which might not work on certain devices. For example, with the OpenCL backend, a GPU may support only a one-dimensional global work size, while the code below binds two block dimensions:

        for j in T.thread_binding(0, 560, thread = "blockIdx.x"):
            for k in T.thread_binding(0, 560, thread = "blockIdx.y"):
                for i in T.thread_binding(0, 32, thread = "threadIdx.x"):
  2. Because the bindings are hard-coded, there is no room for optimization. Normally, we would create a schedule from the IRModule and define optimization strategies on it.

  3. We need to create an optimization schedule and measure its performance.
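
As an illustration of the one-dimensional work size concern in item 1, here is a minimal sketch in plain Python (not TVM code) of how the three hard-coded loops could be fused into a single linear index, with the original iterators recovered by division and modulo. The extents 560, 560 and 32 come from the bindings quoted above; the helper names `flatten` and `unflatten` are hypothetical and exist only for this example.

```python
# Extents taken from the hard-coded thread bindings quoted above.
J, K, I = 560, 560, 32  # blockIdx.x, blockIdx.y, threadIdx.x

def flatten(j, k, i):
    """Map a 3-D iteration point (j, k, i) to one linear work-item index."""
    return (j * K + k) * I + i

def unflatten(idx):
    """Recover (j, k, i) from a single linear work-item index."""
    i = idx % I
    k = (idx // I) % K
    j = idx // (I * K)
    return j, k, i

# Total number of work items a one-dimensional launch would need to cover.
total = J * K * I
```

A device that exposes only one global dimension could then iterate over `range(total)` and call `unflatten` per work item. Whether this is the right fix for the actual kernel depends on the schedule, which this sketch does not model.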

angry-crab commented 1 year ago

Item 1 can be solved by implementing scatter at a higher level, i.e. in TE or Relay.
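
If scatter is re-implemented at a higher level, a small reference version could help check the result against the hard-coded kernel. The following is only a sketch in plain Python under stated assumptions: the output layout `(channels, height, width)`, the `(y, x)` coordinate format, and the name `scatter_reference` are illustrative and are not taken from the modelzoo code.

```python
def scatter_reference(features, coords, height, width):
    """Scatter per-pillar feature vectors onto a dense, zero-filled canvas.

    features: list of per-pillar channel lists (assumed layout)
    coords:   list of (y, x) grid positions, one per pillar (assumed layout)
    Returns a nested list indexed as canvas[channel][y][x].
    """
    channels = len(features[0]) if features else 0
    canvas = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for feat, (y, x) in zip(features, coords):
        for c, value in enumerate(feat):
            canvas[c][y][x] = value
    return canvas

# Tiny usage example: two pillars with two channels on a 3 x 2 grid.
demo = scatter_reference([[1.0, 2.0], [3.0, 4.0]], [(0, 1), (2, 0)], 3, 2)
```

For the sizes in the quoted bindings, this would correspond to 32 channels on a 560 x 560 grid, assuming those extents map to channels, height and width.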