DeepLink-org / deeplink.framework

BSD 3-Clause "New" or "Revised" License

DeepLink development: cannot call the operator I wrote #901

Open jianlonghaha opened 1 month ago

jianlonghaha commented 1 month ago

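I ran into a problem while developing with deeplink that I have not been able to solve.

I built the shared library libdiopi_impl.so in /home/deeplink/deeplink.framework/dipu/third_party/DIOPI/impl. The .so implements only the following code: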

    #include <diopi/functions.h>
    #include
    #include
    #include "../musa_pytorch.h"
    // #include "../common/common.hpp"

    namespace impl {
    namespace musa {

    static const char* name = "MUSADevice";

    DIOPI_RT_API const char* diopiGetVendorName() { return name; }

    DIOPI_API diopiError_t diopiAdd(diopiContextHandle_t ctx, diopiTensorHandle_t out,
                                    diopiConstTensorHandle_t input, diopiConstTensorHandle_t other,
                                    const diopiScalar_t* alpha) {
        std::cout << "==================diopiAdd===========================\n";
        return diopiSuccess;
    }

    }  // namespace musa
    }  // namespace impl

After building deeplink and starting Python, the operator I wrote is not called. The log says it falls back to the CPU, as shown here:

    import torch,torch_dipu,os
    dipu device will show as cuda device. if it's not expected behavior, please set env DIPU_PYTHON_DEVICE_ASCUDA=false
    Wed Jul 17 16:01:46 2024 dipu | git hash:fdc1b4c1-dirty
    ::diopiFill 01 is not yet implemented, fill.Scalar will be fallback to cpu
    ::diopiAddScalar 01 is not yet implemented, add.Scalarout will be fallback to cpu
    ::diopiAddInpScalar 01 is not yet implemented, add.Scalar will be fallback to cpu
    ::diopiAddInp 01 is not yet implemented, add_.Tensor will be fallback to cpu
    ::diopiAdd 01 is not yet implemented, add.out will be fallback to cpu
    ::diopiAdd 01 is not yet implemented, add.Tensor will be fallback to cpu

How can I fix this?

fandaoyi commented 1 month ago

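First check whether torch_dipu.so links against the diopi_impl.so shared library. (The latest dipu guarantees that it does; older versions were broken for a while, depending on the template build options.) Then check whether diopiAdd is correctly exported from diopi_impl.so.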
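The export check can also be done programmatically. The following is a minimal sketch, assuming a POSIX system with `dlopen`/`dlsym`; the helper name `exportsSymbol` is hypothetical and is not part of dipu or diopi. It only tests whether a library exports a symbol under exactly the given unmangled name; it does not check what torch_dipu.so links against.

```cpp
#include <dlfcn.h>  // POSIX dynamic loading: dlopen, dlsym, dlclose, dlerror

#include <iostream>

// Hypothetical helper: returns true if the shared library at libPath can be
// loaded and exports `symbol` under exactly that (unmangled) name.
bool exportsSymbol(const char* libPath, const char* symbol) {
    void* handle = dlopen(libPath, RTLD_LAZY | RTLD_LOCAL);
    if (handle == nullptr) {
        const char* err = dlerror();
        std::cerr << "dlopen failed: " << (err ? err : "unknown error") << "\n";
        return false;
    }
    // dlsym looks the name up literally, so a C++-mangled definition of
    // diopiAdd would not match the plain string "diopiAdd".
    void* addr = dlsym(handle, symbol);
    const bool found = (addr != nullptr);
    dlclose(handle);
    return found;
}
```

A call such as `exportsSymbol("/path/to/libdiopi_impl.so", "diopiAdd")` (the path is a placeholder) would be expected to return false when the function was compiled with C++ linkage. On older glibc versions the program may need to be linked with `-ldl`.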

fandaoyi commented 1 month ago

Also check whether the MUSA-related interfaces have already been implemented in dipu.

fandaoyi commented 1 month ago

If your diopi and dipu are built separately, make sure to use -DWITH_DIOPI_LIBRARY to specify the diopi dependency when building dipu.

jianlonghaha commented 1 month ago

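Hello. After I added extern "C" to the code, the operator I wrote was found, as shown here. I did this because I saw that some vendors do it, but I do not know why it works, haha. Could you explain the reason?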

    namespace impl {
    namespace musa {

    extern "C" {

    static const char* name = "MUSADevice";

    DIOPI_RT_API const char* diopiGetVendorName() { return name; }

    DIOPI_API diopiError_t diopiAdd(diopiContextHandle_t ctx, diopiTensorHandle_t out,
                                    diopiConstTensorHandle_t input, diopiConstTensorHandle_t other,
                                    const diopiScalar_t* alpha) {
        std::cout << "==================hello,diopiAdd===========================\n";
        return diopiSuccess;
    }

    }  // extern "C"

    }  // namespace musa
    }  // namespace impl

jianlonghaha commented 1 month ago

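I think I may understand now. It is probably related to C++ function overloading: without extern "C", the C++ compiler mangles the function name the C++ way, and because of overloading support this causes the problem. With extern "C", the function name is kept unchanged at link time, so the linker does not report an unresolved external symbol error at the link stage. Haha, is that right?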
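The effect of name mangling can be made visible with a small sketch. It assumes GCC or Clang (Itanium C++ ABI) and uses `abi::__cxa_demangle` from `<cxxabi.h>`; the helper name `demangle` and the simplified no-argument signature are illustrative assumptions, not the real diopiAdd signature.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC and Clang, Itanium C++ ABI)

#include <cstdlib>
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol
// name, or the input unchanged if it is not a valid mangled name.
std::string demangle(const char* symbol) {
    int status = 0;
    char* text = abi::__cxa_demangle(symbol, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return symbol;  // nothing was allocated on failure
    }
    std::string result(text);
    std::free(text);  // the buffer is malloc-allocated by __cxa_demangle
    return result;
}
```

For example, `demangle("_ZN4impl4musa8diopiAddEv")` yields `impl::musa::diopiAdd()`: a function defined inside `impl::musa` with C++ linkage is emitted under a mangled name that encodes its namespaces and parameter types (the real diopiAdd would also encode its handle parameters). A function with C linkage is emitted as plain `diopiAdd`, which is the name a C API consumer looks for.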

fandaoyi commented 1 month ago

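Every symbol that diopi exports must be declared extern "C", because the requirement is that it is a C API.
That said, most of the code in the diopi project currently goes through an adapter wrapper: the adapter is what is actually exported and is a C API, while the real implementation lives inside and is C++.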
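The adapter arrangement described here can be sketched as follows. This is an illustration of the general pattern only, not diopi's actual adapter code: the names `impl::demo::add` and `demoAdd`, the plain float-array signature, and the 0/1 return codes are all hypothetical.

```cpp
#include <cstddef>

// Internal C++ implementation. Its symbol is mangled and is not meant to be
// looked up by name from outside the library.
namespace impl {
namespace demo {  // hypothetical vendor namespace

void add(float* out, const float* a, const float* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

}  // namespace demo
}  // namespace impl

// Exported adapter with C linkage, so the symbol name is exactly "demoAdd".
// Returns 0 on success and 1 on invalid arguments (hypothetical error codes).
extern "C" int demoAdd(float* out, const float* a, const float* b, std::size_t n) {
    if (out == nullptr || a == nullptr || b == nullptr) {
        return 1;
    }
    impl::demo::add(out, a, b, n);
    return 0;
}
```

Keeping the exported surface in a thin extern "C" layer lets the implementation use namespaces, overloads, and templates freely while callers depend only on stable, unmangled names.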