TED-EE opened this issue 1 month ago
Even when the reference runs on cpu, the test program creates input tensors on CUDA first and then casts them to cpu, so the CUDA driver is still needed here. Besides, without the CUDA driver you cannot run Triton kernels either.
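For illustration, the cpu-reference path looks roughly like this (a simplified sketch of the TO_CPU pattern, not the exact FlagGems test code):

```python
import torch

shape, dtype = (1024, 1024), torch.float16

# The input is created on CUDA first -- this is what requires the NVIDIA driver,
# even if only the reference result is wanted.
inp = torch.randn(shape, dtype=dtype, device="cuda")

# The reference is then computed on cpu by casting the CUDA tensor back.
ref = torch.abs(inp.cpu())
```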
https://xie.infoq.cn/article/9ca517ab55eaf60361ed11889 -> "Once the FlagGems operator library is complete, developers and users of large models can replace ATen operators with FlagGems using just one line of code and conveniently deploy to NVIDIA GPUs or other AI chips, without having to worry about code changes or backend adaptation."
That means we can run Triton kernels on AI platforms other than CUDA GPUs. Suppose I create the input tensors and run the reference on another AI platform; then the existing code in
test/python/triton/third_party/FlagGems/tests/test_unary_pointwise_ops.py:
def test_accuracy_abs(shape, dtype):
    inp = torch.randn(shape, dtype=dtype, device="cuda")
does not work properly because device="cuda" is hardcoded. The only fix I can see is to change "cuda" to the device name of the other AI platform, i.e., to modify device="cuda" to device="another_device" in every test in FlagGems. That is quite tedious and scales poorly: every time users update to the latest repo, they have to replace every device="cuda" with device="another_device" all over again.
The solution I considered is to modify conftest.py, add choices=["cuda", "cpu", "another_device"], and adapt the tests accordingly (like TO_CPU). Ideally, pytest test_unary_pointwise_ops.py::test_accuracy_abs[dtype0-shape0] --device another_device would then work. However, the problem mentioned at the very beginning of https://github.com/FlagOpen/FlagGems/issues/129 arises, namely RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx, because device="cuda" is hardcoded.
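A minimal sketch of the conftest.py change I have in mind (the option and fixture names are my own illustration, not existing FlagGems code):

```python
# conftest.py (sketch)
import pytest

def pytest_addoption(parser):
    parser.addoption(
        "--device",
        action="store",
        choices=["cuda", "cpu", "another_device"],  # "another_device" is a placeholder
        help="device on which input tensors are created",
    )

@pytest.fixture
def device(request):
    return request.config.getoption("--device")
```

Tests would then take the device fixture instead of hardcoding "cuda".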
We developers cannot know in advance which platform users will run the tests on or which device name will be used in tensor initialization. For the most common case, we initialize tensors on "cuda"; on other AI platforms, replacing "cuda" with the specific device name is welcome. Here is an example from the cambricon branch: https://github.com/FlagOpen/FlagGems/blob/cambricon/tests/test_unary_pointwise_ops.py#L20, in which "mlu" can be chosen via a pytest option.
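Roughly, the pattern on that branch looks like this (a simplified sketch; see the linked file for the actual code):

```python
# test_unary_pointwise_ops.py (sketch of the cambricon-branch style):
# the device string is a module-level constant instead of being hardcoded
import torch

DEVICE = "mlu"  # in the real tests this is selected via the --device pytest option

def test_accuracy_abs(shape, dtype):
    inp = torch.randn(shape, dtype=dtype, device=DEVICE)
    ref = torch.abs(inp.cpu())  # reference on cpu
```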
Will look into it, thanks a lot.
@StrongSpoon The cambricon branch seems brilliant, but another problem is not fixed: every time developers pull the latest repo, they need to resolve the conflict between
inp = torch.randn(shape, dtype=dtype, device="cuda")
and
inp = torch.randn(shape, dtype=dtype, device=DEVICE)
which is also quite tedious and scales poorly.
Many AI platform companies other than NVIDIA may want to use FlagGems. It may be a better solution to expose device=DEVICE to developers and assign "cuda" by default when no --device option is given, instead of hardcoding device="cuda". With this approach, non-CUDA developers can run the tests without resolving conflicts every time they pull the latest FlagGems repo; they only need to add their AI device name to the choices in conftest.py.
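Concretely, something like this (my sketch of the proposal, not existing FlagGems code):

```python
# conftest.py (sketch of the proposal)
DEVICE = "cuda"  # default, so CUDA users notice no change

def pytest_addoption(parser):
    parser.addoption("--device", action="store", default="cuda",
                     choices=["cuda", "cpu", "mlu"])  # vendors only extend this list

def pytest_configure(config):
    global DEVICE
    DEVICE = config.getoption("--device")
```

Tests would then always write device=DEVICE, so pulling upstream never produces device="cuda" conflicts.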
Thanks for your advice. We'll apply the other-device options to the master branch after the migration is finished.
cheers
I'm wondering: since we still use torch.mlu.synchronize(), could we write a wrapper or an abstract adaptor?
Could you describe it in detail?
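For example, a thin dispatching wrapper might look like this (my own illustrative sketch, not existing FlagGems code):

```python
import torch

def device_synchronize(device: str) -> None:
    """Synchronize the backend matching the given device string."""
    if device == "cuda":
        torch.cuda.synchronize()
    elif device == "mlu":
        torch.mlu.synchronize()  # available once the Cambricon torch_mlu plugin is loaded
    # cpu tensors need no synchronization
```

The tests would then call device_synchronize(DEVICE) instead of a backend-specific function.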
Thanks for the help in issue #126. I ran into a question when I tried to run the reference on cpu without CUDA. The steps to reproduce are as follows:
Requirements
Codebase
Installation
Run reference on cpu
Results
It confuses me that, when I try to run the reference on cpu, a CUDA device is still required, even though no CUDA computation should actually be needed.