Closed: khan-yin closed this 4 months ago
Can it run on CUDA 11.4?
Hi there, thank you for your contribution! Have you tested whether the code works under CUDA 12?
I think CUDA 11.4 will be OK; I will test it further.
I will test it under CUDA 12 later.
Please note that //example:test is deprecated and may not reflect the correctness of the code. If possible, please consider running
bazel test ... --build_tests_only=1 --test_tag_filters=-manual,-rocm --config=cuda12
for a full test under CUDA 12.
@samaritan1998 @netaddi Hello, I have already tested the code with CUDA 12.2 + cuDNN 9 and CUDA 11.4 + cuDNN 8: it builds successfully and the example test passes. For the full test under CUDA 12, I keep failing with an error that seems related to limited memory; maybe my environment cannot run the full test suite?
Hi khan, thank you for your patience.
It seems that your screenshot does not include the critical information, so I cannot see the error message.
However, if you think this error is caused by a lack of system memory, please try adding the option
--jobs=8
which limits the number of concurrent compilation processes.
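For example, combined with the full-test invocation suggested earlier in this thread, the command would look something like the line below (only --jobs=8 is new; the other flags are the ones already quoted above):
bazel test ... --build_tests_only=1 --test_tag_filters=-manual,-rocm --config=cuda12 --jobs=8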
Thanks a lot! It really helps. Anyway, do all the LLM model weights need to be downloaded for the tests? I am not sure I have enough storage, and git lfs pull takes a long time.
No. I think it would be fine if all unit tests (...) pass. We have internal verification steps that will run after the pull request is approved.
Hello, I have already passed 60/74 tests, and all of the src op tests pass. Details of the remaining 14 cases are below:
FAILED:
//maga_transformer/async_decoder_engine/test:async_model_test
//maga_transformer/async_decoder_engine/test:decoder_engine_test
//maga_transformer/async_decoder_engine/test:rpc_model_test
//maga_transformer/cpp/test:gpt_model_test
//maga_transformer/models/test:llama_test
//maga_transformer/server/test:inference_worker_test
//maga_transformer/test:async_gather_batch_test
//maga_transformer/test:slice_stop_word_list_test
//maga_transformer/test:template_test
//maga_transformer/utils/test:ckpt_database_test
//maga_transformer/utils/test:incremental_decode_test
//maga_transformer/utils/test:model_weights_loader_test
TIMEOUT:
//src/fastertransformer/devices/cuda_impl/tests:cuda_dist_test
FAILED in src:
//src/fastertransformer/devices/cuda_impl/tests:cuda_attention_op_test
details:
src/fastertransformer/devices/cuda_impl/tests/ops/CudaAttentionOpTest.cc:457: Failure
Value of: static_cast<CudaDevice*>(device_)->use_multi_block_mode
Actual: false
Expected: true
[----------] Global test environment tear-down
[==========] 8 tests from 1 test suite ran. (4723 ms total)
[ PASSED ] 6 tests.
[ FAILED ] 2 tests, listed below:
[ FAILED ] CudaAttentionOpTest.MultiBlockSelfAttentionOpTest
[ FAILED ] CudaAttentionOpTest.LongSeqMultiBlockSelfAttentionOpTest
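(To re-run just these two failing cases in isolation, something along the lines of the command below should work; this assumes the standard Bazel --test_arg passthrough and GoogleTest --gtest_filter flags, with a wildcard pattern chosen to match both MultiBlock test names:)
bazel test //src/fastertransformer/devices/cuda_impl/tests:cuda_attention_op_test --config=cuda12 --test_arg=--gtest_filter=*MultiBlockSelfAttentionOpTest*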
Possible reasons:
- tokenizer.model: loading it is OK, but pulling it with git lfs fails. For example:
  $ git lfs pull --include="*.model"
  batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
- dist_test
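As a quick way to check whether files like tokenizer.model were actually downloaded or are still LFS pointer stubs, the standard git lfs ls-files command can be used; in its output an asterisk marks a file whose content is present and a minus marks one that has not been fetched:
$ git lfs ls-files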
Thanks khan, we have verified the correctness of your code and merged your pull request.
Thank you! Maybe I could apply to become a collaborator so I can contribute more; I have been interested in MLSys and C++/CUDA recently. 🤣
fix(src): fix bazel build special type cast and template match for cuda118
Local Environment: tar.gz in release v0.2.0
Build Result: bazel build, bazel build test, whl