duongkstn opened this issue 1 year ago
There are two choices:
- Add your logits processor into the C++ code; this requires writing some CUDA kernels.
- Encapsulate the transformer block as a Python op. You can refer to GptDecoderOp. This requires understanding the implementation details of FT; see the sketch after this list.
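A minimal sketch of the second choice, assuming the FT decoder step has already been wrapped as a callable Python op. The name `ft_decoder_step` and its signature are hypothetical placeholders for illustration, not FT's actual API; only the Hugging Face `LogitsProcessorList` usage is a real library interface.

```python
import torch
from transformers import LogitsProcessorList, MinLengthLogitsProcessor

def generate_with_ft(ft_decoder_step, input_ids, max_new_tokens, eos_token_id):
    # Hypothetical: ft_decoder_step(input_ids) stands in for an FT op
    # exposed to Python (in the spirit of GptDecoderOp) that returns the
    # logits of the last position. It is NOT FT's real interface.
    processors = LogitsProcessorList([
        MinLengthLogitsProcessor(10, eos_token_id=eos_token_id),
    ])
    for _ in range(max_new_tokens):
        logits = ft_decoder_step(input_ids)     # one decode step inside FT
        logits = processors(input_ids, logits)  # Python-side logit masking
        next_token = torch.argmax(logits, dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=-1)
        if (next_token == eos_token_id).all():
            break
    return input_ids
```

The trade-off is that every step crosses the Python/C++ boundary, so you give up part of FT's fused-decoding speedup in exchange for keeping your existing Python `LogitsProcessor` code unchanged.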
Would adding a logit processor require writing a CUDA kernel, or could it be plain C++? E.g., if it's just masking some weights?
All operations in FT are on CUDA.
Hi, I am a Python and Hugging Face `transformers` user. My model is based on T5/BART, but with additional generation functionality I implemented myself in Python (adding more `LogitsProcessor` functions to the Hugging Face code). How can I use my functions together with `FasterTransformer` properly? It is really hard for me to read the C++ code and understand the flow. Any advice?
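For reference, a minimal example of the kind of Python-side customization described above: a custom `LogitsProcessor` passed to Hugging Face `generate`. The banned-token masking is an illustrative stand-in, not the asker's actual logic.

```python
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          LogitsProcessor, LogitsProcessorList)

class BanTokensProcessor(LogitsProcessor):
    """Illustrative custom processor: remove a set of token ids from sampling."""
    def __init__(self, banned_ids):
        self.banned_ids = banned_ids

    def __call__(self, input_ids, scores):
        # Setting a logit to -inf makes the token impossible to pick.
        scores[:, self.banned_ids] = float("-inf")
        return scores

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
inputs = tokenizer("translate English to German: Hello", return_tensors="pt")
out = model.generate(
    **inputs,
    logits_processor=LogitsProcessorList(
        [BanTokensProcessor([tokenizer.unk_token_id])]
    ),
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Because FT's sampling loop runs entirely in C++/CUDA, a processor like this is never invoked during FT generation, which is what leads to the two choices listed above.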