NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Add Transformers logits manipulators #241

Open 0xymoro opened 10 months ago

0xymoro commented 10 months ago

Hi - really interesting work. We're currently using HF TGI in production and exploring using this instead. Are there plans to add samplers like typical_p that transformers supports? That would greatly ease the transition. Thanks!

0xymoro commented 10 months ago

In particular, typical_p has proved, in our production environment (300k users), to produce significantly more natural sequences. The Python code is at line 456 of https://github.com/huggingface/transformers/blob/main/src/transformers/generation/logits_process.py. It is a fairly simple entropy calculation that filters out both high-entropy tokens (unpredictable, off the rails) and low-entropy tokens (boring, contributing nothing new).
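For reference, the filtering described above can be sketched in NumPy roughly as follows. This is a hedged, illustrative sketch modeled on the transformers `TypicalLogitsWarper` behavior, not TensorRT-LLM code; the function name `typical_p_filter` and its parameters are made up for this example.

```python
import numpy as np

def typical_p_filter(logits: np.ndarray, mass: float = 0.9,
                     min_tokens_to_keep: int = 1) -> np.ndarray:
    """Mask logits whose surprisal deviates most from the distribution's entropy.

    Illustrative sketch of typical_p (locally typical sampling); `mass` plays
    the role of the typical_p parameter.
    """
    # Softmax over the vocabulary (shifted for numerical stability).
    shifted = logits - logits.max()
    probs = np.exp(shifted) / np.exp(shifted).sum()

    # Entropy of the predictive distribution.
    log_probs = np.log(probs + 1e-12)
    entropy = -(probs * log_probs).sum()

    # Distance between each token's surprisal (-log p) and the entropy.
    # Small distance = "typical"; large distance = too surprising or too boring.
    deviation = np.abs(-log_probs - entropy)

    # Keep the most typical tokens until their cumulative probability >= mass.
    order = np.argsort(deviation)
    cumulative = np.cumsum(probs[order])
    cutoff = max(np.searchsorted(cumulative, mass) + 1, min_tokens_to_keep)

    # Mask everything outside the typical set with -inf.
    keep = order[:cutoff]
    filtered = np.full_like(logits, -np.inf)
    filtered[keep] = logits[keep]
    return filtered
```

On a sharply peaked distribution this keeps only the dominant token; on a flatter one it retains a larger typical set, which matches the intent of pruning both the unpredictable and the uninformative tails.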

I see that sampling is done at a much lower level here and it's pretty different, but please let me know if I can help with a PR. I'm not as familiar with CUDA programming as I am with Python, but I'm happy to help in any way I can.

juney-nvidia commented 10 months ago

@jerryMeng100

Thanks for sharing the idea.

You are more than welcome to contribute to TensorRT-LLM to add typical_p support. Currently, the community contribution process is as follows (the process may be iterated on and improved based on the concrete feedback we receive):

Please let us know whether this makes sense to you.

Thanks,
June