Open p3nGu1nZz opened 6 months ago
i want to follow along with this but dont know how much i can help ^^
You could make a simple Python script that tokenizes a string of words (using Hugging Face `transformers`), no more than 1000 characters, and track how long the tokenization takes as accurately as possible.
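A minimal timing harness for that idea might look like the sketch below. The whitespace `tokenize` function is a hypothetical stand-in; with `transformers` installed you would swap in something like `AutoTokenizer.from_pretrained("bert-base-uncased").encode(text)` (model name is just an example):

```python
import time


def tokenize(text: str) -> list[str]:
    # Stand-in tokenizer; replace with a Hugging Face tokenizer call.
    return text.split()


def benchmark(text: str, runs: int = 1000) -> float:
    """Average seconds per tokenization of `text`, over `runs` repetitions."""
    assert len(text) <= 1000, "keep the input under 1000 characters"
    start = time.perf_counter()
    for _ in range(runs):
        tokenize(text)
    elapsed = time.perf_counter() - start
    return elapsed / runs


sample = "the quick brown fox jumps over the lazy dog " * 20  # ~880 chars
avg = benchmark(sample)
print(f"avg time per run: {avg * 1e6:.2f} microseconds")
```

Averaging over many runs and using `time.perf_counter()` (a monotonic, high-resolution clock) keeps the measurement reasonably accurate for sub-millisecond operations.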
Benchmarking Lexer Against Hugging Face Transformer
Objective:
To evaluate the performance and effectiveness of our custom Lexer against the Hugging Face `transformers` tokenizer, we will create a benchmark that measures speed, memory usage, and the quality of context representation.
Tasks:
- Install the `transformers` library and dependencies.

Acceptance Criteria:
This ticket will guide the development of a comprehensive benchmarking suite that will inform our decision-making process regarding text processing tools within our project.
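For the memory-usage portion of the benchmark, a starting point could be the standard library's `tracemalloc`. This is a sketch only; the `tokenize` function below is a hypothetical stand-in for either the custom Lexer or the Hugging Face tokenizer under test:

```python
import tracemalloc


def tokenize(text: str) -> list[str]:
    # Stand-in; swap in the custom Lexer or a Hugging Face tokenizer.
    return text.split()


def peak_memory_bytes(text: str) -> int:
    """Peak Python-level memory allocated while tokenizing `text`."""
    tracemalloc.start()
    tokenize(text)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


print(peak_memory_bytes("the quick brown fox " * 40), "bytes at peak")
```

Note that `tracemalloc` only tracks allocations made through Python's allocator, so a tokenizer backed by native code (as the fast Hugging Face tokenizers are) would need an OS-level measurement such as resident set size instead.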