Implement custom tokenizer trouble

QData / TextAttack

TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master/


MIT License

2.84k stars 383 forks

Implement custom tokenizer trouble #724

Open zhy605420954 opened 1 year ago

zhy605420954 commented 1 year ago

Hi, I want to use TextAttack to attack my LSTM model, which detects vulnerabilities in Python code, so I need a custom tokenizer to tokenize Python source code. Could you please tell me how to implement a custom tokenizer? Thank you very much!

jxmorris12 commented 1 year ago

Hi! TextAttack relies on an abstraction we call model wrappers, which take text and output the model's predictions. So you need to write a custom model wrapper that calls your custom tokenizer. It should look very similar to the existing model wrappers.
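As a rough sketch of what such a wrapper might look like (an illustration, not code from the TextAttack repository): the class below subclasses `textattack.models.wrappers.ModelWrapper` when TextAttack is installed and falls back to a plain class otherwise, so the snippet runs on its own. The names `tokenize_python`, `CustomTokenizerModelWrapper`, and `toy_model` are hypothetical, and the toy model is only a stand-in for a real LSTM. It assumes the model accepts a list of token lists and returns one row of class scores per input; a real model would likely also need token-to-id mapping, padding, and tensor conversion inside `__call__`.

```python
import io
import tokenize

try:
    # Real TextAttack base class, used when the library is available.
    from textattack.models.wrappers import ModelWrapper
except Exception:
    # Fallback so this sketch still runs without TextAttack installed.
    ModelWrapper = object


def tokenize_python(source):
    """Hypothetical custom tokenizer: split Python source into lexical tokens."""
    tokens = []
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            # Drop whitespace-only tokens (newlines, indents, end marker).
            if tok.string.strip():
                tokens.append(tok.string)
    except (tokenize.TokenError, SyntaxError):
        # Perturbed code may no longer lex cleanly; fall back to whitespace split.
        return source.split()
    return tokens


class CustomTokenizerModelWrapper(ModelWrapper):
    """Hypothetical wrapper: takes raw strings, returns model predictions."""

    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer

    def __call__(self, text_input_list, **kwargs):
        # Tokenize every input string, then hand the batch to the model.
        token_lists = [self.tokenizer(text) for text in text_input_list]
        return self.model(token_lists)


def toy_model(token_lists):
    """Stand-in for a real LSTM: flags code that calls eval as vulnerable."""
    scores = []
    for tokens in token_lists:
        p_vulnerable = 0.9 if "eval" in tokens else 0.1
        scores.append([1.0 - p_vulnerable, p_vulnerable])
    return scores


wrapper = CustomTokenizerModelWrapper(toy_model, tokenize_python)
predictions = wrapper(["eval(user_input)", "x = 1"])
```

The wrapper instance would then be passed wherever TextAttack expects a model wrapper, in place of one of the built-in wrappers.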