GraySwanAI / nanoGCG

A fast + lightweight implementation of the GCG algorithm in PyTorch
MIT License

optim_str tokenization issue #24

Open javyduck opened 2 weeks ago

javyduck commented 2 weeks ago

Hi authors, thanks for the great work!

I have a question about the tokenization in this repo. It appears that before_ids, target_ids, after_ids, and optim_str_ids are each tokenized separately. However, when optim_str is inserted back into the original messages and the full prompt is tokenized again, the token IDs for the optim_str segment may differ from those produced when optim_str is tokenized on its own, without the preceding context, because a tokenizer can merge characters across the segment boundary.
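
To illustrate the concern, here is a minimal sketch using a toy greedy longest-match tokenizer. The vocabulary, the `encode` helper, and the example strings are all hypothetical; this is not nanoGCG's code or any real model's tokenizer, only a small demonstration of how concatenating separately tokenized segments can disagree with tokenizing the joined string.

```python
# Toy vocabulary (hypothetical). Real subword vocabularies are far larger,
# but can show the same kind of merge across a segment boundary.
VOCAB = {tok: i for i, tok in enumerate(["hi", "!", "!!", "!!!", " ", "x", "h", "i"])}
MAX_TOKEN_LEN = max(len(tok) for tok in VOCAB)


def encode(text):
    """Greedy longest-match: at each position, take the longest vocab entry."""
    ids = []
    pos = 0
    while pos < len(text):
        for length in range(min(MAX_TOKEN_LEN, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                pos += length
                break
        else:
            raise ValueError(f"cannot tokenize {text[pos]!r}")
    return ids


# Stand-ins for the text before the optimized string and the string itself.
before, optim_str = "hi!", "!! x"

# Tokenize each segment on its own, then concatenate the IDs.
separate_ids = encode(before) + encode(optim_str)

# Tokenize the joined string in one pass.
joint_ids = encode(before + optim_str)

# False here: the trailing "!" of `before` merges with the leading "!!"
# of `optim_str` into a single "!!!" token in the joint pass.
retokenization_consistent = separate_ids == joint_ids
print(separate_ids, joint_ids, retokenization_consistent)
```

A check along these lines (comparing the concatenated IDs against a re-tokenization of the joined text) is one possible way to detect candidates that would not survive a round trip through the tokenizer.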

Will this be fixed in a later version?

Thanks!