What is the purpose of splitting text on `!@#$%^&*()`?

fahadh4ilyas commented 1 month ago

This split only happens when that exact substring appears in the string. For a string that doesn't contain it, nothing is split:

text = 'Here is a text! This text have exclamation mark'
print(text.split("!@#$%^&*()"))
# ['Here is a text! This text have exclamation mark']

I guess the intention is this?

text = 'Here is a text! This text have exclamation mark'
print(do_split(text))
# ['Here is a text', ' This text have exclamation mark']
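For reference, a minimal sketch of what a per-character split could look like. `do_split` is only the hypothetical name used above, not a function from the repository, and the character-class approach is an assumption about the behavior being asked for:

```python
import re

SPECIAL_CHARS = "!@#$%^&*()"


def do_split(text, chars=SPECIAL_CHARS):
    """Split `text` on ANY single character in `chars` (hypothetical helper)."""
    # re.escape keeps characters such as ^ and ) literal inside the class.
    pattern = "[" + re.escape(chars) + "]"
    return re.split(pattern, text)


text = 'Here is a text! This text have exclamation mark'

# str.split treats the argument as one whole substring, so nothing is split.
print(text.split(SPECIAL_CHARS))

# The character-class version splits at the lone "!".
print(do_split(text))
```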

fahadh4ilyas commented 1 month ago

Never mind, it's used for the E5 datasets. I thought it was a random substring.

vaibhavad commented 1 month ago

It separates the instruction tokens from the sentence tokens, because only the sentence tokens are considered during mean pooling.
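A rough illustration of that idea, not the repository's actual code: the whitespace tokenizer, the toy two-dimensional "embeddings", and the helper names `split_instruction` and `masked_mean` are all assumptions made for the sketch.

```python
SEPARATOR = "!@#$%^&*()"  # separator string discussed in this thread


def split_instruction(text):
    """Split 'instruction<SEPARATOR>sentence' into (instruction, sentence)."""
    instruction, sep, sentence = text.partition(SEPARATOR)
    if not sep:
        # No separator present: treat the whole text as the sentence.
        return "", text
    return instruction, sentence


def masked_mean(vectors, mask):
    """Average only the vectors whose mask entry is 1."""
    kept = [v for v, m in zip(vectors, mask) if m]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(v[i] for v in kept) / len(kept) for i in range(dim)]


text = "Retrieve relevant passages:" + SEPARATOR + "what is mean pooling"
instruction, sentence = split_instruction(text)

# Stand-in tokenizer (assumption): plain whitespace splitting.
instr_tokens = instruction.split()
sent_tokens = sentence.split()
tokens = instr_tokens + sent_tokens

# 0 for instruction tokens, 1 for sentence tokens.
mask = [0] * len(instr_tokens) + [1] * len(sent_tokens)

# Toy embeddings (assumption): [token length, 1.0] per token.
vectors = [[float(len(t)), 1.0] for t in tokens]

pooled = masked_mean(vectors, mask)
print(mask)
print(pooled)
```

The instruction tokens still pass through the model alongside the sentence, but with the mask in place they do not contribute to the pooled vector.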

McGill-NLP / llm2vec