PKU-YuanGroup / Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
https://arxiv.org/pdf/2311.10122.pdf
Apache License 2.0

Tokenizer in different code version #125

Closed countytown closed 5 months ago

countytown commented 8 months ago

Hi~ Thanks a lot for the new version of the code, which has made the framework much easier to understand. But I noticed that some details have also changed, e.g., the tokenizer part:

old version:

def tokenizer_X_token(prompt, tokenizer, X_token_index, return_tensors=None):
    prompt_chunks = [tokenizer(chunk).input_ids for chunk in prompt.split(f'<{X_INDEX_TOKEN[X_token_index].lower()}>')]
    ...

new version:

def tokenizer_image_token(prompt, tokenizer, image_token_index=IMAGE_TOKEN_INDEX, return_tensors=None):
    prompt_chunks = [tokenizer(chunk).input_ids for chunk in prompt.split('<image>')]
    ...
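For context, both versions do the same kind of thing: split the prompt on a placeholder string and splice a sentinel token index back in between the tokenized chunks. A minimal, self-contained sketch of that pattern (a toy whitespace tokenizer stands in for the real HuggingFace tokenizer, and BOS handling from the real code is omitted):

```python
IMAGE_TOKEN_INDEX = -200  # sentinel id used in the LLaVA-family codebases


class ToyTokenizer:
    """Hypothetical stand-in: maps each whitespace-separated word to an id."""

    def __init__(self):
        self.vocab = {}

    def __call__(self, text):
        ids = [self.vocab.setdefault(w, len(self.vocab) + 1) for w in text.split()]
        return type("Out", (), {"input_ids": ids})()


def tokenizer_image_token(prompt, tokenizer, image_token_index=IMAGE_TOKEN_INDEX):
    # Split on the literal '<image>' placeholder and tokenize each text chunk.
    prompt_chunks = [tokenizer(chunk).input_ids for chunk in prompt.split('<image>')]
    input_ids = []
    for i, chunk in enumerate(prompt_chunks):
        if i > 0:
            # Splice the sentinel back in where the placeholder was.
            input_ids.append(image_token_index)
        input_ids.extend(chunk)
    return input_ids


tok = ToyTokenizer()
ids = tokenizer_image_token("USER: <image> describe the video", tok)
print(ids)
```

The only structural difference from the old version is that the placeholder string is hard-coded to `<image>` instead of being looked up per modality.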

Should I worry about any performance degradation? Because:

  1. it looks like video and image are now treated the same?
  2. the original training samples include separator symbols like `\n`

In fact, I am trying to fine-tune with new modalities like audio and depth, so is there any conflict with the current version (besides the LanguageBind part)?
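For what it's worth, the old version's generic X-token scheme extends naturally to new modalities: add an entry to the index-to-placeholder map and each modality gets its own sentinel id and prompt placeholder. A hedged sketch (the audio/depth sentinel values and the exact shape of `X_INDEX_TOKEN` are assumptions modeled on `IMAGE_TOKEN_INDEX = -200`, not the repo's actual constants):

```python
# Sentinel ids below -200 are hypothetical, chosen by analogy with LLaVA.
IMAGE_TOKEN_INDEX = -200
VIDEO_TOKEN_INDEX = -201
AUDIO_TOKEN_INDEX = -202  # hypothetical new modality
DEPTH_TOKEN_INDEX = -203  # hypothetical new modality

# Maps each sentinel id to the placeholder name used in prompts.
X_INDEX_TOKEN = {
    IMAGE_TOKEN_INDEX: "IMAGE",
    VIDEO_TOKEN_INDEX: "VIDEO",
    AUDIO_TOKEN_INDEX: "AUDIO",
    DEPTH_TOKEN_INDEX: "DEPTH",
}


def split_prompt(prompt, x_token_index):
    # Split on the modality's placeholder, e.g. '<audio>' for audio.
    return prompt.split(f'<{X_INDEX_TOKEN[x_token_index].lower()}>')


print(split_prompt("Listen: <audio> what is being said?", AUDIO_TOKEN_INDEX))
```

Under this scheme the tokenizer side needs no other change; the per-modality encoders (the LanguageBind part you mention) are the separate concern.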

Thank you so much~☺