GaryGuTC / LaPA_model

[CVPRW 2024] LaPA: Latent Prompt Assist Model For Medical Visual Question Answering

Problems with ans_tokenizer_dict.pkl content for VQA_RAD dataset #6

Open Shmily17 opened 4 months ago

Shmily17 commented 4 months ago

Hello

I think you are doing a fantastic job! However, I am having trouble reproducing your experimental results. The problem is as follows: my reproduction on the VQA_RAD dataset performs very poorly, and after investigating I found the cause.

The labels in the ans_tokenizer_dict.pkl you provide for the VQA_RAD dataset do not correspond to the answer labels in the dataset itself. Could you please explain how ans_list.pkl is encoded into ans_tokenizer_dict.pkl, or better yet, provide the conversion code?

Thank you very much! Have a nice life!

GaryGuTC commented 4 months ago

Hi

Thanks so much for your attention.

  1. Sorry, the code for that step was lost; the process only involved saving and transferring. The method for generating ans_tokenizer_dict: collect all answers from the training, validation, and test sets, tokenize them, and make sure each tokenized answer corresponds to its original item (see the sketch after this list).
  2. Download the VQA_RAD dataset by following the README file.
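A minimal sketch of that process, assuming the RoBERTa-base tokenizer (mentioned later in this thread) and a dict that maps each answer string to its token IDs; the exact structure of the released file may differ:

```python
import pickle
from transformers import RobertaTokenizer

# Assumption: the tokenizer matches LaPA's language part (roberta-base).
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

# ans_list.pkl: all answers collected from the train/val/test splits.
with open("ans_list.pkl", "rb") as f:
    ans_list = pickle.load(f)

# Tokenize every answer so each entry in the dict corresponds
# to one answer from the list.
ans_tokenizer_dict = {
    ans: tokenizer(ans, add_special_tokens=True)["input_ids"]
    for ans in ans_list
}

with open("ans_tokenizer_dict.pkl", "wb") as f:
    pickle.dump(ans_tokenizer_dict, f)
```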

Best regards,

Shmily17 commented 4 months ago

First of all, thank you for your patience! I still have one small question: can you tell me how the tokenization is done? The output of a standard BERT tokenizer does not match the keys in the ans_tokenizer_dict you provide, so it looks as though a specific tokenization method was used.

Thank you very much!

GaryGuTC commented 4 months ago

Hi,

Please try the RoBERTa-base tokenizer, which is the same one used in LaPA's language part.
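For reference, RoBERTa uses byte-level BPE rather than BERT's WordPiece, so the two tokenizers produce different IDs for the same string; a quick comparison (the answer word below is just an illustrative example):

```python
from transformers import BertTokenizer, RobertaTokenizer

bert_tok = BertTokenizer.from_pretrained("bert-base-uncased")
roberta_tok = RobertaTokenizer.from_pretrained("roberta-base")

ans = "pneumonia"  # illustrative answer string
# WordPiece vs. byte-level BPE yield different token IDs, which would
# explain the key mismatch observed with a BERT tokenizer.
print(bert_tok(ans)["input_ids"])
print(roberta_tok(ans)["input_ids"])
```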

Best regards,

Shmily17 commented 4 months ago

Thank you so much for your patience! Have a great day!

0linzhi commented 2 months ago


Please, how do you obtain the (1, 2978) shape of ans_tokenizer? When I pass the 3579-entry ans_list into the tokenizer, I can only get a tensor of shape (3579, 27) (with max_length=27). Could you tell me how to set the tokenizer's parameters when encoding ans_list so the result has shape (1, 2978)?
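For concreteness, a sketch of the attempt described above (the padding/truncation settings are assumptions, since the exact call is not shown in the thread):

```python
import pickle
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

with open("ans_list.pkl", "rb") as f:
    ans_list = pickle.load(f)  # 3579 answer strings

# Batch-encoding with a fixed max_length produces one row per answer:
enc = tokenizer(ans_list, padding="max_length", max_length=27,
                truncation=True, return_tensors="pt")
print(enc["input_ids"].shape)  # torch.Size([3579, 27]), not (1, 2978)
```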