dimlight13 / MU_SC_for_VQA

Multi-User Semantic Communication for Visual Question Answering
MIT License

Problems encountered during reproduction #1

Closed: 13yyy closed this issue 11 months ago

13yyy commented 11 months ago

Hello, I am very interested in your work at https://github.com/dimlight13/MU_SC_for_VQA, but I cannot find the paper "Multi-User Semantic Communication for Visual Question Answering" or the authors you mention, "Zhiwei Xu, Yuxuan Song, Yongfeng Huang, and Shengyang Dai". I would like to ask whether this is your team's work; if so, could you tell me where the paper comes from, and could you send me a copy of "Multi-User Semantic Communication for Visual Question Answering"? In addition, I saw a WeChat public account post "". In that post, the code address given for the paper "Task-oriented multi-user semantic communications for VQA" is your repository https://github.com/dimlight13/MU_SC_for_VQA, and the network structure of that paper is also consistent with the code you wrote, yet your repository says the code reproduces the paper "Multi-User Semantic Communication for Visual Question Answering". Could you tell me whether "Multi-User Semantic Communication for Visual Question Answering" and "Task-oriented multi-user semantic communications for VQA" are the same paper?

Finally, I am very concerned about one point. In your README you mention: "I figured out the reason why the training wasn't performing properly. Data are unevenly distributed. Specifically, there are less than 200 specific labels (answer), but more than 100,000 specific labels. This causes serious imbalance and training bias." I don't understand what this means. My understanding is that there are few answer labels (only about 200) but more than 100,000 specific question labels. However, when I debugged and checked the lengths of "word_dic" and "answer_dic" in the code, the length of "word_dic" is 89 and the length of "answer_dic" is 28 for "CLEVR_train_questions.json" after preprocessing. Does this have any connection with the "200 specific labels (answer), but more than 100,000 specific labels" you describe? I am very confused about this point and would like to ask you about it.

Finally, may I add a Chinese contact method such as WeChat to ask you for advice? I know this may be a bit presumptuous. Looking forward to your reply, and thank you for your help!

dimlight13 commented 11 months ago

Hello. I am glad to answer some of your questions.

Q1. Is this the work of your team? A1. No, it is not. The paper I implemented is from IEEE Communications Letters, which is not open access, so sending it to you directly would be problematic. However, a preprint with the same material is available on arXiv, so you can read it there. The URL is provided below. https://arxiv.org/abs/2108.07357

Q2. Are the paper "Multi-User Semantic Communication for Visual Question Answering" and the paper "Task-oriented multi-user semantic communications for VQA" the same paper? A2. I probably rushed through writing the README and made an error somewhere along the line. Thank you for pointing it out. To answer your question, yes, it is the same paper.

Q3. The length of "word_dic" is 89 and the length of "answer_dic" is 28 for "CLEVR_train_questions.json" after preprocessing. Does this have any connection with the "200 specific labels (answer), but more than 100,000 specific labels" mentioned in the README? A3. While debugging I found that certain answers were severely over-represented, so I documented it in the README.md. However, I have been working on other projects since then and have not looked into it further, so there could be other causes I have not discovered that prevent the model from learning. To answer your question, I don't believe it has anything to do with the word_dic and answer_dic lengths.
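For reference, here is a minimal sketch of how the answer distribution and vocabulary sizes might be checked directly from CLEVR_train_questions.json. This is illustrative only, not the repository's preprocessing code, and it assumes the standard CLEVR v1.0 question-file layout (a top-level "questions" list whose entries carry "question" and "answer" fields):

```python
# Illustrative sketch only -- not the repository's preprocessing code.
# Assumes the standard CLEVR v1.0 layout: a top-level "questions" list whose
# entries carry "question" and "answer" fields.
import json
from collections import Counter

with open("CLEVR_train_questions.json") as f:
    questions = json.load(f)["questions"]

# How often each answer occurs -- this reveals any class imbalance directly.
answer_counts = Counter(q["answer"] for q in questions)

# Rough question-word vocabulary (the repo's tokenizer may differ slightly).
word_vocab = {w for q in questions for w in q["question"].lower().rstrip("?").split()}

print("question-word vocabulary size:", len(word_vocab))   # ~89 after preprocessing
print("answer vocabulary size:", len(answer_counts))       # ~28 answer classes
for answer, count in answer_counts.most_common():
    print(f"{answer:>12}: {count}")
```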

Q4. Can I add a contact method such as WeChat? A4. I'm afraid I do not have a WeChat account, so I cannot share one with you.

Please excuse me if my response is incomplete; I have not been researching this multi-modal topic for very long. I hope you find my answers useful.


Joonho Seon (Ph.D. Candidate), Communications and Artificial Intelligence Lab., Kwangwoon University, Seoul, Korea. Tel.: +82-2-940-5567, Phone: +82-10-4576-2987, E-mail: @.***



13yyy commented 11 months ago

Hello: I am very happy to receive your reply, and thank you very much for your time and your answers. In my previous letter I did not explain my situation clearly. Recently I have been doing research related to multimodal semantic communication, but there are very few related open-source codes; so far I have only found the code you reproduced, which has been a great help to me. However, when I reproduced your work, train_acc was about 50% and val_acc was about 0%, whereas the accuracy in the original paper reaches about 90%, so I still have a few questions for you.

Q1. In your work, what do the 200 labels and 100,000 labels mentioned in the README.md specifically refer to? Why does such a large gap affect training, and what specific impacts do you remember? Do you have any ideas for modifications that would improve train_acc and val_acc?

Q2. The README.md says that train_acc is only about 50%. Do you still remember what the result on the validation set was? When I try, train_acc is almost 50% but val_acc is only about 0.08%. Did this happen when you ran the code? Regarding val_acc being 0% when running test.py, where might the problem be, and do you have any suggestions for improvement?

Q3. The README.md says that the final MAC network was implemented in TensorFlow rather than PyTorch. How much impact does this have on accuracy and loss, or is this not the reason for the low accuracy? If I want better results, should I change this part of the code, i.e. must the TensorFlow implementation be replaced with a PyTorch one?

Attached are several screenshots of the results I reproduced. Dataset: 1,000 training images and 300 validation images were selected. Finally, did you come up with any new ideas while reproducing this code? I would be grateful if you would share them, and I am sorry to take up your time. Thank you very much for replying despite your busy schedule; your help has been invaluable. Looking forward to your reply, and thank you for your help! Qiu, Communication University of China. [Attachments: several result screenshots, 2023-12-06]

dimlight13 commented 11 months ago

Q1. In your work, what do the 200 labels and 100,000 labels mentioned in the README.md specifically refer to? Why does such a large gap affect training, and what specific impacts do you remember? Do you have any ideas for modifications that would improve train_acc and val_acc? A1. I would expect val_acc to go down because I think it is essentially an effect of overfitting. However, based on the results in test_model.ipynb, both val_acc and train_acc are near 50%, so there are three possibilities:

  1. a data preprocessing issue
  2. a problem in the MAC network (i.e. in the PyTorch -> TensorFlow conversion)
  3. a problem with the channel model (an illustrative sketch of a typical channel follows below)
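For context, here is a minimal sketch of the kind of AWGN channel layer that a semantic-communication pipeline typically places between transmitter and receiver. This is an illustration of the general idea only, not the channel implementation used in this repository:

```python
# Illustrative AWGN channel sketch -- not the repository's channel code.
# Adds Gaussian noise to the transmitted semantic features at a given SNR (dB).
import tensorflow as tf

def awgn_channel(x, snr_db):
    """Return x corrupted by white Gaussian noise at the specified SNR in dB."""
    signal_power = tf.reduce_mean(tf.square(x))
    snr_linear = tf.pow(10.0, snr_db / 10.0)
    noise_power = signal_power / snr_linear
    noise = tf.random.normal(tf.shape(x), stddev=tf.sqrt(noise_power))
    return x + noise
```

If the repository's channel behaves roughly like this, a low SNR at test time would by itself explain part of the gap between validation and test accuracy.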

Q2. The README.md says that train_acc is only about 50%. Do you still remember what the result on the validation set was? When I try, train_acc is almost 50% but val_acc is only about 0.08%. Did this happen when you ran the code? Regarding val_acc being 0% when running test.py, where might the problem be, and do you have any suggestions for improvement?

A2. I seem to recall that both training_acc and val_acc were close to 50%. Have you checked your experimental results with test_model.ipynb? Your results do not seem to reproduce well, but it is hard to say for sure where the problem is without seeing how you ran things.

Q3. The README.md says that the final MAC network was implemented in TensorFlow rather than PyTorch. How much impact does this have on accuracy and loss, or is this not the reason for the low accuracy? If I want better results, should I change this part of the code, i.e. must the TensorFlow implementation be replaced with a PyTorch one? A3. I started my implementation in TensorFlow, so I had no choice but to implement it with TensorFlow. However, for a faithful performance reproduction, I recommend using the open-source PyTorch-based MAC network. Please use my code only as a reference for simple multimodal processing.

A4. I cannot see your pictures in the first place. For code reproduction, I again highly recommend the open-source PyTorch-based implementation.

Thank you.


13yyy commented 10 months ago

Hello: I am very happy to receive your reply and thank you very much for your time. I have encountered some more questions that I would like to ask.

Q1: When I use 70,000 images as the training set and 15,000 images as the validation set, the accuracy is about 50%, but the accuracy when running test.py (with channel noise added) under the default parameters is only about 13%. When I use 1,000 images as the training set and 300 images as the validation set, the accuracy is about 40%, but the accuracy when running test.py (with channel noise added) under the default parameters is about 25%. Why does the model trained with more data test worse than the model trained with only 1,000 images?

Q2: How can the large label gap you mentioned in README.md be addressed? If I want to improve accuracy, do you have any other suggestions for data processing?

Thank you for your help! Qiu, Communication University of China

dimlight13 commented 10 months ago

First, I want to let you know that my responses may be delayed due to other commitments.

Q1: When I use 70,000 images as the training set and 15,000 images as the validation set, the accuracy is about 50%, but the accuracy when running test.py (with channel noise added) under the default parameters is only about 13%. When I use 1,000 images as the training set and 300 images as the validation set, the accuracy is about 40%, but the accuracy when running test.py (with channel noise added) under the default parameters is about 25%. Why does the model trained with more data test worse than the model trained with only 1,000 images? A1: You say that your simulation gave good results with 1,000 images, but it is difficult for me to say anything for certain since I do not know whether you always get good results. However, as I'm sure you are aware, one of the main reasons for poor test performance is overfitting. In particular, I have seen cases where a certain portion of the 70,000 images was biased, so even with more data, if the bias increases it can lead to poor performance.

Q2: How can the large label gap you mentioned in README.md be addressed? If I want to improve accuracy, do you have any other suggestions for data processing? A2: I think it can be improved by handling the biased images and answers through augmentation and data preprocessing, or by adjusting the class weights.
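As one concrete, hypothetical illustration of the class-weight idea (assuming a Keras-style training loop; none of these names come from the repository):

```python
# Hypothetical illustration of the class-weight suggestion -- not repository code.
# Inverse-frequency weights make over-represented answers contribute less to the loss.
from collections import Counter

def inverse_frequency_weights(labels):
    """Map each integer class index to num_samples / (num_classes * class_count)."""
    counts = Counter(labels)
    num_samples = len(labels)
    num_classes = len(counts)
    return {cls: num_samples / (num_classes * n) for cls, n in counts.items()}

# Assumed usage with a Keras model and integer answer labels:
# class_weight = inverse_frequency_weights(train_answer_labels)
# model.fit(train_ds, validation_data=val_ds, class_weight=class_weight)
```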

