HITSZ-HLT / CMGCN

[ACL 2022] The source code of Multi-Modal Sarcasm Detection via Cross-Modal Graph Convolutional Network

When will the code be made public? #1

Open usefang opened 2 years ago

usefang commented 2 years ago

I have run into some small problems reproducing the paper and hope to learn from your code. Thank you.

BinLiang-NLP commented 2 years ago

@usefang Hi, the source code of this work will be released shortly. Please wait a few more days. Thanks!

BinLiang-NLP commented 2 years ago

> Could you please provide the open-source code? Thank you.

Hi, the source code of this work will be released shortly. Please wait a few more days. Thanks!

less-and-less-bugs commented 2 years ago

Cool work! In fact, the code can be found on the official ACL site.

MeinhardMark commented 2 years ago

Found the code on the official ACL site. I haven't downloaded and read it yet. Great work though! https://aclanthology.org/2022.acl-long.124/

BinLiang-NLP commented 2 years ago

@less-and-less-bugs Thank you very much for the reminder. The code on the official ACL site is also available. Please let me know if there is any problem.

BinLiang-NLP commented 2 years ago

@MeinhardMark Yes, the code on the official ACL site is also available. Please let me know if there is any problem. Thanks!!!

BinLiang-NLP commented 2 years ago

@usefang Hi, following others' answers, I found that the code is also available on the official ACL site: https://aclanthology.org/2022.acl-long.124/ Please let me know if there is any problem. Thanks!!!

BinLiang-NLP commented 2 years ago

> Could you please provide the open-source code? Thank you.

Hi, following others' answers, I found that the code is also available on the official ACL site: https://aclanthology.org/2022.acl-long.124/ Please let me know if there is any problem. Thanks!!!

Mascheranovic commented 2 years ago

In data_utiles.py there is the line graph = value["graph"]+value["sentic graph"]. I cannot find out how value["sentic graph"] is obtained (it does not appear in the cross-modal graph generation code yet). Could you please tell me how to get it?

RUBBISHLIKE commented 2 years ago

Could you please provide the preprocessed images and other data? Thank you.

JunyaoHu commented 1 year ago

Dear author, hello! I downloaded your code from this website, but I think it is incomplete. For example, I finished Steps 1-4 of readme.txt, but at Step 5 the *.ipynb notebooks cannot run. In detail, get_boxes.ipynb takes all dataset images as input and outputs boxes.pkl, while get_VITfeats.ipynb takes /home/.../box as input and outputs vit_features.B32.finetuned.pkl. I don't know what 'box' is. I think it is a directory, but I cannot figure out how to generate it.

Update: I may have solved the question above by writing some processing code to create these directories:

import os
import cv2
import pandas as pd
from tqdm import tqdm

os.chdir('....../ACL22-sarcasm-code/data/dataset_images_boxes')
# boxes.pkl is the output of get_boxes.ipynb
images = pd.read_pickle("../../bottom-up-attention/boxes.pkl")
for image_name, boxes in tqdm(images.items()):
    image_path = image_name[:-4]  # drop the last 4 characters (e.g. ".jpg")
    os.makedirs(image_path, exist_ok=True)
    # read each image once instead of once per box
    img = cv2.imread("../dataset_image/" + image_name)
    if img is None:  # cv2.imread returns None for a missing or unreadable file
        continue
    for box in boxes:
        x1, y1, x2, y2 = [int(i) for i in box[0]]
        label = box[1]
        # note: boxes that share a label overwrite each other
        cv2.imwrite(f'{image_path}/{label}.jpg', img[y1:y2, x1:x2])
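
A possible refinement of the loop above, offered only as a sketch: because each crop is written to {label}.jpg, two boxes with the same label in one image overwrite each other, and a negative coordinate would make the NumPy slice wrap around. The helper below (plan_crops is a hypothetical name) clamps each box to the image and gives repeated labels a numeric suffix. It assumes each box keeps its coordinates in box[0] and its label in box[1], as the loop above does. Whether get_VITfeats.ipynb accepts suffixed file names is not known.

```python
from collections import Counter


def plan_crops(boxes, width, height):
    """Return (filename, (x1, y1, x2, y2)) pairs with clamped boxes and unique names.

    Hypothetical helper: `boxes` is assumed to be a list whose items hold the
    coordinates in item[0] and the label in item[1].
    """
    seen = Counter()
    plan = []
    for box in boxes:
        x1, y1, x2, y2 = (int(v) for v in box[0])
        label = box[1]
        # clamp to the image so slicing never sees negative or oversized indices
        x1, x2 = max(0, x1), min(width, x2)
        y1, y2 = max(0, y1), min(height, y2)
        if x2 <= x1 or y2 <= y1:
            continue  # nothing left to crop
        seen[label] += 1
        suffix = "" if seen[label] == 1 else f"_{seen[label]}"
        plan.append((f"{label}{suffix}.jpg", (x1, y1, x2, y2)))
    return plan


# toy boxes, not taken from the real boxes.pkl
example_plan = plan_crops(
    [((10.7, -3, 50, 40), "dog"), ((0, 0, 20, 20), "dog"), ((5, 5, 5, 9), "cat")],
    width=48,
    height=64,
)
print(example_plan)
```

Each planned entry could then be written with cv2.imwrite(os.path.join(image_path, filename), img[y1:y2, x1:x2]).
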

Update: now my questions are:

  1. Like @Mascheranovic, I don't know what value["sentic graph"] is.
  2. About BertTokenizer and spacy: for example, in the method generate_graph(line) of generate_cross_modal_graph.py you use two methods to calculate graph1, tokens, flag, but I think the spacy tokens may conflict with the BERT ones.

line = "we must rebuild our     military ! ! ! we need more battleships !  # gopdebate meteorologists"

from pytorch_pretrained import BertTokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.tokenize(line))
loading BertTokenizer.from_pretrained bert-base-uncased

"""
['we', 'must', 'rebuild', 'our', 'military', '!', '!', '!', 'we', 'need', 'more', 'battleships', '!', '#', 'go', '##pd', '##eb', '##ate', 'meteor', '##ologists']
"""

import spacy
nlp = spacy.load('en_core_web_sm')
line = line.lower().strip()
document = nlp(line)
spacy_token = [str(x) for x in document]
print(spacy_token)

"""
['we', 'must', 'rebuild', 'our', '    ', 'military', '!', '!', '!', 'we', 'need', 'more', 'battleships', '!', ' ', '#', 'gopdebate', 'meteorologists']
"""

`'    '` and `'meteorologists'` are not in `vocab.txt`, so when I run `train.py` it raises `KeyError: ' '` or `KeyError: 'meteorologists'`.
If the token is whitespace, I can use a regex to fix it as follows:

line = "we must rebuild our     military ! ! ! we need more battleships !  # gopdebate meteorologists"
import spacy
nlp = spacy.load('en_core_web_sm')
line = line.lower().strip()
line = keep_one_space(line)
document = nlp(line)
spacy_token = [str(x) for x in document]
print(spacy_token)

"""
['we', 'must', 'rebuild', 'our', 'military', '!', '!', '!', 'we', 'need', 'more', 'battleships', '!', '#', 'gopdebate', 'meteorologists']
"""

But how can I fix `KeyError: 'meteorologists'`?

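
One possible workaround, not confirmed by the authors, is sketched here. keep_one_space is not shown in the comment, so the version below is a guess at a regex helper with that name. lookup_ids is a hypothetical function that falls back to an unknown-token id instead of raising KeyError, and toy_vocab is a stand-in for vocab.txt (the "[UNK]" entry is an assumption about that file). Another option might be to pass each spacy token through tokenizer.tokenize and use its word pieces, but how that would interact with the graph construction is not clear from the thread.

```python
import re


def keep_one_space(text: str) -> str:
    """Collapse every run of whitespace into one space (guessed helper)."""
    return re.sub(r"\s+", " ", text).strip()


def lookup_ids(tokens, vocab, unk_token="[UNK]"):
    """Map tokens to ids, using the unk_token id for out-of-vocabulary tokens."""
    unk_id = vocab[unk_token]
    return [vocab.get(tok, unk_id) for tok in tokens]


# toy vocabulary standing in for vocab.txt (assumption)
toy_vocab = {"[UNK]": 0, "we": 1, "need": 2, "more": 3, "battleships": 4, "!": 5}

clean_line = keep_one_space("we need   more battleships !  meteorologists")
toy_tokens = clean_line.split(" ")
toy_ids = lookup_ids(toy_tokens, toy_vocab)
print(toy_tokens)
print(toy_ids)
```

With the fallback, 'meteorologists' maps to the id of "[UNK]" rather than stopping the run, at the cost of losing that word's identity.
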
isabelightL commented 1 year ago

@JunyaoHu Excuse me, have you successfully generated the intermediate preprocessed dataset files and reproduced the paper? During reproduction I found that the intermediate files are missing, and I am confused about their format. If you have successfully reproduced this model, could you share the intermediate files? Thank you!

isabelightL commented 1 year ago

During reproduction I found that the intermediate files are missing, and I am confused about their format. If you have successfully reproduced this model, could you share the intermediate files? Thank you!

JunyaoHu commented 1 year ago

@isabelightL Sorry, I gave up. The files are gone because I ran everything on an online platform and didn't save them to my local PC.

isabelightL commented 1 year ago

@JunyaoHu Thanks for your helpful reply; it saves me from wasting more time on this.

BinLiang-NLP commented 1 year ago

@Mascheranovic I'm very sorry for this error. We will add the corresponding files as soon as possible. Thanks.

BinLiang-NLP commented 1 year ago

@RUBBISHLIKE Hi, what do you mean by "preprocessed images and other data"?

BinLiang-NLP commented 1 year ago

@JunyaoHu I am very sorry that some files are missing in this version. I'll add them as soon as possible. Thanks!

wxy6 commented 1 year ago

Hi, I'm learning from your code. However, some files are missing, such as yml, prototxt and fast-rcnn files. Could you tell me where to get these files, or could you please ask the author to update them? Thanks!

EmilyCodeSailor commented 1 year ago

Regarding Equation 8 in the paper: the computation I found in the code is graph[i][cur] = wn.path_similarity(si,sj) + get_senticscore(si,sj), but the paper gives $\boldsymbol{\kappa}_{i,j}=Sim(w_i,o_j)\times\boldsymbol{\xi}_{i,j}+1$. Which formula should I follow?
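
To make the difference between the two forms concrete, here is a small sketch that evaluates both on made-up numbers. The function names and the input values are hypothetical; only the two expressions are taken from the comment above, and the sketch does not say which one the authors used for the reported results. Treating a missing similarity as 0.0 is also an assumption (as far as I know, NLTK's path_similarity can return None when no path exists).

```python
def weight_as_in_code(sim, sentic):
    """Additive form quoted from the code: path_similarity + senticscore."""
    sim = 0.0 if sim is None else sim  # assumption: missing similarity counts as 0
    return sim + sentic


def weight_as_in_paper(sim, sentic):
    """Form quoted from the paper's Equation 8: Sim * xi + 1."""
    sim = 0.0 if sim is None else sim  # same assumption as above
    return sim * sentic + 1.0


# made-up (similarity, sentic score) pairs, chosen to be exact in binary floats
pairs = [(0.5, 0.25), (0.0, 0.75), (None, 0.5)]
for sim, sentic in pairs:
    print(sim, sentic, weight_as_in_code(sim, sentic), weight_as_in_paper(sim, sentic))
```

The first two pairs receive the same weight under the additive form but different weights under the multiplicative one, so the two are not interchangeable.
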

LUMTICS-JJ commented 11 months ago

@EmilyCodeSailor Hello, have you completed Step 5 of this project? I ran into some problems while reproducing it, and it seems that some files needed for Step 5 are not provided. Would you mind sharing your code for reference?

EmilyCodeSailor commented 11 months ago

@LUMTICS-JJ I tried to reproduce it, but some files are not provided in the project and I did not manage to generate them, so I gave up. I am only studying parts of the code.

Czd6 commented 11 months ago

graph = value['graph'] + value["sentic_graph"]: what is this sentic_graph? Is there any code for building it?