ZUCC-AI / UMIE

Code and model for AAAI 2024: UMIE: Unified Multimodal Information Extraction with Instruction Tuning

Data pre-processing for Multimedia Event Extraction dataset M2E2 #3

Closed MartinYuanNJU closed 8 months ago

MartinYuanNJU commented 9 months ago

Thanks for your great work! Could you please provide more detail on how you process the Multimedia Event Extraction dataset M2E2, as downloaded from the official M2E2 website, into the exact JSON file you use for testing the MEE task in your work? The raw M2E2 dataset contains crossmedia_coref.txt, image_multimedia_event.json, text_multimedia_event.json, etc. I wonder how you pre-process these files and combine them into the final JSON file, m2e2_test.json. A detailed step-by-step guide on converting the raw M2E2 dataset into m2e2_test.json would be immensely helpful. Thank you in advance for your response!

qingyuannk commented 9 months ago

The conversion takes five steps:

1. Read this paper to understand the purpose of these files: https://blender.cs.illinois.edu/paper/multimediaspace2020.pdf
2. Run the data processing code in this GitHub repository: https://github.com/limanling/m2e2/tree/master/src/dataflow/numpy
3. Transform the data into the format that OneIE processes: https://github.com/jeremytanjianle/event-extraction-oneie
4. Run the script for text data processing: https://github.com/ZUCC-AI/UMIE/blob/main/text_processing/run_data_generation.bash
5. Use image recognition tools to extract objects from the images: https://github.com/airsplay/lxmert
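Before running any of these steps, it may help to look at what the raw annotation files actually contain. The following is only a minimal inspection sketch, not part of the UMIE or M2E2 code: the helper names (`describe_json`, `head_lines`, `inspect_raw_dir`) are hypothetical, and it assumes nothing about the internal schema of the files beyond the file names mentioned in the question. It just reports the top-level structure of each JSON file and the first few lines of crossmedia_coref.txt.

```python
import json
from pathlib import Path


def describe_json(path):
    """Return a short summary of the top-level structure of a JSON file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, list):
        # For a list of records, report the keys of the first record.
        first = data[0] if data else None
        keys = sorted(first) if isinstance(first, dict) else []
        return {"type": "list", "size": len(data), "keys": keys}
    if isinstance(data, dict):
        # For a mapping, report up to ten of its top-level keys.
        return {"type": "dict", "size": len(data), "keys": sorted(data)[:10]}
    return {"type": type(data).__name__, "size": 1, "keys": []}


def head_lines(path, n=3):
    """Return the first n non-empty lines of a text file."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [ln for ln in lines if ln.strip()][:n]


def inspect_raw_dir(raw_dir):
    """Summarise whichever raw M2E2 annotation files exist in raw_dir."""
    raw_dir = Path(raw_dir)
    report = {}
    # File names are taken from the question; missing files are skipped.
    for name in ("text_multimedia_event.json", "image_multimedia_event.json"):
        if (raw_dir / name).exists():
            report[name] = describe_json(raw_dir / name)
    coref = raw_dir / "crossmedia_coref.txt"
    if coref.exists():
        report[coref.name] = head_lines(coref)
    return report
```

Calling `inspect_raw_dir` on the directory where the raw M2E2 annotations were unpacked (the location is up to you) returns a small dictionary that can be printed and compared against the descriptions in the paper before moving on to the processing scripts.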