Vision-CAIR / VisualGPT

VisualGPT, CVPR 2022 Proceeding, GPT as a decoder for vision-language models
MIT License
316 stars, 49 forks

Trying to run code on IU X-ray database #4

Closed: PurpleDish closed this issue 3 years ago

PurpleDish commented 3 years ago

Hi, I've been interested in image captioning, specifically automatic medical report generation, and I came across VisualGPT, which seems to take a promising approach. I've been trying to get it to work with other datasets, specifically IU X-ray as mentioned in your paper.

I can't figure out how you set up the COCO dataset, or how I should structure IU X-ray to fit your code. Is it still supposed to use coco_detections.hdf5, or am I supposed to create an HDF5 file for IU?

junchen14 commented 3 years ago

Hi, thanks for your interest in our code. For the IU X-ray setup, we follow an experimental setting similar to the paper "Generating Radiology Reports via Memory-driven Transformer" (https://www.aclweb.org/anthology/2020.emnlp-main.112.pdf): we use a ResNet-101 pretrained on ImageNet to extract patch features, and feed the sequence of patch features into the visual encoder.

PurpleDish commented 3 years ago

Sorry, I'm not sure I understand. So you use ResNet to extract patch features, and that's the coco_detections.hdf5 file? Also, if you don't mind, how do you create the annotations folder and files?

junchen14 commented 3 years ago

coco_detections.hdf5 contains the precomputed bounding-box features of the COCO dataset, extracted with a pretrained Faster R-CNN. The IU X-ray features are different: they are the ResNet-101 patch features described above.

You can refer to the paper "Generating Radiology Reports via Memory-driven Transformer" for more details on the IU X-ray feature preparation.
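The thread does not say whether the IU X-ray features should be stored in an HDF5 file at all. If one did choose to cache precomputed features that way, a minimal sketch with h5py might look like the following. The `<image_id>_features` key naming, the helper names, and the example ID are hypothetical and would need to match whatever the data loader in the code actually reads.

```python
import h5py
import numpy as np


def write_features(path, features_by_id):
    """Store one float32 dataset per image under a hypothetical '<id>_features' key."""
    with h5py.File(path, "w") as f:
        for image_id, feats in features_by_id.items():
            f.create_dataset(f"{image_id}_features",
                             data=np.asarray(feats, dtype=np.float32))


def read_features(path, image_id):
    """Load the feature array for one image back into memory."""
    with h5py.File(path, "r") as f:
        return f[f"{image_id}_features"][()]
```

For example, `write_features("iu_xray_features.hdf5", {"study_0001": feats})` would store a `(49, 2048)` array for a made-up image ID, and `read_features("iu_xray_features.hdf5", "study_0001")` would return it as a NumPy array.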

PurpleDish commented 3 years ago

Thanks for the help, I'll have a look and see if I can figure it out!