bnavard closed this issue 2 months ago
Hello, we apologize for the inconvenience. When generating the images, we placed them all in one folder with different filenames. You can resolve this by putting all the images in a single folder with distinct filenames, or by modifying the code so that it saves the output to different paths.
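A minimal sketch of the second option (saving output to different paths), assuming each image's path relative to the image root is unique. `mask_json_path` is a hypothetical helper, not a function that exists in `amg.py`, and the `images` and `masks` roots are only illustrative:

```python
from pathlib import Path

def mask_json_path(image_path, image_root, mask_root):
    """Hypothetical helper: mirror the image's relative path under mask_root.

    Two images with the same basename but different parent folders
    then map to two different JSON files.
    """
    rel = Path(image_path).relative_to(image_root)
    return Path(mask_root) / rel.with_suffix(".json")

a = mask_json_path("images/scienceqa/images/train/1/image.png", "images", "masks")
b = mask_json_path("images/scienceqa/images/train/2/image.png", "images", "masks")
print(a)
print(b)
```

The caller would still need to create the parent directory (for example with `a.parent.mkdir(parents=True, exist_ok=True)`) before writing the JSON file.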
Hello,
I realized that `Monkey/data_generation/amg.py` uses the basename of each image as the name of the SegmentAnythingModel output JSON file. For example, the JSON file generated for `monkey/data_generation/images/scienceqa/images/train/1/image.png` is named `image.json` and stored in the `masks` folder. The source of the error is line 228 in `data_generation/amg.py`.
However, there are multiple `image.png` files under `images/scienceqa/images/train/`, so the `amg.py` script keeps overwriting `image.json` in the `masks` folder. As a consequence, every image that shares a basename with another image is processed incorrectly. In other words, any script built on the output of `amg.py` is incorrect, e.g. `sam_blip.py`.

I counted the duplicate basenames in the image folder. Out of 617052 images there are only 587077 unique basenames. Nearly 30k basenames (29975) are duplicates, so their SAM JSON files overwrite one another. That is nearly 5 percent of your data for which a wrong ChatGPT long description is generated at the end of your pipeline, purely because of this logic bug.
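A count like the one above could be reproduced with a short sketch such as the following. It assumes the JSON name is the image's filename without its extension (so `image.png` becomes `image.json`); `count_basename_collisions` is a hypothetical name, and the paths are made up for illustration:

```python
from collections import Counter
from pathlib import Path

def count_basename_collisions(image_paths):
    """Return (total images, unique basenames, files that get overwritten).

    Assumption: the output name is the filename without its extension.
    """
    counts = Counter(Path(p).stem for p in image_paths)
    total = sum(counts.values())
    unique = len(counts)
    return total, unique, total - unique

paths = [
    "images/scienceqa/images/train/1/image.png",
    "images/scienceqa/images/train/2/image.png",
    "images/scienceqa/images/train/3/image.png",
    "images/other/figure.png",
]
total, unique, overwritten = count_basename_collisions(paths)
print(total, unique, overwritten)
```

For a real folder, the input could be something like `Path("images").rglob("*.png")`, adjusted to whatever extensions the dataset actually contains.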