Gary-code closed this issue 2 months ago
We follow the JSON format in LLaVA when conducting our experiments. The format is as follows:
{"image": ["path1", "path2", "path3"],
"conversations": [{"from": "human", "value": "<image><image><image>\n text"},
{"from": "gpt", "value": "response"}]},
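As a rough illustration of the format above, here is a minimal Python sketch that builds one such record and serializes it. The helper names `make_record` and `placeholders_match` are hypothetical and not part of LLaVA or this repository. The sketch assumes one `<image>` placeholder per image path, mirroring the three-path example; inputs with mixed resolutions may need a different placeholder count.

```python
import json

IMAGE_TOKEN = "<image>"  # placeholder string used in the example above


def make_record(image_paths, question, answer):
    """Build one multi-image record (hypothetical helper).

    Assumes one placeholder per image, as in the three-image example.
    """
    prompt = IMAGE_TOKEN * len(image_paths) + "\n" + question
    return {
        "image": list(image_paths),
        "conversations": [
            {"from": "human", "value": prompt},
            {"from": "gpt", "value": answer},
        ],
    }


def placeholders_match(record):
    """Check that human turns contain as many placeholders as there are images."""
    count = sum(
        turn["value"].count(IMAGE_TOKEN)
        for turn in record["conversations"]
        if turn["from"] == "human"
    )
    return count == len(record["image"])


# Paths and text are the placeholders from the example, not real files.
record = make_record(["path1", "path2", "path3"], "text", "response")
dataset_json = json.dumps([record], indent=2)
print(dataset_json)
```

The dataset is written as a JSON list of records, which is why the example entry above ends with a trailing comma.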
If you use multiple images of different resolutions for input, then the number of the image placeholder (
Note that we pack the images into patches for our experiments, which is more efficient than loading image files one by one. We will release our packing method and data JSON in a few days.
Could you provide an example of a JSON template for fine-tuning with multiple images?