Open errai34 opened 1 year ago
We created two scripts, get_descriptions.py and get_conversations.py which use two models (text-davinci-003) and (gpt-3.5-turbo) to a) answer the question "what can you see in this image?" and b) generate 2 Qs and A from the Galaxy Zoo conversations.
1) We also need to "upgrade" the prompts to the right format.
2) Merge these two files into a master file that takes as input the different prompts and the model name.
Goal is to create_dataset.py, where we take the dataframe and the images from scrap_notes.py and create a Dataset consisting of:
[ { "id": "000000033471", "image": "000000033471.jpg", "conversations": [ { "from": "human", "value": "What are the colors of the bus in the image?\n"
},
{
"from": "gpt",
"value": "The bus in the image is white and red."
},
{
"from": "human",
"value": "What feature can be seen on the back of the bus?"
},
{
"from": "gpt",
"value": "The back of the bus features an advertisement."
},
{
"from": "human",
"value": "Is the bus driving down the street or pulled off to the side?"
},
{
"from": "gpt",
"value": "The bus is driving down the street, which is crowded with people and other vehicles."
}
]
}
]