Closed acul3 closed 1 month ago
The JSON file is a list of dictionaries, and each element of the list should look like this:
{
  'id': '0010278167',  # doesn't matter
  'image': '0010278167.jpg',  # path to the image
  'conversations': [
    {'from': 'human', 'value': '<image>\n'},
    {'from': 'gpt', 'value': 'Piece of dark jeans fabric Royalty Free Stock Photography'}  # the text
  ]
}
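For anyone converting their own caption data (for example a multilingual set), here is a minimal sketch of building that list of dictionaries from (image file name, caption) pairs. The helper names `to_pretrain_records` and `write_pretrain_json` are hypothetical and not part of the repository; downloading the images and filtering the captions are assumed to happen elsewhere.

```python
import json
from pathlib import Path

IMAGE_TOKEN = "<image>"  # placeholder token used in the human turn


def to_pretrain_records(pairs):
    """Turn (image_filename, caption) pairs into the list-of-dicts layout.

    Hypothetical helper for illustration; not taken from the repository.
    """
    records = []
    for image_name, caption in pairs:
        caption = " ".join(caption.split())  # collapse newlines and repeated spaces
        if not caption:
            continue  # skip entries with an empty caption
        records.append({
            "id": Path(image_name).stem,  # any identifier works here
            "image": image_name,          # path to the image
            "conversations": [
                {"from": "human", "value": f"{IMAGE_TOKEN}\n"},
                {"from": "gpt", "value": caption},  # the text
            ],
        })
    return records


def write_pretrain_json(pairs, out_path):
    """Write the records as one JSON list; keep non-ASCII captions readable."""
    records = to_pretrain_records(pairs)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
    return records
```

Passing `ensure_ascii=False` keeps captions in other languages as plain UTF-8 text instead of `\uXXXX` escapes, which makes the file easier to inspect by hand.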
Thanks for the reply.
I see some samples in the pretraining data: https://huggingface.co/datasets/BoyaWu10/Bunny-v1_0-data/blob/main/pretrain/bunny_pretrain_laion_2m.json
For example:
{
"id":"0005128268",
"image":"0005128268.jpg",
"conversations":[
{
"from":"human",
"value":"<image>\nGive a short and clear explanation of the subsequent image."
},
{
"from":"gpt",
"value":"First Impressions And Suggested Talent Build Of Reworked Xul In Patch 26 3 Articles Tempo Storm From the recesses of the eastern jungles comes a man cloaked in mystery. tempo storm"
}
]
}
The "human" value contains a question such as "Give a short and clear explanation of the subsequent image."
I believe the LAION-2B subset doesn't provide this (CMIIW).
Can you tell me how to get it?
The question is deleted during training (see the code).
So, just ignore it.
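To illustrate what "deleted when training" could look like, here is a rough sketch that reduces the human turn to the bare image token before the caption is used. This is an assumption about the general approach, not a copy of the repository's code; `strip_pretrain_prompt` is a hypothetical name.

```python
import copy

IMAGE_TOKEN = "<image>"  # placeholder token marking where the image goes


def strip_pretrain_prompt(conversations):
    """Return a copy in which each human turn keeps only the image token.

    Hypothetical helper: it sketches the idea that the question text is
    dropped for pretraining, so only the image and the caption remain.
    """
    cleaned = copy.deepcopy(conversations)  # leave the caller's data untouched
    for turn in cleaned:
        if turn["from"] == "human" and IMAGE_TOKEN in turn["value"]:
            turn["value"] = IMAGE_TOKEN  # discard the question text
    return cleaned
```

Under that assumption, a record whose human value is just the image token and one that carries a full question end up identical after this step, which is why the question does not need to be generated for new data.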
Sure, thank you for the confirmation.
Hello, first of all, thanks for the code!
I am planning to train using a dataset in a different language.
you said in the readme
Can you share the steps to convert them to the training format?
I am planning to convert laion-2b-multi to the training format too.