SamuelCahyawijaya closed this issue 3 months ago.
Hi, the dataset is organized as follows:
dataset str: dataset name
version str: dataset version
split str: language ID
annotations List of image-question-answer triplets, each of which contains:
-- image_id str: image ID
-- image_url str: image URL
-- qa_pairs List of question-answer pairs, each of which contains:
---- question_id str: question ID
---- question str: raw question
---- answers List of str: ground-truth answers
---- processed_answers List of str: processed ground-truth answers. 16 tokenized answers.
---- is_collection bool: "true" if the question is of the "Collection" type; "false" otherwise.
In the question answering schema, the features are:
id (str)
question_id (str)
document_id (str)
question (str)
type (str)
choices (list[str])
context (str)
answer (list[str])
meta (dict[Any])
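Checking a record against this feature list shows why a source-specific field such as `is_collection` needs a home: it has no top-level slot of its own. Below is a minimal sketch, assuming the schema is represented as a plain dict of feature names to Python types; `QA_FEATURES`, `check_qa_example`, and the sample values are hypothetical and are not the library's own schema definition.

```python
from typing import Any, Dict

# Feature names and Python types taken from the QA schema list above
# (hypothetical representation; meta is assumed to be a plain dict).
QA_FEATURES: Dict[str, type] = {
    "id": str,
    "question_id": str,
    "document_id": str,
    "question": str,
    "type": str,
    "choices": list,
    "context": str,
    "answer": list,
    "meta": dict,
}


def check_qa_example(example: Dict[str, Any]) -> None:
    """Raise ValueError if `example` does not match QA_FEATURES exactly."""
    missing = QA_FEATURES.keys() - example.keys()
    extra = example.keys() - QA_FEATURES.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)}, extra={sorted(extra)}")
    for name, expected in QA_FEATURES.items():
        if not isinstance(example[name], expected):
            raise ValueError(f"{name!r} should be {expected.__name__}")


# Placeholder example (not real MaXM data).
base = {
    "id": "0",
    "question_id": "q-0",
    "document_id": "",
    "question": "What color is the car?",
    "type": "open-ended",
    "choices": [],
    "context": "",
    "answer": ["red"],
    "meta": {},
}
check_qa_example(base)  # passes silently

try:
    check_qa_example({**base, "is_collection": False})
except ValueError as err:
    print(err)  # missing=[], extra=['is_collection']
```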
- Should I assign is_collection to type, context, or inside meta?
- Also, should I put image_id or image_url for the document_id?
Hi @, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.
Hmm, I think I need to mention you for a faster response: @sabilmakbar @holylovenia
I didn't realize I missed so many mentions from you. Sorry!!
Could you please use Tasks.VISUAL_QUESTION_ANSWERING? It employs the imqa schema.
- Should I assign is_collection to type, context, or inside meta?
Inside meta would be perfect. The type field is typically open-ended, multiple-choice, extractive, abstractive, etc.
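Following that answer, here is a minimal sketch of mapping one question-answer pair into a QA-style example with `is_collection` stored under `meta`. It uses the QA feature names listed earlier in the thread and does not cover any image fields of the imqa schema. The helper name `to_qa_example` is hypothetical, and so are these choices: `type` set to `"open-ended"`, empty `choices`, and empty `document_id`/`context` on the premise that no text context is given.

```python
from typing import Any, Dict


def to_qa_example(idx: int, qa: Dict[str, Any]) -> Dict[str, Any]:
    """Map one MaXM-style qa_pair to a QA-schema-shaped dict (hypothetical helper)."""
    raw = qa["is_collection"]
    # is_collection may be the string "true"/"false" or an actual bool.
    is_collection = raw.strip().lower() == "true" if isinstance(raw, str) else bool(raw)
    return {
        "id": str(idx),
        "question_id": qa["question_id"],
        "document_id": "",  # assumption: no text context, so nothing to reference
        "question": qa["question"],
        "type": "open-ended",  # assumption: questions are free-form
        "choices": [],  # assumption: no answer options are provided
        "context": "",
        "answer": list(qa["answers"]),
        "meta": {"is_collection": is_collection},  # stored in meta, as suggested
    }


example = to_qa_example(
    0,
    {
        "question_id": "q-0",  # placeholder values, not real MaXM data
        "question": "What color is the car?",
        "answers": ["red"],
        "is_collection": "false",
    },
)
print(example["meta"])  # {'is_collection': False}
```

Keeping `is_collection` in `meta` leaves `type` free for the task-level label (open-ended, multiple-choice, and so on) and `context` free for actual text.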
- Also, should I put image_id or image_url for the document_id?
document_id is related to the context (if there is one).
Dataloader name: maxm/maxm.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?maxm