SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Change the task composition of TydiQA #465

Closed holylovenia closed 6 months ago

holylovenia commented 7 months ago

The current TydiQA implementation adopts the primary and secondary subset composition:

_PRIMARY_DESP = """Passage selection task (SelectP): Given a list of the passages in the article, return either (a) the index of
              the passage that answers the question or (b) NULL if no such passage exists.
              Minimal answer span task (MinSpan): Given the full text of an article, return one of (a) the start and end
              byte indices of the minimal span that completely answers the question; (b) YES or NO if the question requires
              a yes/no answer and we can draw a conclusion from the passage; (c) NULL if it is not possible to produce a
              minimal answer for this question."""

_SECONDARY_DESP = """Gold passage task (GoldP): Given a passage that is guaranteed to contain the
          answer, predict the single contiguous span of characters that answers the question. This is more similar to
          existing reading comprehension datasets (as opposed to the information-seeking task outlined above).
          This task is constructed with two goals in mind: (1) more directly comparing with prior work and (2) providing
          a simplified way for researchers to use TyDi QA by providing compatibility with existing code for SQuAD 1.1,
          XQuAD, and MLQA. Toward these goals, the gold passage task differs from the primary task in several ways:
          only the gold answer passage is provided rather than the entire Wikipedia article;
          unanswerable questions have been discarded, similar to MLQA and XQuAD;
          we evaluate with the SQuAD 1.1 metrics like XQuAD; and
          Thai and Japanese are removed since the lack of whitespace breaks some tools.
          """

However, based on the discussion with @jen-santoso, it'd be better to split it into a 3-task subset composition, with one subset per task: SelectP, MinSpan, and GoldP.
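For illustration only, here is a minimal sketch of what a one-subset-per-task layout could look like. Every name in it (`TaskSubset`, `OLD_TO_NEW`, `config_names`, the subset keys, and the `source` / `seacrowd_qa` schema suffixes) is an assumption made for this example, not the dataloader's actual identifiers, and the descriptions are shortened from the docstrings above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TaskSubset:
    """One subset per TyDi QA task (hypothetical helper, not a SEACrowd class)."""
    key: str
    description: str


# How the current two-subset layout would map onto the proposed three subsets.
OLD_TO_NEW: Dict[str, List[str]] = {
    "primary": ["selectp", "minspan"],
    "secondary": ["goldp"],
}

# Descriptions are abbreviated from _PRIMARY_DESP and _SECONDARY_DESP.
TASK_SUBSETS: List[TaskSubset] = [
    TaskSubset(
        "selectp",
        "Passage selection task (SelectP): return the index of the passage "
        "that answers the question, or NULL if no such passage exists.",
    ),
    TaskSubset(
        "minspan",
        "Minimal answer span task (MinSpan): return the start and end byte "
        "indices of the minimal answer span, YES or NO, or NULL.",
    ),
    TaskSubset(
        "goldp",
        "Gold passage task (GoldP): given a passage guaranteed to contain "
        "the answer, predict the single contiguous answer span.",
    ),
]


def config_names(
    dataset: str = "tydiqa",
    schemas: Tuple[str, ...] = ("source", "seacrowd_qa"),  # assumed schema suffixes
) -> List[str]:
    """Build one config name per (task subset, schema) pair."""
    return [f"{dataset}_{subset.key}_{schema}" for subset in TASK_SUBSETS for schema in schemas]


if __name__ == "__main__":
    for name in config_names():
        print(name)
```

Keeping the old-to-new mapping explicit makes it easy to check that nothing from the primary or secondary subset is dropped in the split.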

This issue was opened because I incorrectly merged the PR after this discussion. 😭 Would you be able to modify the dataloader to accommodate the new subset composition, @Gyyz?

Sorry @Gyyz and @jen-santoso, my bad. 🙏

Gyyz commented 7 months ago

Don't worry. I will try to modify and submit a PR later.

holylovenia commented 7 months ago

Thank you so much, @Gyyz! You're the best. 👍