Closed SamuelCahyawijaya closed 6 months ago
Hi @holylovenia @SamuelCahyawijaya. This dataset contains only entity pairs and their relations, without any passage information (hence it can't be put under the RELATION_EXTRACTION task). Do you think we should omit the SEACrowd schema implementation, or create another task for this (likely using SEACrowd KB)?
Will a task using the pairs schema be suitable, @zwenyu @sabilmakbar?
@holylovenia A typical KB comes as triplets of (Subject/Entity 1, Predicate/Relation, Object/Entity 2). The difficulty with the pairs schema is that the set of Predicate values is either unknown (there could be infinitely many predicates) or simply has too many classes to define as ClassLabel names.
FYI, this dataset has 935 possible relation values, which would be tedious, and seemingly impossible, to write down without iterating over all values in the dataset.
If we really want to make a SEACrowd schema out of it (which I'd prefer not to, because the task itself is not particularly useful in the NLP world), I suggest creating a triplet-based schema similar to the pairs schema, but with all columns as strings (and possibly storing both the values and their IDs, if the source dataset has them).
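As a rough illustration of the triplet-based idea above (not the schema that was later added in the PR), here is a minimal sketch in plain Python. The field names, the `make_triplet` and `relation_vocabulary` helpers, and the toy rows are all hypothetical and are not taken from the IndoWiki source files or the SEACrowd codebase. It shows every column stored as a string, optional ID columns, and the relation set derived by iterating over the data rather than being written out by hand as class labels.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical column names for a triplet-style record; all values are strings.
TRIPLET_FIELDS = (
    "subject", "predicate", "object",
    "subject_id", "predicate_id", "object_id",
)


def make_triplet(
    subject: str,
    predicate: str,
    obj: str,
    subject_id: Optional[str] = None,
    predicate_id: Optional[str] = None,
    object_id: Optional[str] = None,
) -> Dict[str, Optional[str]]:
    """Build one record; IDs stay None when the source does not provide them."""
    values = (subject, predicate, obj, subject_id, predicate_id, object_id)
    return {
        name: None if value is None else str(value)
        for name, value in zip(TRIPLET_FIELDS, values)
    }


def relation_vocabulary(triplets: Iterable[Dict[str, Optional[str]]]) -> List[str]:
    """Collect the distinct predicates by iterating over the data."""
    return sorted({t["predicate"] for t in triplets})


# Toy rows for illustration only (not real dataset content).
examples = [
    make_triplet("Jakarta", "capital of", "Indonesia"),
    make_triplet("Bandung", "located in", "West Java",
                 subject_id="e1", predicate_id="r2", object_id="e3"),
    make_triplet("Surabaya", "located in", "East Java"),
]

vocab = relation_vocabulary(examples)
print(len(vocab), vocab)
```

Keeping the predicate as a free string avoids having to enumerate all relation values up front; the vocabulary can still be recovered from the data when needed.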
@holylovenia @sabilmakbar I've added a triplets schema and updated the PR. Can you check if it's ok?
I've checked it, and it looks okay (except the config PR should be separated from the dataloader one).
But one thing that worries me is whether we really need this SEACrowd triplet schema for a KB dataset that is less used in LLM/LMM development and evaluation.
Hello @sabilmakbar @zwenyu, apparently this dataloader utilizes a niche schema, so I think implementing only the source schema is enough. No need for the seacrowd schema.
@holylovenia @sabilmakbar Noted. I've reverted the changes and removed the seacrowd schema.
Dataloader name: indowiki/indowiki.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?indowiki