ArneBinder / pie-datasets

Building scripts for Pytorch-IE datasets.
MIT License

`argmicro` converter for `TextDocumentWithLabeledSpansAndBinaryRelations` should follow literature #99

Open ArneBinder opened 8 months ago

ArneBinder commented 8 months ago

The conversion procedure, i.e. this code, should follow the description in the section "3. Dataset: Transformation" of the following paper: Andreas Peldszus and Manfred Stede. 2015. Joint prediction in MST-style discourse parsing for argumentation mining. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 938–948, Lisbon, Portugal. Association for Computational Linguistics.

In addition: the authors of "End-to-end Argument Mining with Cross-corpora Multi-task Learning" (Morio et al., TACL 2022) mention that they follow "An Empirical Study of Span Representations in Argumentation Structure Parsing" (Kuribayashi et al., ACL 2019) in how they handle the `add` relations. It looks like this happens within this code. We should understand what they do in this regard and follow that.

The final result should be described in the Document Converters section of the PIE dataset card.

ArneBinder commented 8 months ago

Note: I found errors in https://huggingface.co/datasets/pie/argmicro#data-schema. The entry for MultiRelation has wrong argument names and wrong argument types; compare with its definition.

ArneBinder commented 8 months ago

Analysis by @idalr: argmicro_0_relations 1.pdf

From what I see now, the relations in `TextDocumentWithLabeledSpansAndBinaryRelations` are much more complex than in the other versions, so some simplifications could possibly improve model performance. However, I don't agree with the modification made in Kuribayashi et al. and Morio et al., because linked arguments and convergent arguments are then no longer distinguishable. If we don't want to model that distinction with `TextDocumentWithLabeledSpansAndBinaryRelations`, though, the task would be easier. I also wonder whether we need the 'joint' relation in both directions: we could do something like what we did with 'connect_first' in AAE2, i.e. simply translate any bidirectional relation into a single direction (from the latter argument to the former one).
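
To make the last idea concrete, here is a minimal sketch of collapsing bidirectional relations into a single direction (from the latter span to the former). It is not the existing converter code: `Span`, `Relation`, and `keep_latter_to_former` are hypothetical stand-ins that do not depend on pytorch-ie, and the sketch assumes that "latter" and "former" refer to the order of the spans in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a labeled span (character offsets)."""
    start: int
    end: int
    label: str


@dataclass(frozen=True)
class Relation:
    """Hypothetical stand-in for a binary relation between two spans."""
    head: Span
    tail: Span
    label: str


def keep_latter_to_former(relations, symmetric_labels=("joint",)):
    """Collapse symmetric relations to one direction: latter span -> former span.

    Relations whose label is not in `symmetric_labels` are kept unchanged.
    Assumption: a symmetric relation given in only one direction is also
    re-oriented, so the output is always latter -> former.
    """
    result = []
    seen = set()
    for rel in relations:
        if rel.label not in symmetric_labels:
            result.append(rel)
            continue
        # Order the two arguments by their position in the text.
        former, latter = sorted((rel.head, rel.tail), key=lambda s: (s.start, s.end))
        key = (former, latter, rel.label)
        if key in seen:
            # The reverse direction of this pair was already emitted.
            continue
        seen.add(key)
        result.append(Relation(head=latter, tail=former, label=rel.label))
    return result


# Toy example: two premises linked by 'joint' in both directions.
a = Span(0, 10, "pro")
b = Span(11, 20, "pro")
c = Span(21, 30, "opp")
relations = [
    Relation(a, b, "joint"),
    Relation(b, a, "joint"),
    Relation(c, a, "support"),
]
simplified = keep_latter_to_former(relations)
```

In this toy example, the two 'joint' relations between `a` and `b` are reduced to a single relation with head `b` and tail `a`, while the 'support' relation is left as it is. Whether the same treatment fits the argmicro `add` edges still depends on the outcome of the analysis above.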