JoelNiklaus / LawInstruct

This repository is a collection of legal instruction datasets.

Dataset to be considered: TMID #19

Closed by JulienGaumez 1 week ago

JulienGaumez commented 1 week ago

TMID is a novel dataset to detect trademark infringement in merchant registrations. This is a real-world dataset sourced directly from Alipay, one of the world's largest e-commerce and digital payment platforms. As infringement detection is a legal reasoning task requiring an understanding of the contexts and legal rules, we offer a thorough collection of legal rules and merchant and trademark-related contextual information with annotations from legal experts. We ensure the data quality by performing an extensive statistical analysis.

Dataset: https://github.com/emnlpTMID/emnlpTMID.github.io
Paper: https://arxiv.org/abs/2312.05103

JoelNiklaus commented 1 week ago

This dataset is in Chinese, which is linguistically quite far from our target languages (EU languages). We will disregard it for now.