Dataset to be considered: Open Australian Legal Corpus

JoelNiklaus / LawInstruct

This repository is a collection of legal instruction datasets


Dataset to be considered: Open Australian Legal Corpus #12

Closed JulienGaumez closed 1 week ago

JulienGaumez commented 1 week ago

Comprised of 227,488 texts totalling over 70 million lines and 1.4 billion tokens, the Corpus includes every in force statute and regulation in the Commonwealth, New South Wales, Queensland, Western Australia, South Australia, Tasmania and Norfolk Island, in addition to thousands of bills and hundreds of thousands of court and tribunal decisions.

Dataset: https://huggingface.co/datasets/umarbutler/open-australian-legal-corpus

JoelNiklaus commented 1 week ago

This dataset does not have direct labels. We could construct some tasks from its metadata, for example by predicting a document's type or title from its text, but in general this corpus is more valuable for pretraining.
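
As a rough illustration of how a task might be derived from metadata, here is a minimal sketch that turns records into instruction-style classification examples. It is not code from LawInstruct: the function name `build_type_classification_examples`, the instruction wording, and the assumption that each record is a dict with `type` and `text` keys are all hypothetical and should be checked against the dataset card.

```python
from typing import Dict, Iterable, Iterator


def build_type_classification_examples(
    records: Iterable[Dict[str, str]],
    text_field: str = "text",    # assumed field name, verify on the dataset card
    label_field: str = "type",   # assumed field name, verify on the dataset card
    max_chars: int = 2000,
) -> Iterator[Dict[str, str]]:
    """Yield instruction examples that ask for a document's type given its text."""
    for record in records:
        text = (record.get(text_field) or "").strip()
        label = (record.get(label_field) or "").strip()
        # Skip records that cannot form a complete (input, output) pair.
        if not text or not label:
            continue
        yield {
            "instruction": "Classify the type of the following Australian legal document.",
            "input": text[:max_chars],  # truncate long documents to keep prompts short
            "output": label,
        }


# Tiny in-memory stand-in for corpus records (made-up content, not real data).
sample_records = [
    {"type": "decision", "text": "The appeal is dismissed with costs."},
    {"type": "primary_legislation", "text": "An Act to provide for an example."},
    {"type": "bill", "text": ""},  # skipped: empty text
]

examples = list(build_type_classification_examples(sample_records))
for example in examples:
    print(example["output"], "<-", example["input"])
```

For the real corpus, the records could be streamed with `datasets.load_dataset` using the Hugging Face identifier from the link above and passed to the same function. A title-prediction task would follow the same pattern with a different label field, assuming such a field exists.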