Composed of 227,488 texts totalling over 70 million lines and 1.4 billion tokens, the Open Australian Legal Corpus includes every in-force statute and regulation in the Commonwealth, New South Wales, Queensland, Western Australia, South Australia, Tasmania and Norfolk Island, as well as thousands of bills and hundreds of thousands of court and tribunal decisions.
This dataset has no direct labels. Some supervised tasks could be constructed from metadata such as the document type or title, but in general the corpus is more valuable for pretraining.
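As an illustration of deriving a task from metadata, the sketch below turns corpus records into (text, label) pairs for document-type classification. It is a minimal sketch under stated assumptions: each record is assumed to be a dict with "text" and "type" keys, and the label values in LABELS and the helper build_type_examples are hypothetical, so check both against the dataset card before use. In practice the records could be streamed with the Hugging Face `datasets` library (`load_dataset(..., streaming=True)`) rather than downloaded in full. Here a few in-memory sample records stand in for the corpus.

```python
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical label set: the corpus is ASSUMED to tag each text with a document
# type along these lines; verify the real field values on the dataset card.
LABELS = ["primary_legislation", "secondary_legislation", "bill", "decision"]
LABEL_TO_ID = {name: i for i, name in enumerate(LABELS)}


def build_type_examples(records: Iterable[dict], max_chars: int = 2000) -> Iterator[dict]:
    """Yield {"text", "label"} pairs for document-type classification.

    Assumes each record has "text" and "type" keys. Records whose type is not
    in LABELS, or whose text is empty, are skipped.
    """
    for record in records:
        label = LABEL_TO_ID.get(record.get("type"))
        text = (record.get("text") or "").strip()
        if label is None or not text:
            continue
        # Truncate long documents so every example fits a fixed input budget.
        yield {"text": text[:max_chars], "label": label}


# Usage with hypothetical sample records in place of the real corpus.
sample = [
    {"type": "decision", "text": "The appeal is dismissed with costs."},
    {"type": "bill", "text": "A Bill for an Act to amend an existing Act."},
    {"type": "unknown", "text": "This record is skipped."},
]
examples = list(build_type_examples(sample))
print(Counter(ex["label"] for ex in examples))
```

Because the labels come from metadata rather than human annotation, such a task mainly tests whether a model can recognise document genre; it does not change the point that the corpus is best suited to pretraining.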
Dataset: https://huggingface.co/datasets/umarbutler/open-australian-legal-corpus