JoelNiklaus / LawInstruct

This repository is a collection of legal instruction datasets

Dataset to be considered: CLERC #16

Status: Open. Opened by JulienGaumez 1 week ago.


CLERC: A Dataset for Legal Case Retrieval and Retrieval-Augmented Analysis Generation. We work with legal professionals to transform a large open-source legal corpus into a dataset supporting two important backbone tasks: information retrieval (IR) and retrieval-augmented generation (RAG). This dataset, CLERC (Case Law Evaluation Retrieval Corpus), is constructed for training and evaluating models on their ability to (1) find the corresponding citations for a given piece of legal analysis and (2) compile the text of these citations (as well as the preceding context) into a cogent analysis that supports a reasoning goal.
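To make the two tasks concrete, here is a minimal sketch in Python. Task (1) is framed as ranking a pool of candidate case texts against a passage of legal analysis, using Okapi BM25 as a simple lexical baseline chosen here for illustration (not necessarily the retriever the CLERC authors use). Task (2) is framed as assembling the retrieved case texts plus the preceding context into a generation prompt. The case IDs, texts, prompt wording, and the function names `bm25_scores`, `retrieve`, and `build_rag_prompt` are all hypothetical; nothing here reads the actual CLERC data or assumes its schema.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Crude tokenizer: lowercase, keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: dict[str, str],
                k1: float = 1.5, b: float = 0.75) -> dict[str, float]:
    """Score each candidate case text against a legal-analysis query with Okapi BM25."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: number of candidate cases containing each term.
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        # Only terms shared by the query and this case contribute to the score.
        for term in query_terms.intersection(tf):
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * freq * (k1 + 1) / (freq + norm)
        scores[doc_id] = score
    return scores


def retrieve(query: str, docs: dict[str, str], top_k: int = 2) -> list[str]:
    # Task (1), hypothetical framing: return the IDs of the top_k highest-scoring cases.
    scores = bm25_scores(query, docs)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


def build_rag_prompt(previous_context: str, cited_ids: list[str],
                     docs: dict[str, str]) -> str:
    # Task (2), hypothetical framing: retrieved case texts, then the preceding
    # analysis, then an instruction to continue the analysis.
    cited = "\n".join(f"[{cid}] {docs[cid]}" for cid in cited_ids)
    return (f"Cited cases:\n{cited}\n\n"
            f"Previous analysis:\n{previous_context}\n\n"
            "Continue the analysis, citing the cases above where they support it.")


# Toy, made-up candidate pool (not real cases, not CLERC data).
candidates = {
    "case_A": "A warrantless search of an automobile is permissible when officers have probable cause.",
    "case_B": "A contract requires offer, acceptance, and consideration to be enforceable.",
    "case_C": "Probable cause exists when facts would lead a reasonable officer to believe evidence of a crime is present.",
}
query = "The officers lacked probable cause, so the warrantless search of the vehicle was unlawful."

top = retrieve(query, candidates)
print(top)  # ['case_A', 'case_C']
print(build_rag_prompt("The defendant moved to suppress the evidence.", top, candidates))
```

Note that the crude tokenizer treats "officer" and "officers" as different terms, which is one reason purely lexical scoring is usually complemented by stemming or learned (dense) retrievers in practice.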

Dataset: https://huggingface.co/datasets/jhu-clsp/CLERC
Paper: https://arxiv.org/abs/2406.17186