Suggestion for clustering dataset (legislative texts)

Just stumbled upon this dataset: https://huggingface.co/datasets/dreamproit/bill_labels_us, which has lots of US Congress bills labeled by policy area. I probably won't have the time to add this, but thought it could be a suggestion if folks are looking for inspiration (feel free to close if not relevant).

Not a new language, but looking at the existing clustering datasets, it seems like this would be quite a new domain.

It could also be a classification task, but clustering seems more interesting (and there is no natural train/dev/test split).

embeddings-benchmark / mteb

Suggestion for clustering dataset (legislative texts) #744