allenai / ir_datasets

Provides a common interface to many IR ranking datasets.
https://ir-datasets.com/
Apache License 2.0

Switch from namedtuple to dataclasses #240

Closed bpiwowar closed 1 year ago

bpiwowar commented 1 year ago

Is your feature request related to a problem? Please describe. Libraries relying on ir_datasets might need to include extra information in the data objects (e.g. GenericDoc). NamedTuple makes it impossible to add fields through inheritance.

Describe the solution you'd like Switch to dataclasses

Describe alternatives you've considered No real alternative apart from trying to use ugly Python constructions or wrappers (but those are not really nice).
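To make the limitation concrete, here is a minimal sketch. The `GenericDoc` class below is a simplified stand-in with assumed fields (`doc_id`, `text`), not the actual ir_datasets definition, and `ScoredDoc` is a hypothetical downstream extension. It shows that a subclass of a NamedTuple cannot add tuple fields, whereas a dataclass subclass can.

```python
from dataclasses import dataclass
from typing import NamedTuple


class GenericDoc(NamedTuple):
    # Simplified stand-in for a document record (assumed fields, illustration only).
    doc_id: str
    text: str


class ScoredDoc(GenericDoc):
    # Hypothetical extension: this only creates a class attribute.
    # It does NOT become a third tuple field.
    score: float = 0.0


@dataclass(frozen=True)
class GenericDocDC:
    # Dataclass equivalent of the stand-in above.
    doc_id: str
    text: str


@dataclass(frozen=True)
class ScoredDocDC(GenericDocDC):
    # With dataclasses, the subclass really gains a new field.
    score: float = 0.0


# The NamedTuple subclass still has only the two inherited fields.
namedtuple_fields = ScoredDoc._fields

# Passing a third value to the NamedTuple subclass fails.
try:
    ScoredDoc("d1", "hello", 0.5)
    namedtuple_accepts_extra = True
except TypeError:
    namedtuple_accepts_extra = False

# The dataclass subclass accepts and stores the extra field.
scored = ScoredDocDC("d1", "hello", 0.5)
```

A library could still wrap a NamedTuple record in another object, which is the kind of workaround described above; the dataclass version avoids the extra layer.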

seanmacavaney commented 1 year ago

Hey Benjamin! We've got the reasoning for namedtuple over dataclass detailed here: https://ir-datasets.com/design.

I understand that NamedTuples are not very conducive to inheritance, but I don't think inheritance adds very much in most cases, and it doesn't seem worth it above the considerations detailed in the link. Can you give some specific case where you'd find inheritance of the fields helpful?

bpiwowar commented 1 year ago

Hi, sorry, I did not see this page. OK, this makes sense, even though slotted dataclasses should be quite close performance-wise in recent Python versions. Maybe to be reconsidered in the future?

seanmacavaney commented 1 year ago

I'm potentially open to it!

bpiwowar commented 1 year ago

BTW, in the potential use cases:

bpiwowar commented 1 year ago

Also, a good alternative would be attrs. For a full comparison, see e.g.:

https://towardsdatascience.com/battle-of-the-data-containers-which-python-typed-structure-is-the-best-6d28fde824e