Closed by Innixma 8 months ago
Supporting both makes sense to me, given that `tid` is a concept tied to OpenML (we may have datasets with unique names that do not have a `tid`, for instance some coming from Kaggle).
Given that `tid` only works for OpenML, I think it would make more sense to have `dataset` as the primary key. I'm also not sure about having `dataset == tid`, given that one is a string and the other is an integer.
Updated to use `dataset` as the primary key.
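For illustration, a minimal sketch of what keying on `dataset` with an optional `tid` could look like. The names `dataset_to_tid` and `tid_for`, and the example entries, are hypothetical and not taken from the repo:

```python
from typing import Dict, Optional

# Hypothetical metadata keyed by dataset name (the primary key).
# tid is optional: only datasets that come from OpenML have one.
dataset_to_tid: Dict[str, Optional[int]] = {
    "openml_dataset_a": 101,   # illustrative value, not a real OpenML task id
    "kaggle_dataset_b": None,  # no OpenML task id for this dataset
}


def tid_for(dataset: str) -> Optional[int]:
    """Return the OpenML task id for a dataset, or None if it has none."""
    return dataset_to_tid[dataset]
```

With this layout a non-OpenML dataset needs no placeholder `tid`, and the string/integer mismatch between the two identifiers never comes up.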
A general question we might ask is whether we want to use `dataset` or `tid` in the public API, or support both.
I think either can be used as the primary key, since the repo will crash if there are duplicates (i.e. multiple `tid` values mapping to the same `dataset`, or multiple `dataset` values mapping to the same `tid`).
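As a rough sketch of the duplicate check described above (not the repo's actual validation code; `assert_one_to_one` is a hypothetical helper), the one-to-one constraint between `tid` and `dataset` could be enforced like this:

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def assert_one_to_one(pairs: Iterable[Tuple[Hashable, str]]) -> None:
    """Raise ValueError unless every tid maps to exactly one dataset and vice versa."""
    # Collapse exact repeats of the same (tid, dataset) pair first, so that
    # only genuinely conflicting mappings are reported.
    unique_pairs = set(pairs)
    tid_counts = Counter(tid for tid, _ in unique_pairs)
    dataset_counts = Counter(dataset for _, dataset in unique_pairs)

    # A tid seen more than once now means it maps to several datasets,
    # and a dataset seen more than once means it maps to several tids.
    dup_tids = sorted(str(t) for t, c in tid_counts.items() if c > 1)
    dup_datasets = sorted(d for d, c in dataset_counts.items() if c > 1)
    if dup_tids or dup_datasets:
        raise ValueError(
            f"Mapping is not one-to-one: tids={dup_tids}, datasets={dup_datasets}"
        )
```

If the mapping passes this check, either column identifies a row uniquely, which is why both could serve as the key.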
I'm leaning towards logic where `tid` is the primary key, and if `dataset` doesn't exist, we just map them so that `dataset == tid`. `tid` could be either a `str` or an `int`, so as not to tie us directly to OpenML.