we purposefully grouped other information under dataset and model name since those are the input parameters supplied by the client. we could unnest everything as you suggested, but that would mean you'd have to iterate over every single item in the list just to a) figure out which jobs are done and b) get metrics for a specific dataset / model / filter combination that you're interested in. the current configuration groups the data in such a way that you don't have to iterate over a large list on the client-side.
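for illustration, the kind of direct access this enables (field names here are hypothetical, not the exact payload):

    # sketch: evaluations grouped by the dataset and model names the client
    # supplied (hypothetical field names, not the real schema)
    response = {
        "evaluations": {
            "test_dataset": {
                "test_model": [
                    {"filter": "some_filter", "metrics": {}},
                ],
            },
        },
    }

    # direct lookup -- no scan over a flat list of every evaluation
    evals = response["evaluations"]["test_dataset"]["test_model"]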
This is tricky because now we need to store the status array to know what is done. I propose we flatten the structure to make a single evaluation entry information-complete. e.g.
not sure I understand why storing the statuses separately is a problem. it's meant to be convenient for the client: a user can quickly tell which jobs were DONE, etc., using the response object without having to iterate over all returned evaluations.
we purposefully grouped other information under dataset and model name since those are the input parameters supplied by the client
sure. we could add filter and possibly job_id as query params as well.
but that would mean you'd have to iterate over every single item in the list just to a) figure out which jobs are done
right now, we have to iterate through the entire "DONE" status list and then iterate through all the evaluations to find the matches we are looking for, if we don't know the job ID. This is already O(nk), where n is the number of evaluations and k is the number of DONE status items. Also, how do we differentiate jobs from each other? there is no connection between the evaluation entries (dataset, model, filter) and the status entries (job_id).
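a sketch of that double scan (field names below are assumptions, not the actual schema):

    # statuses and evaluations live in separate structures, so matching
    # them without a job ID is a nested scan -- O(n * k)
    response = {
        "statuses": {"DONE": [{"dataset": "d", "model": "m", "job_id": 87}]},
    }
    evaluations = [{"dataset": "d", "model": "m", "filter": "f", "metrics": {}}]

    for status_entry in response["statuses"]["DONE"]:    # k items
        for ev in evaluations:                           # n items
            if (ev["dataset"], ev["model"]) == (
                status_entry["dataset"],
                status_entry["model"],
            ):
                # matched on dataset + model, but the status entry carries no
                # filter, so the exact evaluation still isn't pinned down
                pass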
b) get metrics for a specific dataset / model / filter combination that you're interested in. the current configuration groups the data in such a way that you don't have to iterate over a large list on the client-side.
if you only want specific dataset/model combinations, the client would only request the ones they are interested in.
we also could make the evaluation a JSON object with keys something like dataset/model/filter for faster lookup options. e.g.
{
    "evaluations": {
        "test_dataset/test_model/some_filter": {
            "dataset": "test_dataset",
            "model": "test_model",
            "filter": "some_filter",
            "job_id": 87,
            "status": "DONE",
            "metrics": {},
            "confusion_matrices": {}
        }
    }
}
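with that shape, the lookup becomes a single dictionary access, e.g. (sketch, reusing the example above):

    # O(1) lookup by the composite "dataset/model/filter" key
    # instead of scanning a list
    response = {
        "evaluations": {
            "test_dataset/test_model/some_filter": {
                "job_id": 87,
                "status": "DONE",
                "metrics": {},
            },
        },
    }
    entry = response["evaluations"]["test_dataset/test_model/some_filter"]
    print(entry["status"], entry["job_id"])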
reiterating the most important question here: is there a specific use case where nested evaluations would be more difficult to work with than unnested evaluations? what pain point are we solving by unnesting statuses?
sure. we could add filter and possibly job_id as query params as well.
is there a use case for saying "give me all metrics across all datasets and models for specific filter criteria?" that doesn't seem intuitive to me, but would be interesting if true! as for job_id, we're hoping downstream clients don't have to interact much with Velour job IDs (i.e., it should probably be abstracted away as much as possible). if the user does have to get evaluations for a specific ID, then an endpoint already exists for that.
right now, we have to iterate through the entire "DONE" status list and then iterate through all the evaluations to find the matches we are looking for, if we don't know the job ID. This is already O(nk), where n is the number of evaluations and k is the number of DONE status items.
can you please describe your use case here? the most important use case that I'm aware of that requires job_ids is "help the client understand which jobs weren't finished when the bulk evaluations were fetched". using the current structure, you'd just look at bulk_evaluations.statuses['NOT_DONE'] to get that list. under your proposed structure, you'd have to iterate through all evaluations to build a list of job_ids where status == NOT_DONE. the first approach seems more convenient for that use case. what others am I missing?
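side by side, the two approaches for that use case (field names are assumptions; dict-style access shown for simplicity):

    bulk_evaluations = {
        "statuses": {
            "NOT_DONE": [{"dataset": "d", "model": "m", "job_id": 88}],
        },
        "evaluations": [
            {"dataset": "d", "model": "m", "job_id": 88, "status": "NOT_DONE"},
        ],
    }

    # current structure: one lookup
    not_done = bulk_evaluations["statuses"]["NOT_DONE"]

    # proposed flat structure: a full scan to rebuild the same list
    not_done = [
        ev["job_id"]
        for ev in bulk_evaluations["evaluations"]
        if ev["status"] == "NOT_DONE"
    ]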
Also, how do we differentiate jobs from each other? there is no connection between the evaluation entries (dataset, model, filter) and the status entries (job_id).
what's an example where you'd have to differentiate jobs from one another? in general, we want clients to worry about velour job_ids as little as possible. the use case above is the only one I'm aware of where the client should care about job_ids (e.g., if the client needs to re-query a specific job_id because it wasn't finished when the bulk evaluations were fetched).
the current structure already establishes that relationship between dataset, model, and job_id in bulk_evaluations.statuses. we could add filter there as well, but I'm not sure why we need to. dataset and model were added because those are the two input parameters that are used in get_bulk_evaluations.
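i.e., something like this (a sketch; the exact keys may differ):

    # hypothetical shape of bulk_evaluations.statuses: each entry ties a
    # dataset/model pair to the job_id that produced it
    statuses = {
        "DONE": [
            {"dataset": "test_dataset", "model": "test_model", "job_id": 87},
        ],
        "NOT_DONE": [
            {"dataset": "other_dataset", "model": "test_model", "job_id": 88},
        ],
    }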
if you only want specific dataset/model combinations, the client would only request the ones they are interested in.
not sure I understand what you mean here. get_bulk_evaluations() gets all evaluations for any named datasets and models; it doesn't filter for the intersection between the user-supplied dataset and model.
we also could make the evaluation a JSON object with keys something like dataset/model/filter for faster lookup options.
two issues: a) filter is both part of the key and present in the unnested list, so the same data is duplicated (not sure why this is advantageous), and b) it still means iterating through the entire list of evaluations to figure out which jobs haven't finished yet.

fixed in #236
Currently, the response looks like this.
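roughly (an illustrative sketch; exact field names may differ):

    # illustrative sketch of the current nested shape: filters nested under a
    # dataset + model pair, with job statuses stored separately (not exact schema)
    {
        "evaluations": {
            "test_dataset": {
                "test_model": [
                    {
                        "filter": "some_filter",
                        "metrics": {},
                        "confusion_matrices": {},
                    },
                ],
            },
        },
        "statuses": {
            "DONE": [
                {"dataset": "test_dataset", "model": "test_model", "job_id": 87},
            ],
        },
    }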
What constitutes uniqueness for an evaluation? dataset+model+filter. The way this is currently structured, we are nesting the filters under a dataset+model pair. This is tricky because now we need to store the status array to know what is done. I propose we flatten the structure to make a single evaluation entry information-complete. e.g.
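(an illustrative sketch of the flattened shape, using the field names from this thread):

    # flattened: each entry is information-complete -- dataset, model, filter,
    # job_id, status, and metrics all on one object (illustrative, not exact)
    {
        "evaluations": [
            {
                "dataset": "test_dataset",
                "model": "test_model",
                "filter": "some_filter",
                "job_id": 87,
                "status": "DONE",
                "metrics": {},
                "confusion_matrices": {},
            },
        ],
    }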
@czaloom @ntlind thoughts?