andrescrz closed this 1 week ago
My only concerns are:
- What if we have too many experiments?
- Shouldn't the response format be the responsibility of the API layer?
Those are nice questions. Let me answer them:
Regarding 1, there are multiple alternatives for handling this:
With all this in mind, I honestly think that the best option is to leave the endpoint as it is. Experiment names are randomly generated, and in practice the number of matches within a workspace is going to be very low. We might never encounter an issue with this, and if we do, we can always tackle 2, 3 or both.
Regarding 2, I've refactored the services and resources a bit and encapsulated the streaming logic in a `Streamer` class. I think the separation of responsibilities and the reusability of the code have improved a lot.
With this, adding new streaming endpoints is much more straightforward. Let me know what you think.
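To illustrate the kind of split described above, here is a minimal sketch, assuming the streamer's job is to turn whatever the service produces into newline-delimited JSON chunks. The `Streamer` API below (`stream`, the `serialize` hook) and the `find_experiment_items` service method are hypothetical and not the actual classes from this PR; the point is only that the service yields domain objects and never decides the wire format.

```python
import json
from typing import Any, Callable, Iterable, Iterator


class Streamer:
    """Hypothetical helper: turns any iterable of items into newline-delimited JSON chunks.

    The service layer only produces domain objects; the API layer owns the
    response format by passing those objects through this class.
    """

    def __init__(self, serialize: Callable[[Any], Any] = lambda item: item):
        # serialize maps a domain object to a JSON-compatible value
        self._serialize = serialize

    def stream(self, items: Iterable[Any]) -> Iterator[bytes]:
        for item in items:
            # One JSON document per line, so clients can parse incrementally
            yield (json.dumps(self._serialize(item)) + "\n").encode("utf-8")


def find_experiment_items() -> Iterator[dict]:
    """Hypothetical service method: yields items without knowing about HTTP or JSON."""
    yield {"id": "i1", "trace_id": "t1", "dataset_item_id": "d1"}
    yield {"id": "i2", "trace_id": "t2", "dataset_item_id": "d2"}


# A resource (API layer) would hand these chunks to the HTTP response
chunks = list(Streamer().stream(find_experiment_items()))
```

With this shape, a new streaming endpoint only needs a new service generator; the serialization code is shared.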
- By using the current limit param: this wouldn't work in some cases, as we want to limit by the items.
Perfect. My concern was more about having a very large list of experimentIds passed to the second query. But I agree this may not be an issue any time soon.
Yeah, I didn't explain myself well in my previous message, but that's exactly what I meant. We would also need to limit the number of retrieved experiments (and their IDs), in addition to the current limit for items.
But it's unlikely to be an issue for a long time, so I propose not to do it for now.
Agreed! One option would be to do a join, or a subquery in the IN clause. But we can discuss it later, if needed.
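For reference, a minimal sketch of the three options discussed (ID list passed to a second query, subquery in the IN clause, join), using an in-memory SQLite database as a stand-in. The table and column names are assumptions for illustration, not the real schema, and SQLite's `LIKE` (case-insensitive for ASCII by default) stands in for the case-insensitive name match. All three return the same rows; the difference is that only the first one ships a potentially large ID list from the application back to the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data, only for this illustration
conn.executescript("""
    CREATE TABLE experiments (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE experiment_items (
        id TEXT PRIMARY KEY, experiment_id TEXT, trace_id TEXT, dataset_item_id TEXT
    );
    INSERT INTO experiments VALUES ('e1', 'brave-otter'), ('e2', 'Brave-Falcon'), ('e3', 'calm-heron');
    INSERT INTO experiment_items VALUES
        ('i1', 'e1', 't1', 'd1'), ('i2', 'e2', 't2', 'd2'), ('i3', 'e3', 't3', 'd3');
""")

NAME_FILTER = "brave"

# Option A: two steps. The IN list grows with the number of matching experiments.
ids = [row[0] for row in conn.execute(
    "SELECT id FROM experiments WHERE name LIKE '%' || ? || '%'", (NAME_FILTER,))]
placeholders = ",".join("?" * len(ids))
two_step = conn.execute(
    f"SELECT id FROM experiment_items WHERE experiment_id IN ({placeholders}) ORDER BY id",
    ids).fetchall()

# Option B: subquery in the IN clause. The database resolves the IDs itself.
subquery = conn.execute("""
    SELECT id FROM experiment_items
    WHERE experiment_id IN (
        SELECT id FROM experiments WHERE name LIKE '%' || ? || '%'
    )
    ORDER BY id
""", (NAME_FILTER,)).fetchall()

# Option C: join on the experiment ID and filter by name.
joined = conn.execute("""
    SELECT ei.id FROM experiment_items ei
    JOIN experiments e ON e.id = ei.experiment_id
    WHERE e.name LIKE '%' || ? || '%'
    ORDER BY ei.id
""", (NAME_FILTER,)).fetchall()
```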
Details
This is meant to be used by the SDK and to return at least these fields per experiment item: `id`, `trace_id` and `dataset_item_id`. That's exactly what it does, plus any other field in the `experiments` table.
It has the same semantics as the other stream operations in the service: `limit` (default 500) and a last retrieved ID cursor.
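A minimal sketch of the limit plus last-retrieved-ID cursor semantics, as a client might consume them. It assumes IDs compare as ascending strings; the real sort direction and ID format are not stated here, and `stream_items` / `read_all` are hypothetical names, not endpoint or SDK functions.

```python
from typing import Iterator, List, Optional

DEFAULT_LIMIT = 500  # default page size mentioned in the PR description


def stream_items(items: List[dict], last_retrieved_id: Optional[str] = None,
                 limit: int = DEFAULT_LIMIT) -> Iterator[dict]:
    """Yield at most `limit` items whose id sorts after `last_retrieved_id`.

    Assumption for this sketch: ids are compared as ascending strings.
    """
    emitted = 0
    for item in sorted(items, key=lambda it: it["id"]):
        if emitted >= limit:
            break
        if last_retrieved_id is not None and item["id"] <= last_retrieved_id:
            continue  # already returned in a previous page
        yield item
        emitted += 1


def read_all(items: List[dict], limit: int) -> List[dict]:
    """Hypothetical client loop: keep requesting pages until one comes back empty."""
    collected: List[dict] = []
    cursor: Optional[str] = None
    while True:
        page = list(stream_items(items, cursor, limit))
        if not page:
            break
        collected.extend(page)
        cursor = page[-1]["id"]  # the last retrieved id becomes the next cursor
    return collected


items = [{"id": f"i{n}", "trace_id": f"t{n}", "dataset_item_id": f"d{n}"} for n in range(1, 6)]
all_items = read_all(items, limit=2)
```

The cursor keeps each request independent of the previous connection: the client only needs to remember the last ID it saw.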
The search by experiment name follows the same pattern as in the find experiment endpoints: a case-insensitive "contains" regex match.
Therefore, it can match experiment items from multiple experiments. For that reason, they're sorted by experiment ID first.
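A small sketch of why the ordering matters: a case-insensitive "contains" match on the name can select several experiments, so items are grouped by experiment ID before the item ID. The data shapes and the function name are hypothetical; the sketch escapes the search term so it behaves as a plain substring, which may differ from how the real endpoint builds its regex, and it uses ascending order purely for illustration.

```python
import re
from typing import Dict, List


def find_items_by_experiment_name(name: str, experiments: Dict[str, str],
                                  items: List[dict]) -> List[dict]:
    """Hypothetical: experiments maps experiment id -> experiment name."""
    # Case-insensitive "contains" match on the experiment name
    pattern = re.compile(re.escape(name), re.IGNORECASE)
    matching_ids = {exp_id for exp_id, exp_name in experiments.items()
                    if pattern.search(exp_name)}
    matched = [item for item in items if item["experiment_id"] in matching_ids]
    # Several experiments can match, so order by experiment id first, then item id
    return sorted(matched, key=lambda item: (item["experiment_id"], item["id"]))


experiments = {"e2": "Brave-Falcon", "e1": "brave-otter", "e3": "calm-heron"}
items = [
    {"id": "i3", "experiment_id": "e2"},
    {"id": "i1", "experiment_id": "e3"},
    {"id": "i2", "experiment_id": "e1"},
    {"id": "i4", "experiment_id": "e1"},
]
result = find_items_by_experiment_name("BRAVE", experiments, items)
```

Here "BRAVE" matches both `brave-otter` and `Brave-Falcon`, and the items of `e1` come before those of `e2`.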
Issues
OPIK-76
Testing
Documentation
N/A