Open beasteers opened 5 years ago
Is there anything else in this PR that needs high-level commentary before i dig in for a proper CR?
I don't think so? Let me know if things need clarification
I divided it into separate commits after getting low-key shamed during the marl meeting. 😝
(Justin has told me that I should squash commits when contributing. my bad)
One thing I want to add is a `static` parameter which will return a single value for the entire annotation. This would be useful to extract the background `source_file`, for example.
It'd also be good to be able to gather arbitrary sandbox data as well. I'm not sure if this fits in the scope of this transformer or if it'd be better to create a simpler, dedicated transformer for that purpose.
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 99.71%. Comparing base (4a67bdf) to head (c588f58). Report is 5 commits behind head on main.
What does this implement/fix? Explain your changes.
This adds a general purpose transformer that can be used to load and transform arbitrary observations with Pump.
I built this for the purpose of extracting Scaper annotations, but it's not Scaper specific.
Here's a super simple example:
Assuming a 5 second sample with a single 1 second event 2 seconds in, the data dict would look like this:
Filtering Observations
`query` can be used flexibly with a wide range of values. The type of the query should roughly match the observation values, and it is applied recursively through dicts and lists. So if the observation `value` is a dict and you want to query based on keys, build the query as a dict with keys matching `value`. If `value` is a list and you want to condition element-wise, then make `query` a list. If `value` is a single string, then use a string. You can also use a set to check membership for hashable types.

At any point, you can set it as a callable and it will be passed the data up to that point.
It will fail if any conditions are False.
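The matching rules above can be sketched roughly as follows. This is only an illustration of the described behaviour, not the code in this PR: the function name `matches` and the sample `event` dict are hypothetical, and it assumes that set queries are only compared against hashable values.

```python
def matches(query, value):
    """Hypothetical recursive matcher; an illustration, not the PR's code."""
    # A callable receives whatever data has been reached at this level.
    if callable(query):
        return bool(query(value))
    # Dict query: every queried key must exist and its sub-query must match.
    if isinstance(query, dict):
        return isinstance(value, dict) and all(
            key in value and matches(sub, value[key])
            for key, sub in query.items()
        )
    # List query: conditions are applied element-wise.
    if isinstance(query, (list, tuple)):
        return (
            isinstance(value, (list, tuple))
            and len(query) <= len(value)
            and all(matches(q, v) for q, v in zip(query, value))
        )
    # Set query: membership test (value is assumed to be hashable).
    if isinstance(query, (set, frozenset)):
        return value in query
    # Anything else (e.g. a single string) is compared for equality.
    return query == value


# Made-up observation value, for illustration only.
event = {"label": "siren", "role": "foreground", "snr": 6}

print(matches({"label": "siren"}, event))
print(matches({"label": {"siren", "car_horn"}, "snr": lambda s: s > 3}, event))
print(matches({"label": "dog_bark"}, event))
```

Because every branch is combined with `all`, a single failing condition anywhere in the nested query rejects the observation.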
Here are some valid query examples:
Aggregating interval windows
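As a rough sketch of the idea in this section (not the transformer's actual implementation): the sample is cut into consecutive hop-length intervals, the events overlapping each interval are gathered into a list, and a reducing callable turns each list into one value. The helper name `reduce_windows`, its parameters, and the `(start, end, value)` event tuples are all assumptions made for illustration.

```python
import math


def reduce_windows(events, duration, hop, reduce=len):
    """Hypothetical helper: apply `reduce` to the events in each hop window.

    events   -- list of (start, end, value) tuples, times in seconds
    duration -- total length of the sample in seconds
    hop      -- window length in seconds
    reduce   -- callable that receives the list of values in one window
    """
    n_windows = int(math.ceil(duration / hop))
    out = []
    for i in range(n_windows):
        w_start = i * hop
        w_end = min((i + 1) * hop, duration)
        # Keep every event whose interval overlaps this window.
        inside = [
            value for start, end, value in events
            if start < w_end and end > w_start
        ]
        out.append(reduce(inside))
    return out


# A 5 second sample with a single 1 second event starting 2 seconds in.
events = [(2.0, 3.0, "siren")]

counts = reduce_windows(events, duration=5.0, hop=1.0)
first = reduce_windows(events, 5.0, 1.0, reduce=lambda x: x[0] if x else None)
print(counts)
print(first)
```

Passing a different `reduce` (for example one that returns the first value, the maximum, or a count) changes what each interval reports without changing how events are assigned to windows.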
And you can have a bit more control.
`reduce(x)` is iteratively fed a list of all the events within each hop window interval.
real life
And finally, here's how I'm currently using it:
Any other comments?
As of right now, the `all_time_stretch` field won't work with a slicer because all `None` fields are interpreted as a time dimension. I see how this makes sense for the `structure` transformer. I'm not sure how to reconcile it with returning array values. Maybe it's really not necessary ever, but part of me thinks it would be a nice option to have (returning an array for each interval) if we want to support as many use cases as possible.

This could also probably use some more safeguards preventing ppl from doing bad things, but atm I'm not sure what those would be, so for now I think it's okay to leave things open-ended.