Open DaniBodor opened 4 months ago
I see the issue, but I don't think your solution is going to improve the situation much.
Currently, derived_from
is a list[Any]
. The items in the list are the actual objects that the newer object was derived from. The two ContinuousData
objects you see are not the same object. Rather, their string representation is the same, because the label is the same.
I wonder what the use of a 'label for the mutation' would be. Is that a randomly generated label? Is it a label given by the user? If so, are these helping or complication things for the end user? Or is it given by a function? Would it then not be better to just add the name/reference to the function that created the object, instead of a label?
E.g.:
[
(EITData(...), loading.load_eit_data),
(ContinuousData(...), EITData.calculate_global_impedance),
(ContinuousData(...), ContinuousData.select_by_time)
]
This theoretically should enable you to regenerate the object, if you would have the parameters that were passed to select_by_time
.
I think the real problem that the objects can't really be identified in their string representation. There should be other solutions for that.
ContinuousData(label="global_impedance_(raw)", t=[1020.2:5102.1])
) to distinguish between original and sliced objects;ContinuousData(4372398320, label="global_impedance_(raw)")
) emphasising that these are different objects.This begs the question: how useful is this going to be, and how much detail do you need? I think retracing steps is useful, but reconstruction of every step less so. I'd rather have a forward-plan (i.e. a description of the entire workflow from loading to exporting that can be applied to e.g. a different dataset) than a backward-plan (i.e. a way to retrace steps back to loading from a given result).
This begs the question: how useful is this going to be, and how much detail do you need? I think retracing steps is useful, but reconstruction of every step less so. I'd rather have a forward-plan (i.e. a description of the entire workflow from loading to exporting that can be applied to e.g. a different dataset) than a backward-plan (i.e. a way to retrace steps back to loading from a given result).
In my mind, these are ultimately the same thing. As you are doing each step one at a time, if you store exactly what function with which parameter was used, then you could export this and use it as a "forward-plan" on your next dataset. Obviously there may be redundant or unnecessary steps in there, but these can be pruned if needed. The advantage is that you don't need to decide on every step before creating your "forward-plan". You can play around with the data until it looks how you want it to look, and now from the final object you can see exactly what you did, even if you did not keep track of each mutation.
Ultimately, it would look something like this.
[
(EITData(...)", "", {}),
(ContinuousData(...), select_by_time, {"start_time": 1020.2, "end_time": 5102.1}),
(ContinuousData(...), apply_filter, {"type": LowPassFilter, "order": 10, "etc": ...}),
]
I realize that it's a lot of work to set this up, and might ultimately not be worthwhile or even possible for certain mutations. But if it does work, we basically have a setup to allow for automatic exporting of workflows, by just taking all steps minus the instances they were performed on.
Either way, I am not sure what purpose the current derived_from
attribute serves.
In my mind, these are ultimately the same thing. As you are doing each step one at a time, if you store exactly what function with which parameter was used, then you could export this and use it as a "forward-plan" on your next dataset. Obviously there may be redundant or unnecessary steps in there, but these can be pruned if needed. The advantage is that you don't need to decide on every step before creating your "forward-plan". You can play around with the data until it looks how you want it to look, and now from the final object you can see exactly what you did, even if you did not keep track of each mutation.
I think this would be useful for a GUI/interactive application. When working in code, the end result is always determined by the script you're using, which is (easily reconstructed into) a forward plan. But if you're working in the GUI, this could be useful indeed.
I realize that it's a lot of work to set this up, and might ultimately not be worthwhile or even possible for certain mutations. But if it does work, we basically have a setup to allow for automatic exporting of workflows, by just taking all steps minus the instances they were performed on.
As long as functions are deterministic (all (?) of the current ones are), they should be possible for all mutations. Maybe it becomes a more complex tree, though.
Either way, I am not sure what purpose the current
derived_from
attribute serves.
The current derived_from
serves the exact same purpose, but in a less detailed way. It is a way to trace back where your data came from. I you compare the two ContinuousData
objects (not just the string representation, but the actual objects) is it easy to find out that the second is shorter, and to conclude it must be a slice of the first. The original idea was that with the last item in derived_from
and the parameters
attribute, you should be able to reconstruct the current item. I don't know whether all functions currently properly set up parameters
, though.
The original idea was that with the last item in
derived_from
and theparameters
attribute, you should be able to reconstruct the current item. I don't know whether all functions currently properly set upparameters
, though.
I guess this boils down to the same thing as I am suggesting, except to place parameters
into the same object as derived_from
(although that's a detail).
It doesn't currently work though, in the notebook I linked filtered_gi.parameters
returns an empty dictionary.
Currently, the
derived_from
attribute only shows the trace-back of the previous objects used and printing it specifically only shows their types and labels. At this point, those labels are not changing in many cases, meaning that reading the trace-back is not that meaningful. To figure out what exactly happened in those steps, you'd have to dig into each object (and even then, I'm not positive it's always possible to determine). For example, I got this response after reproducing the commands from a presentation by @psomhorst see here.I think it would be good to make
derived_from
alist[tuple[str, str]]
, where the first string is the same as above, and the second some label for the mutation. Or maybe even alist[dict[str, Any]]
, where the dict holds the previous object, the mutation and all the parameters for that mutation.