Open soxofaan opened 4 years ago
Our idea was that you don't have to care about the type, since it is in fact detected when calling the function add() itself, rather than the class with its method, as in your example.
Do you think the overhead is so high?
@claxn did you try any tests about this?
Do you think the overhead is so high?
since c0f0a5ac67b924884e067aa8f04477d59a5e4849 it's better I guess; it just feels like a bit of a waste to do all that detection logic if you already know in advance that you want, for example, the numpy version
Yes, indeed. If you find it more useful to directly call the functions without argument type sniffing, which would also be less error-prone, please go for that. However, I think it is a nice feature to have and provides a better interface. What if we add a specific keyword (e.g., dtype="numpy") to the functions? This keyword would then be handled by the decorator, i.e., if such an argument is given, we skip the argument type sniffing.
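A minimal sketch of how such a dtype keyword could work. The decorator, class layout, and dtype values here are illustrative assumptions based on this thread, not the package's actual API:

```python
import numpy as np

def process(func):
    """Hypothetical dispatching decorator: sniff argument types
    unless an explicit dtype= keyword is given."""
    def wrapper(*args, dtype=None, **kwargs):
        impls = func()  # the wrapped factory returns a class with exec_* methods
        if dtype is not None:
            # caller already knows the flavor: skip type sniffing entirely
            return getattr(impls, f"exec_{dtype}")(*args, **kwargs)
        # fall back to sniffing: numpy arrays -> exec_np, plain numbers -> exec_num
        if any(isinstance(a, np.ndarray) for a in args):
            return impls.exec_np(*args, **kwargs)
        return impls.exec_num(*args, **kwargs)
    return wrapper

class Add:
    @staticmethod
    def exec_num(x, y):
        return x + y

    @staticmethod
    def exec_np(x, y):
        return np.add(x, y)

@process
def add():
    return Add

print(add(3, 5))                 # sniffed as numbers -> exec_num: 8
print(add(np.array([1, 2]), 1))  # sniffed as numpy -> exec_np: [2 3]
print(add(3, 5, dtype="num"))    # explicit flavor, no sniffing: 8
```

With this shape, existing callers keep the automatic detection, while callers that know the flavor in advance pay no sniffing cost.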
It's been a long time and none of us had much time for this package, I guess, but I'm getting more interested in it since we will be using it more (particularly the xarray methods).
Recently I had some issues in correctly casting data to xarray and therefore I updated utils.py a few times (see #21).
I'm thinking more and more that this automatic data casting is not so essential, and it can lead to strange errors/results, plus it's a bit difficult to debug imho.
Although I was a supporter of the current implementation, I'm now leaning towards removing this wrapper altogether and letting the user call the needed method, e.g. sum.exec_xar(data), which is way clearer and cannot lead to unexpected behavior.
What do you think? @soxofaan @claxn @clausmichele @sophieherrmann
What if we have to add a single integer to an xarray? Should we still call add.exec_xar(data, 1)?
Yes, or equivalently sum.exec_xar([xr_1, xr_2, 1]), depending on the input.
I think we should consider this by looking into the processes implemented for the different data types. For example, some processes can't be made available using numpy, since it lacks metadata support. On the other hand, only a few are implemented using xarray, but it is not difficult to add new ones. If the future plan is to have all the processes working with xarray, I would rather wait and make it the default once that goal is reached.
Although I originally really liked the idea of the automatic detection of the correct function, I agree with most of you that the overhead is getting too high. For what I know EODC and partly VITO (? @soxofaan ) are the only users of this package at the moment, and both are focusing on the xarray implementation.
Thanks to @ValentinaHutter, a very large number of xarray processes are by now implemented. So we could think about stripping away the number and numpy implementations.
The only issue I currently see with providing only xarray functions is some corner cases which still use numpy implementations:
e.g. for PGs using fit_curve we currently use oeop.array_element(**{'data': parameters, 'index': 1}) with parameters=[1, 1, 1], which is not an xarray and therefore still uses the numpy implementation. It shouldn't be a problem to also use the xarray implementation in this case, but it should definitely be tested explicitly.
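To make the corner case concrete, here is a sketch of what that call does for a plain Python list; this is an illustrative stand-in, not the package's actual array_element implementation:

```python
def array_element(data, index, return_nodata=False):
    """Sketch: pick the value at `index` from `data`.
    A plain list like [1, 1, 1] would be routed to the numpy/number
    implementation by the type sniffing, not the xarray one."""
    try:
        return data[index]
    except IndexError:
        # mirror the process's no-data behavior for out-of-range indices
        if return_nodata:
            return None
        raise

parameters = [1, 1, 1]
# the call style used for fit_curve PGs in the thread
print(array_element(**{"data": parameters, "index": 1}))  # 1
```

Switching such calls to an xarray-backed implementation would mean wrapping `parameters` in a DataArray first, which is why explicit testing of this path is warranted.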
For what I know EODC and partly VITO (? @soxofaan ) are the only users of this package
We use it indeed in the openeo-python-driver, basically to execute simple math/string/array operations that are not related to datacubes (the math there happens in Spark/Scala space). For example, if you send this non-datacube process graph
{"add": {"process_id": "add", "arguments": {"x": 3, "y": 5}, "result": true}}
to the VITO backend, it is executed by using openeo-processes-python functionality directly, very close to the HTTP serving layer, without going into the depths of heavy-duty DataCube handling (openeo-geopyspark-driver, openeo-geotrellis-extensions and the like).
So in that sense, xarray is even a bit of overkill for our use of the library.
With the current openeo-processes-python API, every call (e.g. add(5, 2.5)) will involve type detection of the arguments ("num" twice in the example) in order to pick the correct implementation (Add.exec_num). Additionally, there might also be some list-to-numpy.array conversion. Very often, an application using openeo-processes-python will know which calculation backend to use, and all this detection/preprocessing is unnecessary overhead.
What is or would be the best way of using openeo-processes-python when you know in advance which flavor (exec_num/exec_np/exec_xar/exec_dar/...) you want to use? Should one use Add.exec_num(5, 2.5), or are there more elegant solutions?
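The two call styles under discussion can be sketched side by side; the class layout is assumed from this thread, not verified against the package:

```python
class Add:
    @staticmethod
    def exec_num(x, y):
        return x + y

def add(x, y):
    # stands in for the dispatching wrapper: in the real package a decorator
    # sniffs the argument types here ("num" twice for this call) before
    # delegating to the matching exec_* method
    return Add.exec_num(x, y)

# via the dispatching wrapper: pays type detection on every call
print(add(5, 2.5))           # 7.5
# direct call when the flavor is known in advance: no detection overhead
print(Add.exec_num(5, 2.5))  # 7.5
```

The direct form is what a backend like openeo-python-driver could use for its non-datacube math, since it already knows every argument is a plain number.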