IDEMSInternational / open-app-builder

PLH App Frontend
GNU General Public License v3.0
5 stars 24 forks source link

fix: data pipe merge operator handles case of no input source #2342

Closed jfmcquade closed 1 week ago

jfmcquade commented 1 week ago

PR Checklist

Description

The data pipe merge fails and throws an unhandled error in the case that no input_source is provided. This PR handles that case by throwing a custom error in that case.

Also tidies some of the pre-existing data pipe operator tests (thanks, @chrismclarke).

2341 also suggests the existence of another possible issue to do with applying the merge operator to data lists that contain nested JSON data. I think it is sensible to handle this case as a follow-up if it does persist after applying the changes of this PR. In that case, a new debug sheet that exhibits the broken behaviour would be required.

Testing

See example_data_pipe_merge. Run a sync and ensure no errors are thrown, then inspect the outputs of the data pipe to confirm they are as expected.

Git Issues

Closes #2341

Screenshots/Videos

Attempting to run the data pipe as defined by example_data_pipe_merge in the following state:

Screenshot 2024-06-21 at 10 33 15

now logs the following error:

Screenshot 2024-06-25 at 12 51 02

Passing tests

Screenshot 2024-06-25 at 12 51 55
jfmcquade commented 1 week ago

Looks fine to me, although still have the issue in the back of my mind as to whether it actually makes sense to use the args list this way (given we have the input_source column), particularly given potential long-form args_list syntax in the future, e.g. merge_source: list_b; join_type: outer

https://chat.idems.international/all/pl/bji1h4pxbfbzdeiso9pzcx46re

Thanks, we discussed this on a call and decided that it would be better not to support the authoring style of providing a list of data sources in the args_list and no data source in the input_source, for the reasons you have given. I have reworked this PR, see the updated description.

We noted that the Data Pipelines RFC includes the proposal to support an explicit key/value syntax for the arguments passed to the args_list, which we approved of.

We also noted that it might make sense for cases like the merge operator to support passing a list of input sources to the input_source column (although not to the args_list column). And also that there may be good reasons to also support a list of outputs to the output_target column. I'm not sure where the best place to capture this is, potentially as part of that RFC, or I could write a new issue.

esmeetewinkel commented 1 week ago

https://github.com/IDEMSInternational/open-app-builder/issues/2341 also suggests the existence of another possible issue to do with applying the merge operator to data lists that contain nested JSON data. I think it is sensible to handle this case as a follow-up if it does persist after applying the changes of this PR. In that case, a new debug sheet that exhibits the broken behaviour would be required.

Debug sheet: example_data_pipe_merge_nested which merges a data list with a translatable column (which is nested as .json) image

with another data list. Syncing this gives the error: image

chrismclarke commented 1 week ago

I think this is good to merge - I'll address the specific issue raised by @esmeetewinkel in a follow-up PR