PipedreamHQ / pipedream

Connect APIs, remarkably fast. Free for developers.
https://pipedream.com
Other
8.32k stars 5.27k forks source link

Search Data Store key for particular value [ACTION] #5223

Closed NebularNerd closed 1 year ago

NebularNerd commented 1 year ago

This Data Store lacks a feature to check for the existence of a record within a key. I can do this inside Python but I wonder if a dedicated action would be faster/lighter.

Say for example we have a few values stored in a data store and I want to find a match (say 12345):

[
  67890,
  12345,
  7063578509386
]

I've tried every Data Store action relating to reading records but they dump back the whole JSON, Pipedream Filter is unable to search the records so cannot provide a match to stop execution with. so I guess an action such as
Search key for matching value
with a TRUE/FALSE return should do the trick.

Any help on this would be great, I have no issue with the Python method but this seems a fairly easy win for those just wanting to avoid coding.

vunguyenhung commented 1 year ago

Hi @NebularNerd, I think it makes sense to add this action. One thing is user need to input the data type of the value, so that Pipedream know how to check

NebularNerd commented 1 year ago

Hi @NebularNerd, I think it makes sense to add this action. One thing is user need to input the data type of the value, so that Pipedream know how to check

Yes this would make sense, as the values could be strings or raw array data.

andrewjschuang commented 1 year ago

Hi, @NebularNerd, the Filter action has simple array and object searching conditions, and the below worked for me:

image

image

NebularNerd commented 1 year ago

@andrewjschuang Awesome, that is (almost) exactly what I was looking for. I got overly focused on the idea of the Data Store being able to search for the key I did not look more into the filters full power. This should reduce my Python runtime drastically in the workflow.

@vunguyenhung @dylburger This provides me an immediate solution, shall I close this issue or would you like to leave it open until the new Data Store action is added?

EDIT: This sort of provides a solution, I can make it work for now as the filter will just end the workflow on match, however, it would be safer/better if a proper True/False output directly from the Datastore stage was given to use in later stages as per my original suggestion. I shall leave this open for discussion

andrewjschuang commented 1 year ago

+ @dannyroosevelt

dannyroosevelt commented 1 year ago

Interesting idea. So you'd search through the data store records based on the record's value instead of key, and return the key?

NebularNerd commented 1 year ago

Interesting idea. So you'd search through the data store records based on the record's value instead of key, and return the key?

Kind of, I was thinking more along the lines of a True/False return if the record exists, using my super lazy example you have a Data Store JSON with

{
    "subject_id": [
        67890,
        12345,
        7063578509386
    ]
}

You run a search for 12345 (in real life this would normally be a value in the trigger data or a previous step) and as it exists you get a return of TRUE, maybe an additional return with the JSON path to that record would be helpful as well for other purposes.

This is kind of similar to @andrewjschuang suggestion that is present in the filter, however, the filter works but it's only a stop/go switch (which works for me at present). This new action would allow for finer control for example creating a whitelist or similar, by checking if a specific record exists you could then have some sort of logic in the next step, so for Python you might have:

if [pd.step][search.for.record] = true:
    print ('We found the record')
    # Do stuff
else:
   print('Record not found')
   # Do other stuff

I have a very similar logic coded in Python using json.dumps() and other kludges to run a stop/go filter at the next step and merge a new record into the data store, but this would be a smoother and hopefully more lightweight solution.

Potential Issues: Nesting levels, duplicate records under different levels, string OR int and anything else JSON related. Duplicates are not a huge issue, Data Store path selection logic should avoid that.

vellames commented 1 year ago

That's feasible. Probably will not perform very well in a large DataStore, but feasible :)

NebularNerd commented 1 year ago

That's feasible. Probably will not perform very well in a large DataStore, but feasible :)

I imagine the performance would be similar to the Filter's ARRAY search speeds, from @andrewjschuang solution above. I agree with a huge database it may take some time which could be an issue with hitting Pipedream limits.

For my purposes I am running the check to see if the ID of a product has been logged before to prevent the rest of the script running, sensibly I could trim/delete the database every few months, it's mainly to stop the same ID being processed repeatedly as the trigger I use generates duplicate events (due to the way the source generates the trigger events, not an issue)

ghost commented 1 year ago

@vellames TESTFAILED

  1. can you please add a condition that it only accepts the valid numerical or character image 2: its not working if i entered half value of the original to find the key like original value is 12345 i entered 123 it returned nothing image

    3: for english word "Karina" i entered "karina" in small characters its not returning the results or it is intended to return the exact value (case sensitive)?

    image
NebularNerd commented 1 year ago

@Hassan-Abbas-SQA some interesting tests, there's some things I had actually not considered. Partial or exact matching is not something I considered based on my use case, the same for case sensitivity. Maybe these could be optional settings?

ghost commented 1 year ago

@Hassan-Abbas-SQA some interesting tests, there's some things I had actually not considered. Partial or exact matching is not something I considered based on my use case, the same for case sensitivity. Maybe these could be optional settings?

yes @vellames we can create an optional boolean prop for partial match or exact match to give the user complete power over search

Initially i did these tests so that we can modify the prop description to guide the user so he doesn’t get confused

vellames commented 1 year ago

I did it to return an exactly match, I can add these conditions

ghost commented 1 year ago

Thanks @vellames

vellames commented 1 year ago

@Hassan-Abbas-SQA can you check it please?

ghost commented 1 year ago

MY DATASTORE

image

1: it should have only returned the subject_id.1

image image

2: both contains and case-incentive should be optional fields and default to FALSE 3: both fields must work independently CASE_INSCENSITIVE description should be if true, the query will be used to filter the records based on case-Insensitivity, if false, the query will be used to filter the records that are equal to the query.

vellames commented 1 year ago

/approve

ghost commented 1 year ago

@vellames OOPS the action returned the key correct but the key starts with query itself which is making the key invalid

image image

other than that tested possible ways and it work really great @vellames please fix the only remaining issue then we are good to go 🚀

vellames commented 1 year ago

Oops, it should be fine now

vellames commented 1 year ago

/approve

ghost commented 1 year ago

ALL-TESTSPASSED

Great Stuff @vellames

image
NebularNerd commented 1 year ago

Sorry for not doing so earlier but thank you @vellames, @Hassan-Abbas-SQA and everyone else for making this happen. I've just found time to have a go of this and it works exactly how I envisioned it. 🙂

I think with some refactoring on my workflow I was looking to use it on, this will cut down on operation time by not having to go through so many filter steps. I can also test out my new Python-fu skills I've been teaching myself to further cut down excess work being done.