Dynamoid / dynamoid

Ruby ORM for Amazon's DynamoDB.
MIT License
580 stars 195 forks source link

How to find items when you don't know if they exist or not? #427

Closed grahamotte closed 4 years ago

grahamotte commented 4 years ago

I'm trying to check if some items exist in my table, something like:

User.find([1, 2, 1000]) # user 1000 doesn't exist but 1 & 2 do

It looks like the find method forces this to raise an error even if some of the elements do exist.

https://github.com/Dynamoid/dynamoid/blob/d6ebcaeb45bd65a08d0a8c54c3df671cdbd8626b/lib/dynamoid/finders.rb#L50

Is there a way to find what does exist without raising?

andrykonchin commented 4 years ago

The only way to check existence of several items for now is using #where method or #exists? - thin wrapper around it e.g.

User.where(id: [1, 2, 3])

The only difference is that it will be not as efficient as find method call because DynamoDB Query Scan operation will be used instead of BatchGetItem.

Dynamoid follows the Rails convention here and Dynamoid::Document.find behaves like ActiveRecord::Base.find does.

It's quite easy to add similar to find method which just doesn't raise an exception or even simpler - to accept raise_error : false option in existing find method. But there is a small issue - DynamoDB's BatchGetItem doesn't preserve order of items in response so Dynamoid should take care about it.

Please let me know if any option works for you.

PS

In this case Scan operation will be used, not Query, as far as there is no hash_key=<id-value> condition.

grahamotte commented 4 years ago

I was having some issues with where responding with

User.where(id: [1, 2, 3]) 
# => #<Dynamoid::Criteria::Chain:0x00007fda129043b0 @query={:digest=>["3682282...

User.where(id: [1, 2, 3]).all.first
# => Aws::DynamoDB::Errors::ValidationException (One or more parameter values were invalid: Condition parameter type does not match schema type)

Perhaps some issue with the hash_key being used in the where?

As for exists?, I may have missed some documentation but it looks like we'd need to call that for every item in the list which could take quite a while when there are many items. More generally though, my understanding is that neither of these are a great approaches due the relative cost of Query.

The ideal interface, from my limited point of view, would be where but when the only key is the hash key Dynamoid uses BatchGetItem instead of Query.

andrykonchin commented 4 years ago

Just realized that there is already a proper method - find_all. It behaves exactly in the way you'd expect. It doesn't raise exceptions and uses efficient BatchGetItem operation.

Perhaps some issue with the hash_key being used in the where?

Can you provide the definition of the User model in order to reproduce the issue?

As for exists?, I may have missed some documentation but it looks like we'd need to call that for every item

Looks like the documentation methd published on rubydoc.info is outdated. It will be fixed. It can except the same arguments as find method. Just look at the example from source code:

Post.exist?([713, 210])
grahamotte commented 4 years ago

Great! Will find_all be deprecated though? https://github.com/Dynamoid/dynamoid/blob/d6ebcaeb45bd65a08d0a8c54c3df671cdbd8626b/lib/dynamoid/finders.rb#L69

As for exist? that method is undefined (v3.5.0), but exists? looks to only return true or false if all items exist or none do which isn't super useful in determining which items exist.

Here's my user class

class User
  table(name: 'users', key: :digest, inheritance_field: :unused, timestamps: false)

  field :id, :integer
  field :host, :string
  field :type, :string
end

It's a little odd since I'm converting an existing dynamo table to use dynamoid so we have id & type fields but id is not the hash_key, the hash key is a digest which can be regenerated from attributes anytime. (note: in my previous examples I changed digest to id in order to make my problem easier to understand but maybe it's just more confusing now!)

andrykonchin commented 4 years ago

Will find_all be deprecated though?

It will. But find will support behavior of find_all method - with new option raise_error: false. The PR is already prepared - https://github.com/Dynamoid/dynamoid/pull/429.

(note: in my previous examples I changed digest to id in order to make my problem easier to understand but maybe it's just more confusing now!)

Yeah, a little bit. I figured out that I proposed wrong DSL for where. Instead of

User.where(id: [1, 2, 3])

in predicate should be used:

User.where('id.in': [1, 2, 3])

or in our case it should be digest - User.where('digest.in': [1, 2, 3])

grahamotte commented 4 years ago

OK that makes sense. Also find_all does work exactly as expected for my use case. Thanks a lot for your help and quick replies!