WinRb / Viewpoint

A Ruby client access library for Microsoft Exchange Web Services (EWS)
Apache License 2.0

Multithreading #153

Open pareeohnos opened 10 years ago

pareeohnos commented 10 years ago

Hi,

I'm running into an issue when attempting to multithread the processing of emails retrieved by Viewpoint. I'm not sure if I'm doing something wrong or using the library incorrectly, but I'd really like to have the processing in our application threaded as it is going to be processing a large number of emails.

In essence, I have the following code:

exchange = Exchange::Interface.new
items = exchange.mailbox.items
# ... split items into arrays for threading ...
threads << Thread.new {
  thread_items.each do |i|
    processed = EmailWorker.new(i).start
  end
}

Here, the Exchange::Interface class is a simple wrapper around the Viewpoint library, and simply performs the same actions as shown in the documentation.
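For readers following along, the elided splitting step could look something like the sketch below. It is only an illustration: `SketchWorker` and `process_in_threads` are hypothetical names that stand in for the `EmailWorker` above, and the items are plain strings rather than Viewpoint objects.

```ruby
# Hypothetical stand-in for the EmailWorker in the post; it only upcases a string.
class SketchWorker
  def initialize(item)
    @item = item
  end

  def start
    @item.to_s.upcase
  end
end

# Split `items` into at most `thread_count` slices and process each slice
# in its own thread. Results come back in the original item order.
def process_in_threads(items, thread_count)
  return [] if items.empty?

  slice_size = (items.size.to_f / thread_count).ceil
  threads = items.each_slice(slice_size).map do |thread_items|
    Thread.new do
      thread_items.map { |i| SketchWorker.new(i).start }
    end
  end

  # Thread#value joins the thread and returns the value of its block.
  threads.flat_map(&:value)
end

results = process_in_threads(%w[a b c d e], 2)
```

Using `Thread#value` avoids sharing a results array between threads, so no explicit locking is needed in this sketch.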

Inside the EmailWorker, I have something like:

if email.has_attachments?
  attachments = email.attachments
  # .... process attachments ....
end

This is where I'm running into issues now. When I call email.attachments I receive an Unauthorized request error.

I'm not really sure what's going on. I'm guessing it's a threading issue, but I'm not sure how to get around it. Is there an easy way to sort this out? I'd really rather not do the processing on a single thread: the inbox I'm working on currently has 16,500+ emails just for testing, so processing it all on a single thread would be very time-consuming. We're using JRuby, so I'd like to make the most of the native threads.

zenchild commented 10 years ago

What version of the gem are you using?

pareeohnos commented 10 years ago

1.0.0 beta 2. Bundler is showing viewpoint (1.0.0.beta.2 07c7237) as I'm pointing it directly at this repo. I've got it working slightly differently, by passing the threads a list of item IDs and then retrieving the items again in each thread, but it would be nice to reduce the number of requests, given that I've already downloaded a list of items before spawning the threads.

zenchild commented 10 years ago

If you're downloading the messages with folder.items, you're still not getting the attachments, because EWS does not support fetching AllProperties with the FindItem SOAP call. If you want to reduce your network calls, try something like this:

# This downloads just the IDs that we can pass to get_items
item_ids = client.find_items(folder_id: :inbox) {|builder|
  builder.opts[:item_shape][:base_shape] = "IdOnly"
}.collect(&:id)

# This will download the full message for the passed in IDs.
items = client.get_items(item_ids) do |builder|
  builder.opts[:item_shape][:base_shape] = "AllProperties"
end

items.each(&:mark_deep!)  # disallow further SOAP calls

This might also have the side effect of fixing your threading issue, because the error message you provided made it seem like it was failing in the connection code.

One word of caution with that many e-mails: you may want to look into pagination if you're downloading all of the attachments in one fell swoop, or you may run into memory issues.
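One way to keep memory bounded is to fetch the full messages in fixed-size batches of IDs rather than all at once. The sketch below shows only that client-side batching idea. It is not EWS server-side paging, and `each_batch_of_items` and `fake_fetch` are hypothetical names: `fake_fetch` stands in for a call such as `client.get_items(ids)` so the example runs without a server.

```ruby
# Fetch and yield items one batch at a time, so only one batch of full
# messages needs to be held in memory at once.
# `fetch` is a hypothetical callable standing in for client.get_items(ids).
def each_batch_of_items(item_ids, batch_size, fetch)
  item_ids.each_slice(batch_size) do |ids|
    yield fetch.call(ids)
  end
end

# Stand-in fetcher for illustration only; a real one would call out to EWS.
fake_fetch = ->(ids) { ids.map { |id| "message-#{id}" } }

batch_sizes = []
each_batch_of_items((1..7).to_a, 3, fake_fetch) do |items|
  # Process (and then discard) each batch here.
  batch_sizes << items.size
end
```

The batch size is a trade-off: larger batches mean fewer requests, while smaller batches mean less data held in memory at any one time.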

pareeohnos commented 10 years ago

Yeah, I know get_items won't download the attachments at the same time, but that's OK; I'm not expecting it to.

I've got pagination in mind, just trying to get things running prior to implementing that. Is it possible to do pagination with the get_items method? The flow of the application means that a full list of emails will be downloaded, then filtered based on a set of rules, and the remaining emails will then get processed.

I think with this flow I won't be able to download only the IDs as your code above does, as I can't run those emails through the filters without knowing more information. That's why I was hoping to be able to download the emails once and simply reuse the email objects for the remaining processing.

zenchild commented 10 years ago

The first block is the one that only downloads the IDs. The second block downloads the full message for each of the passed-in IDs. With the code above, you only have to call out to the network twice (once for the IDs, then once for the full messages) instead of once for each message. You will get everything you need in order to filter, and the threading code won't have to make any network calls.
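The flow described here (two network calls, then local filtering and threaded processing) can be sketched as follows. This assumes a hypothetical `FakeClient` that only counts calls, with items represented as plain hashes. It is not the Viewpoint API, and the `find_ids` method name is made up for the illustration.

```ruby
# Hypothetical stand-in for the EWS client; it only counts "network" calls.
class FakeClient
  attr_reader :calls

  def initialize(store)
    @store = store
    @calls = 0
  end

  # Stands in for the IdOnly find call.
  def find_ids
    @calls += 1
    @store.keys
  end

  # Stands in for the AllProperties get_items call.
  def get_items(ids)
    @calls += 1
    ids.map { |id| @store.fetch(id) }
  end
end

store = {
  1 => { subject: "invoice" },
  2 => { subject: "hello" },
  3 => { subject: "invoice 2" }
}
client = FakeClient.new(store)

# Two calls in total: one for the IDs, one for the full messages.
items = client.get_items(client.find_ids)

# Filtering happens locally on the already-downloaded items.
wanted = items.select { |m| m[:subject].start_with?("invoice") }

# The threads only touch in-memory data, so they make no further calls.
subjects = wanted.each_slice(1).map { |slice|
  Thread.new { slice.map { |m| m[:subject].upcase } }
}.flat_map(&:value)
```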

pareeohnos commented 10 years ago

Ah OK, that's good then. I'll try implementing it that way and report back.

pareeohnos commented 10 years ago

It seems I still get the same unauthorised error when I attempt to read the contents of an attachment in order to save it as a file.