alcionai / corso

Free, Secure, and Open-Source Backup for Microsoft 365
https://corsobackup.io
Apache License 2.0

Delay fetching Exchange item data from Graph API until it's clear kopia will be uploading the item #2023

Open ashmrtn opened 1 year ago

ashmrtn commented 1 year ago

Kopia-assisted incrementals let us skip re-uploading items that kopia already has. However, the logic for GraphConnector still fetches the data for every item. These fetches eat into the number of requests that the Graph API allows us

Instead of always fetching item data, we should only fetch the data if kopia requests it

ashmrtn commented 1 year ago

also related to

ashmrtn commented 1 year ago

thinking about this more, it also has implications for how backup details for these items are handled. A plan for that should be pinned down prior to starting work on this

vkamra commented 1 year ago

Related also to #3007

vkamra commented 1 year ago

This (along with #3007) will help with scenarios such as:

ashmrtn commented 1 year ago

Now that we added support for sourcing details entries from assist bases, the requirements for enabling kopia assisted incrementals have become a bit easier to meet. Currently the requirements are:

Since we merge assist base details, we need only know the mod time and item name when first telling kopia about the item. We can put off materializing the ItemInfo until we know kopia read the file data and it wasn't cached

However, this leaves us with a couple new questions (numbered for ease of reference):

  1. how do we handle errors when trying to fetch item data?
  2. how do we handle items that are deleted between the time we enumerated them and the time we try to fetch their data?

For the first case, we can just return an error when opening the lazy reader. This will be caught by kopia and handled similarly to an error fetching OneDrive/SharePoint data

For the second case, we need to be a little more careful (and this needs to be validated through manual testing). We should probably do three things:

This should be sufficient because (in theory, needs validation in practice) the next backup should get a delta result saying the item was deleted. That delta result will cause us to exclude the item in kopia. The details entry for the (now excluded) item shouldn't exist anyway so the item will disappear completely from the new backup

Although having to check for sentinel errors in the kopia code is a bit messy, it does allow us to avoid having details entries that don't have data to back them

The error check can also be leveraged down the road to make sure the details are properly initialized when we return them (i.e. return an error if details were requested but the item data was never read)

If we go this route, the below help set up the error return value from the Info call

ashmrtn commented 1 year ago

re-opening this since we want to disable the feature and get more test coverage on it

ashmrtn commented 11 months ago

un-assigning for the moment since there's additional work that needs to be done to get this enabled