elastic / connectors

Source code for all Elastic connectors, developed by the Search team at Elastic, and home of our Python connector development framework
https://www.elastic.co/guide/en/enterprise-search/master/index.html

[Google Drive] Proper support of incremental syncs #2629

Open jedrazb opened 2 weeks ago

jedrazb commented 2 weeks ago

Problem Description

Right now, the "incremental sync" of Google Drive falls back to the default naive incremental sync implementation, which still has to fetch the metadata of every document and only allows skipping the download of files that already exist.
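To make the cost of the naive approach concrete, here is a simplified sketch of what such a fallback amounts to. It is an illustration only, not the framework's actual code: the function name `naive_incremental_sync`, the `indexed_timestamps` mapping, and the `id` / `modifiedTime` keys on each document are assumptions made for this example.

```python
from typing import Dict, Iterable, Iterator, Tuple


def naive_incremental_sync(
    source_docs: Iterable[dict],
    indexed_timestamps: Dict[str, str],
) -> Iterator[Tuple[dict, bool]]:
    """Yield (doc, needs_download) for every document in the source.

    Hypothetical helper: every document's metadata is still fetched and
    visited; the only saving is skipping the content download when the
    stored timestamp matches the one reported by the source.
    """
    for doc in source_docs:
        # A file is re-downloaded if it is new or its timestamp changed.
        needs_download = indexed_timestamps.get(doc["id"]) != doc["modifiedTime"]
        yield doc, needs_download


docs = [
    {"id": "a", "modifiedTime": "2024-06-01T10:00:00Z"},  # unchanged
    {"id": "b", "modifiedTime": "2024-06-03T09:30:00Z"},  # modified
    {"id": "c", "modifiedTime": "2024-06-04T08:00:00Z"},  # new
]
already_indexed = {"a": "2024-06-01T10:00:00Z", "b": "2024-05-20T12:00:00Z"}

results = [(doc["id"], dl) for doc, dl in naive_incremental_sync(docs, already_indexed)]
```

Note that the loop runs once per document in the drive regardless of how many changed, which is why sync time grows with the total dataset size rather than with the size of the change set.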

Google Drive incremental syncs do not use a "delta API" that would allow them to fetch only the documents that changed since the last sync. Filtering documents at the source, e.g. with q=modifiedTime > syncCursor, would leave much less file metadata to fetch and process during incremental syncs and would likely make them much shorter.
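As a rough sketch of source-side filtering: the Drive API v3 `files.list` endpoint accepts a `q` search expression that supports comparisons on `modifiedTime` with an RFC 3339 timestamp. The helper below only builds that query string. The function name `build_changed_files_query`, the `trashed = false` base clause, and the idea of storing the cursor as a timezone-aware `datetime` are assumptions for illustration, not an agreed design.

```python
from datetime import datetime, timezone
from typing import Optional


def build_changed_files_query(sync_cursor: Optional[datetime]) -> str:
    """Build a Drive `q` expression selecting files changed after the cursor.

    Hypothetical helper: with no cursor (first sync) it returns only the
    base clause, so the caller falls back to listing everything.
    """
    base = "trashed = false"
    if sync_cursor is None:
        return base
    if sync_cursor.tzinfo is None:
        raise ValueError("sync_cursor must be timezone-aware")
    # Normalize to UTC; Drive treats timestamps without an offset as UTC.
    ts = sync_cursor.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    return f"modifiedTime > '{ts}' and {base}"


cursor = datetime(2024, 6, 4, 12, 0, 0, tzinfo=timezone.utc)
query = build_changed_files_query(cursor)
```

A filter like this only surfaces files that still exist, so files deleted since the last sync would not appear in the results and would need to be detected some other way.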

Proposed Solution

Once we have a "smart" implementation of incremental syncs, one that fetches only the files changed since the last sync, we can expect a significant speedup of incremental syncs on massive datasets.

Open questions

Additional Context

Acceptance criteria