elastic / connectors

Official Elastic connectors for third-party data sources
https://www.elastic.co/guide/en/elasticsearch/reference/master/es-connectors.html
Other
18 stars 136 forks source link

[Outlook] Feature: Office365 multi-user support #2844

Open ilyasabdellaoui opened 1 month ago

ilyasabdellaoui commented 1 month ago

Introduced BaseOffice365User abstract base class to standardize Office 365 user handling. Added MultiOffice365Users to manage multiple emails from config. Added client_emails (comma-separated) in OutlookDataSource config. Resolved issue with fetching too many users causing SMTP server not found error.

Closes https://github.com/elastic/connectors-py/issues/2843

Checklists

Pre-Review Checklist

cla-checker-service[bot] commented 1 month ago

💚 CLA has been signed

seanstory commented 1 month ago

buildkite test this

seanstory commented 1 month ago
Screenshot 2024-09-24 at 3 45 50 PM

The tests pass, but the linter is currently failing. You can fix this by running make autoformat when you're done making changes.

ilyasabdellaoui commented 1 month ago

in your PR description to use "Relates to" or "Part of", since another part of the issue you filed is that the current solution is limited to 999 users. I expect that's something we can better control with the graph API and another separate configuration, for folks who want "all users", and have more than 999.

@seanstory After rechecking the code, I realized I was mistaken in my initial description. The code isn't limited to fetching the top 999 users—it handles pagination using the @odata.nextLink property from the Microsoft Graph API, which retrieves all users. I missed that detail. I'll update my issue description to reflect the correct behavior.

https://github.com/elastic/connectors/blob/6e74c3a8023f3fd80de240e5ffb446321610225f/connectors/sources/outlook.py#L411-L429

seanstory commented 1 month ago

buildkite test this

ilyasabdellaoui commented 1 month ago

buildkite test this

Lint still fails due to async-related issues. Autoformat doesn't solve the lint errors. I'll investigate further after completing test functions. If unable to resolve, I'll seek assistance

artem-shelkovnikov commented 1 week ago

@ilyasabdellaoui do you want to make this PR ready for review, or is there anything we can help with?

ilyasabdellaoui commented 5 days ago

@artem-shelkovnikov Thanks for checking in! I encountered some linting issues, which I've now resolved in my latest commit, particularly related to async iteration. Everything is fixed and ready for review.

artem-shelkovnikov commented 5 days ago

buildkite test this

ilyasabdellaoui commented 2 days ago

Looks good! Left a comment about configurable field to make it consistent with other connectors, but otherwise great change.

Additional question - have you done any benchmarks on how faster the connector gets?

I know that you're saving a lot of time on network, but throttling-wise I believe this change makes not impact, is it true?

I haven't run any benchmarks yet, but I expect this change will improve performance by reducing unnecessary user fetches.

As for throttling, the client_emails configuration is most effective at avoiding it when the list of emails is small, by limiting the API calls to only the relevant users, it reduces the load on the API. For larger email lists, though, the impact on throttling might not be as important!

artem-shelkovnikov commented 2 days ago

buildkite test this

artem-shelkovnikov commented 2 days ago

Change looks good, but I'll not merge immediately, will address a bit later - we need to do same change in Kibana to make it properly work for native connectors. I'll follow-up and merge both at the same time