danswer-ai / danswer

Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.
https://docs.danswer.dev/

Sharepoint connector and pagination issue #1697

Closed bfamchon closed 5 days ago

bfamchon commented 6 days ago

Hello world!

I was trying to set up a Sharepoint connector using a specific site identifier.

This site is available to everyone in the organisation, and my App Registration in Azure has "Read all" rights.

Unfortunately, I keep seeing 0 documents indexed...

So I decided to look at the code base and found this chunk:

    sites = self.graph_client.sites.get().execute_query()
    self.site_data = [
        SiteData(url=None, folder=None, sites=sites, driveitems=[])
    ]

But there is a pagination problem: self.graph_client.sites.get().execute_query() only returns 200 sites. I verified this by logging sites: 200 sites are returned, but mine is not among them (there are more than 5000 sites available).

I found that Microsoft has some documentation about pagination here: https://github.com/microsoftgraph/msgraph-sdk-python?tab=readme-ov-file#32-pagination. I'm sharing it as a reminder, and it may be a hint.

I'm not a Python developer at all, but I'll try to look at it.

What do you think?
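To illustrate the idea, here is a minimal sketch of following Microsoft Graph's `@odata.nextLink` until the last page. It is not Danswer's code, nor the API of the client library used in the chunk above: `iter_all_items`, `fetch_page`, `FAKE_PAGES`, and `fake_fetch` are hypothetical names, and the in-memory pages stand in for real HTTP responses.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_all_items(
    fetch_page: Callable[[str], Page], first_url: str
) -> Iterator[Any]:
    """Yield every item of a Graph-style paged collection.

    fetch_page is a hypothetical callable that takes a URL and returns the
    decoded JSON body of one page.
    """
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)
        # Each page carries its items under the "value" key.
        yield from page.get("value", [])
        # The last page has no "@odata.nextLink", which ends the loop.
        url = page.get("@odata.nextLink")


# Hypothetical pages used only to exercise the loop without network access.
FAKE_PAGES: Dict[str, Page] = {
    "page-1": {"value": ["site-a", "site-b"], "@odata.nextLink": "page-2"},
    "page-2": {"value": ["site-c"]},
}


def fake_fetch(url: str) -> Page:
    """Stand-in for an authenticated HTTP GET against the Graph API."""
    return FAKE_PAGES[url]


all_sites = list(iter_all_items(fake_fetch, "page-1"))
```

With a real client, `fetch_page` would perform the authenticated request. Reading only the first page is what would produce a capped result such as the 200 sites described above.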

bfamchon commented 5 days ago

I was looking at my production version, and this bug is already resolved in the latest version. Thanks for your work 💪🏼