AnantLabs / google-enterprise-connector-sharepoint

Automatically exported from code.google.com/p/google-enterprise-connector-sharepoint

Provision to feed only latest versions of documents #125

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Enable version tracking for a SharePoint site
2. Create documents and modify them several times so that there are
multiple versions of the same document
3. Crawl this Site
4. All versions of the documents are fed to GSA.

What is the expected output? What do you see instead?
* Most of the time it makes sense to index only the latest version.
* It is possible for multiple versions of the same document to have the same
checksum values.
* So if the number of versions is high (>100), the GSA indexes only the first
100 documents.

* So the user should be given an option to configure whether all versions are
fed to the GSA or only the latest versions.
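
To illustrate the requested behavior, here is a minimal sketch in Python, not the connector's actual code. It assumes each crawled item carries a document URL and a numeric version. The names `DocumentVersion`, `select_versions_to_feed` and `feed_latest_only` are hypothetical and only stand in for whatever option the connector might expose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentVersion:
    """Hypothetical record for one version of a SharePoint document."""
    doc_url: str
    version: int


def select_versions_to_feed(versions, feed_latest_only):
    """Return the versions to feed, honoring a hypothetical on/off option.

    With feed_latest_only=False every version is returned unchanged.
    With feed_latest_only=True only the highest version per doc_url is kept.
    """
    if not feed_latest_only:
        return list(versions)

    latest = {}
    for item in versions:
        current = latest.get(item.doc_url)
        if current is None or item.version > current.version:
            latest[item.doc_url] = item
    # dict preserves the order in which each document was first seen
    return list(latest.values())


crawled = [
    DocumentVersion("http://sharepoint.example.com/Docs/a.doc", 1),
    DocumentVersion("http://sharepoint.example.com/Docs/a.doc", 3),
    DocumentVersion("http://sharepoint.example.com/Docs/b.doc", 2),
    DocumentVersion("http://sharepoint.example.com/Docs/a.doc", 2),
]
latest_only = select_versions_to_feed(crawled, feed_latest_only=True)
all_versions = select_versions_to_feed(crawled, feed_latest_only=False)
```

With the option off, all four entries would be fed. With it on, only one entry per document remains.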

Original issue reported on code.google.com by j.dars...@gmail.com on 2 Dec 2009 at 12:30

GoogleCodeExporter commented 9 years ago
Normally, SharePoint document versions can be accessed using a URL of the form:
<Document-URL>&VersionId=<Version-Number>
And the latest version is accessible using:
<Document-URL>

So a workaround to index only the latest versions is to add an exclusion pattern
of the form "contains:VersionId="

But a more explicit configuration on the connector admin page is desirable.
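
The effect of that exclusion pattern can be sketched as follows. This is a simplified illustration in Python that handles only the `contains:` prefix; real GSA pattern matching supports more pattern types. The helper names, the example.com URLs and the VersionId values are assumptions made up for the example.

```python
def matches_contains_pattern(url, pattern):
    """Return True if a 'contains:<text>' pattern matches the URL.

    Only the 'contains:' prefix is handled in this sketch.
    """
    prefix = "contains:"
    if not pattern.startswith(prefix):
        raise ValueError(f"unsupported pattern in this sketch: {pattern!r}")
    return pattern[len(prefix):] in url


def filter_urls(urls, exclude_patterns):
    """Drop every URL that matches at least one exclusion pattern."""
    return [
        url
        for url in urls
        if not any(matches_contains_pattern(url, p) for p in exclude_patterns)
    ]


# Hypothetical URLs: the first is the latest version, the others are
# older versions addressed through the VersionId parameter.
urls = [
    "http://sharepoint.example.com/Docs/report.doc",
    "http://sharepoint.example.com/Docs/report.doc&VersionId=512",
    "http://sharepoint.example.com/Docs/report.doc&VersionId=1024",
]
kept = filter_urls(urls, ["contains:VersionId="])
```

Only the URL without `VersionId=` survives the filter, which is why the pattern leaves just the latest version to be indexed.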

Original comment by j.dars...@gmail.com on 2 Dec 2009 at 12:34

GoogleCodeExporter commented 9 years ago

Original comment by j.dars...@gmail.com on 2 Dec 2009 at 12:35

GoogleCodeExporter commented 9 years ago
This issue is filed as Google issue #6514002

Original comment by tdnguyen@google.com on 17 May 2012 at 11:54