Closed GoogleCodeExporter closed 9 years ago
Original comment by rakeshs101981@gmail.com
on 27 Jul 2009 at 6:48
Method: SharepointClient.updateGlobalState(final GlobalState globalState)
    if (globalState.isBFullReCrawl() && null != spType) {
        LOGGER.log(Level.INFO, "Discovering Extra webs");
        discoverExtraWebs(allSites, spType);
    }
A flag is needed so that, on the first crawl cycle, the discovery of site
collections still happens even when globalState.isBFullReCrawl() is false.
One approach is to initiate the discovery in
SharepointTraversalManager.startTraversal() if the DocumentList returned by
doTraversal() is empty (size == 0).
Original comment by rakeshs101981@gmail.com
on 27 Jul 2009 at 7:01
Original comment by rakeshs101981@gmail.com
on 7 Aug 2009 at 2:19
Original comment by rakeshs101981@gmail.com
on 19 Aug 2009 at 4:31
The condition in the above code snippet can be changed as follows to ensure the
discovery of extra webs:
    if (doCrawl && null != spType)
The next thing required is the traversal of the discovered sites. Without that,
the number of documents sent to CM will still be zero, so the problem of
checkpoint() not being called and startRecrawl() being called repeatedly will
persist. There has to be a trigger that actually makes the traversal process
fetch docs. The value of nDocuments in SharepointClient could be used for this,
but one issue remains: if this value is 0 between any two batch traversals, the
traversal process will initiate the discovery of new sites.
For the above reasons, there is another alternative:
- Keep the traversal logic in SharepointClient.updateGlobalState() as-is
- Check the size of SPDocumentList in startTraversal(). If it is 0 and the
  SharePoint type is 2007 (WSS 3.0 or MOSS):
  * Initiate the discovery of new sites
  * Update global state with the newly discovered sites
  * Call SharepointClient.updateGlobalState() to initiate traversal of the
    newly discovered sites
This approach can be less error-prone, as the existing flow of execution is not
directly disturbed. But it addresses only one use case, i.e. when the crawl URL
specified is completely empty.
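The steps of this alternative can be sketched roughly as below. This is only an illustration under stated assumptions: EmptyBatchFallbackSketch, SiteDiscovery and Crawler are hypothetical stand-ins, not the connector's real classes, and sites and documents are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the empty-batch fallback; not the connector's real code. */
class EmptyBatchFallbackSketch {

    /** Stand-in for the site discovery call (assumption). */
    interface SiteDiscovery {
        Set<String> discoverExtraWebs();
    }

    /** Stand-in for the traversal that returns document IDs for the known sites (assumption). */
    interface Crawler {
        List<String> crawl(Set<String> sites, int batchHint);
    }

    static List<String> startTraversal(Set<String> knownSites, boolean isSp2007,
            SiteDiscovery discovery, Crawler crawler, int batchHint) {
        // The normal traversal pass is left untouched.
        List<String> docs = crawler.crawl(knownSites, batchHint);
        if (docs.isEmpty() && isSp2007) {
            // Discover only when the normal pass produced nothing.
            Set<String> extra = discovery.discoverExtraWebs();
            // addAll returns true only if at least one site was new.
            if (knownSites.addAll(extra)) {
                docs = crawler.crawl(knownSites, batchHint);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        Set<String> sites = new LinkedHashSet<>();
        sites.add("http://sp/empty");
        // Toy crawler: only the discovered site holds a document.
        Crawler crawler = (s, hint) -> {
            List<String> out = new ArrayList<>();
            if (s.contains("http://sp/extra")) {
                out.add("doc1");
            }
            return out;
        };
        SiteDiscovery discovery = () -> {
            Set<String> found = new LinkedHashSet<>();
            found.add("http://sp/extra");
            return found;
        };
        System.out.println(startTraversal(sites, true, discovery, crawler, 10));
    }
}
```

The second crawl runs only when discovery actually added a site, so an empty crawl URL with nothing to discover still returns an empty batch instead of looping.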
Another approach could be to discover the extra webs as soon as the traversal
cycle is completed and the number of discovered documents is less than the
batch hint. Currently, the connector waits for the next traversal request to
crawl the newly discovered webs. This will not only solve the current issue but
also speed up the connector's traversal.
The following changes will be required:
    boolean isGSupdated = updateGlobalState(globalState, allSites);
    if (doCrawl && null != spType) {
        if (!isGSupdated) {
            discoverExtraWebs(allSites, spType);
            isGSupdated = updateGlobalState(globalState, allSites);
        }
        if (isGSupdated) {
            <initiate crawling of the newly discovered webs>
        }
    }
Original comment by th.nitendra
on 1 Sep 2009 at 2:12
Cases being handled here:
1. Batch hint # of documents have not been discovered, but new sites
   have been discovered. Crawl documents until the batch hint # of
   docs is reached.
2. Batch hint # of documents have not been discovered and no new
   sites have been discovered. In such cases, get any new
   personal/mysites and sites discovered by GSS. Add them to the
   global state and crawl them until the batch hint # of documents is
   reached.
    if (doCrawl && null != spType) {
        // If the first check has passed, it might mean Case 1. If the
        // following if block is skipped, it means this is Case 1, else it
        // will be Case 2
        if (!isGSupdated) {
            // If this check passed, it means Case 2
            discoverExtraWebs(allSites, spType);
            isGSupdated = updateGlobalState(globalState, allSites);
        }
        // The following does not care if the sites are discovered for Case
        // 1 or Case 2. It will simply go ahead and crawl batch hint no. of
        // docs from the new sites
        if (isGSupdated) {
            <initiate crawling of the newly discovered webs>
        }
    }
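To make the two cases concrete, here is a small self-contained model of the same control flow. It is a sketch under assumptions, not the committed fix: the global state and allSites are modelled as sets of site URLs, discoverExtraWebs is passed in as a Supplier, and the method name sitesToCrawl and class name ExtraWebDiscoverySketch are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical model of the Case 1 / Case 2 flow; not the connector's real code. */
class ExtraWebDiscoverySketch {

    /** Returns true when at least one site in allSites was new to the global state. */
    static boolean updateGlobalState(Set<String> globalState, Set<String> allSites) {
        return globalState.addAll(allSites);
    }

    /** Returns the newly added sites that should be crawled in this cycle. */
    static List<String> sitesToCrawl(Set<String> globalState, Set<String> allSites,
            boolean doCrawl, String spType, Supplier<Set<String>> discoverExtraWebs) {
        Set<String> before = new LinkedHashSet<>(globalState);
        boolean isGSupdated = updateGlobalState(globalState, allSites);
        if (doCrawl && null != spType) {
            if (!isGSupdated) {
                // Case 2: nothing new so far, so ask for extra webs.
                allSites.addAll(discoverExtraWebs.get());
                isGSupdated = updateGlobalState(globalState, allSites);
            }
            if (isGSupdated) {
                // Case 1 or Case 2: crawl whatever was added to the state.
                List<String> added = new ArrayList<>(globalState);
                added.removeAll(before);
                return added;
            }
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        Set<String> state = new LinkedHashSet<>();
        state.add("http://sp/a");
        Set<String> allSites = new LinkedHashSet<>(state);
        Set<String> extra = new LinkedHashSet<>();
        extra.add("http://sp/mysite");
        System.out.println(sitesToCrawl(state, allSites, true, "sp2007", () -> extra));
    }
}
```

In this model discovery is invoked only in Case 2, which mirrors the intent of the snippet above: the extra lookup is skipped whenever the regular update already produced new sites.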
Original comment by rakeshs101981@gmail.com
on 2 Sep 2009 at 3:10
Fix details:
http://code.google.com/p/google-enterprise-connector-sharepoint/source/detail?r=384
Original comment by rakeshs101981@gmail.com
on 5 Nov 2009 at 8:59
Verified in 2.4 Release
Original comment by ashwinip...@gmail.com
on 14 Dec 2009 at 6:36
Original issue reported on code.google.com by
rakeshs101981@gmail.com
on 27 Jul 2009 at 6:47