PILLUTLAAVINASH / google-enterprise-connector-manager

Automatically exported from code.google.com/p/google-enterprise-connector-manager

Allow Connector to specify if it wants to use the googleconnector:// protocol as the document URL #155

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
On the GSA, any content that is associated with the googleconnector://
protocol will only have one option for authz support - the connector
AuthorizationManager will be used.  On the CM, any Document that does not
contain a google:searchurl property will be assigned a googleconnector:// URL.

There are times when the Connector would like to provide content without a
google:searchurl, yet still have the authz process configured on the GSA
rather than fixed to go to the Connector.

To support this we need to provide some way for the Admin to specify that
they would like to use a non-googleconnector:// URL for the content fed
from the Connector so the Authz can be configured on the GSA.

============ From Related Bug on GSA:
Bug 1869014: connector authorization - Do not let feed type dictate
authorization

Why does the metadata-and-url feed type connector exist? Because of the
SharePoint connector. Since the authorization had to be performed using a
head request or the SAML Bridge, that choice was made.
The bad?
   1. It's an afterthought.
   2. Because the crawling is done by the GSA, the same or similar
information has to be entered twice: Follow & Crawl URL; Do not follow URL;
crawler access; there are two different schedules (one for connector and
one for GSA).
The complexity adds confusion and hence deployment failure.
Scenario 2. SharePoint's new content feed mode. To support batch
authorization, we've added content feed as an option. However, due to the
uniformity of the URL dictated by connector manager, it's impossible to mix
content feed with ACL policy.
Scenario 3. File connector. Similar issue with mixing content feed type and
ACL policy.
The real problem is that the authorization is being dictated by feed type.

Allow non-googleconnector:// URLs to be used with connectors and support
only content feed. This should allow connectors like the SharePoint connector
to have the best of both worlds: simple configuration that avoids duplicate
entries, and flexible authorization via connector, ACL, head request, or SAML.

Original issue reported on code.google.com by mgron...@gmail.com on 3 Jun 2009 at 7:24

GoogleCodeExporter commented 8 years ago

Original comment by jl1615@gmail.com on 18 Sep 2009 at 8:35

GoogleCodeExporter commented 8 years ago

Original comment by mar...@google.com on 4 Dec 2009 at 11:09

GoogleCodeExporter commented 8 years ago
Fixed at r2395.

TODO(Docs): Need to note this in the Developer's Guide since this new property
can be used to control the URL that is fed to the Search Appliance for each
Document.

Notes to Developers:
-------------------
This change adds a new reserved property that can be used with a Document to
specify the feed type for the Document.  It has the side effect of also
enabling the connector to use SpiConstants.PROPNAME_SEARCHURL to specify the
URL that is associated with the Document when the feed is created.

This new property is optional and if it is not used, the behavior will default
to the current behavior.  That is, if the SpiConstants.PROPNAME_SEARCHURL is
present then the feed will be considered a 'web' feed and the given
PROPNAME_SEARCHURL will be used as the Document URL in the feed.  Otherwise,
the feed will be considered a 'content' feed and the URL will be fabricated.
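
The default rule above can be sketched roughly as follows.  This is only an
illustration, not the Connector Manager's actual code: the class name
DefaultFeedTypeSketch and the inferFeedType helper are hypothetical, and a
plain Map of strings stands in for the Document's properties.  The
"google:searchurl" literal is the property name mentioned in the original
report.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper; not part of the Connector Manager SPI. */
class DefaultFeedTypeSketch {
  /** Mirrors the two feed types named in this change. */
  enum FeedType { CONTENT, WEB }

  /** Property name as it appears in the issue text. */
  static final String PROPNAME_SEARCHURL = "google:searchurl";

  /**
   * Legacy rule applied when no explicit feed type is given:
   * a search URL implies a 'web' feed, otherwise a 'content' feed.
   * Assumption: "present" is modelled here as a non-null map value.
   */
  static FeedType inferFeedType(Map<String, String> docProps) {
    return docProps.get(PROPNAME_SEARCHURL) != null
        ? FeedType.WEB
        : FeedType.CONTENT;
  }

  public static void main(String[] args) {
    Map<String, String> docProps = new HashMap<>();
    System.out.println(inferFeedType(docProps));

    docProps.put(PROPNAME_SEARCHURL, "http://host.example/folder/doc.ext");
    System.out.println(inferFeedType(docProps));
  }
}
```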

The new property is SpiConstants.PROPNAME_FEEDTYPE.  JavaDoc below:

  /**
   * Identifies a single-valued FeedType property that, if present, will be
   * used to determine the feed type for this document.  It is strongly
   * recommended that this property be set to explicitly determine the feed
   * type ('content' or 'web') for the document.
   * <p>
   * If this property is not set, the feed type will be determined as follows:
   * <ol>
   * <li> If there is no {@link #PROPNAME_SEARCHURL} then the feed type will
   *      default to 'content' feed using a fabricated URL derived from the
   *      {@link #PROPNAME_DOCID}.
   * <li> If there is a {@link #PROPNAME_SEARCHURL} then the feed type will
   *      default to 'web' feed and use the {@link #PROPNAME_SEARCHURL} as the
   *      document URL.
   * </ol> 
   * <p>
   * Value: google:feedtype
   */
  public static final String PROPNAME_FEEDTYPE = "google:feedtype";

Values are restricted to:

  /**
   * Enum for the list of possible feed types.
   */
  public enum FeedType {
    CONTENT, WEB
  }

If the SpiConstants.PROPNAME_FEEDTYPE property is set it will determine the
feed type for the document.  It will also enable the Connector to specify the
URL for the feed using the SpiConstants.PROPNAME_SEARCHURL property.  That is,
if the FEEDTYPE is specified and the SEARCHURL is set, the specified SEARCHURL
will be used as the URL for the Document.  If the FEEDTYPE is set and the
SEARCHURL is not specified, even in the case of 'web' feeds, a URL will be
fabricated for the Document using the Connector name and DocId.
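
A rough sketch of that URL selection, for the case where the FEEDTYPE has been
set explicitly, might look like the following.  The class and method names are
hypothetical, and the exact layout of the fabricated googleconnector:// URL
(connector name as the host prefix, DocId as a query parameter) is an
assumption made for illustration; the notes above only say that it is built
from the Connector name and DocId.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper; not part of the Connector Manager SPI. */
class DocumentUrlSketch {

  /**
   * Builds a googleconnector:// URL from the connector name and DocId.
   * ASSUMPTION: the host/path/query layout below is illustrative only.
   */
  static String fabricateUrl(String connectorName, String docId) {
    try {
      return "googleconnector://" + connectorName + ".localhost/doc?docid="
          + URLEncoder.encode(docId, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is required to be supported by every Java platform.
      throw new IllegalStateException(e);
    }
  }

  /**
   * With an explicit FEEDTYPE, a supplied search URL is used as-is;
   * otherwise a URL is fabricated, regardless of 'content' or 'web'.
   */
  static String resolveUrl(String connectorName, String docId,
                           String searchUrl) {
    return (searchUrl != null)
        ? searchUrl
        : fabricateUrl(connectorName, docId);
  }

  public static void main(String[] args) {
    System.out.println(
        resolveUrl("connector1", "doc1a", "http://host.example/doc.ext"));
    System.out.println(resolveUrl("connector1", "doc1a", null));
  }
}
```

Note that the feed type itself does not appear in resolveUrl: per the
description above, once FEEDTYPE is set, only the presence of the SEARCHURL
decides which URL is used.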

For example, if you wanted to set the URL and send a 'content' feed (previously
not possible) you would set the following properties on your Document:

  docProps.put(SpiConstants.PROPNAME_FEEDTYPE,
               SpiConstants.FeedType.CONTENT.name());
  docProps.put(SpiConstants.PROPNAME_SEARCHURL,
               "http://fqh.host:port/folder/doc.ext");

Log Message:
-----------
Objective is to provide a way for the Connector to specify a URL that does not
use the googleconnector:// protocol to be used as the document URL for
authorization.  There is a Design Proposal with some more background and
alternatives considered.

This fix provides three main changes to extend the existing functionality to
provide the desired feature.

1) A new SpiConstant property, PROPNAME_FEEDTYPE, was created to explicitly
   specify the feed type for the Document and then reuse the PROPNAME_SEARCHURL
   as the Connector specified URL for the document.  This allows the Connector
   to choose between the fabricated URL and a specified URL.

2) Since the new URL won't have a well defined way of embedding the Connector
   Name, the DTD of the <Resource> element within the <AuthorizationQuery> was
   extended to include an optional "connectorname" attribute.

3) Again, since the new URL won't have a well defined way of embedding the
   Connector Name, the DTD of the <Resource> element within the
   <AuthorizationResponse> was extended to include an optional "connectorname"
   attribute.
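
To illustrate changes 2) and 3), here is a minimal sketch of rendering a
<Resource> element with the optional "connectorname" attribute.  The element
and attribute names come from the description above; the class and method
names are hypothetical, and placing the document URL in the element's text
content is an assumption, not something stated in this log message.

```java
/** Hypothetical helper; not part of the Connector Manager. */
class ResourceElementSketch {

  /** Minimal XML escaping for text and double-quoted attribute values. */
  static String escapeXml(String s) {
    return s.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace("\"", "&quot;");
  }

  /**
   * Renders a Resource element.  The connectorname attribute is emitted
   * only when a connector name is supplied, since it is optional.
   * ASSUMPTION: the URL is carried as the element's text content.
   */
  static String resourceElement(String url, String connectorName) {
    StringBuilder sb = new StringBuilder("<Resource");
    if (connectorName != null) {
      sb.append(" connectorname=\"")
          .append(escapeXml(connectorName))
          .append('"');
    }
    sb.append('>').append(escapeXml(url)).append("</Resource>");
    return sb.toString();
  }

  public static void main(String[] args) {
    // A connector-specified URL cannot carry the connector name itself,
    // so the attribute supplies it.
    System.out.println(
        resourceElement("http://host.example/folder/doc.ext", "connector1"));
    // Without a connector name the attribute is simply omitted.
    System.out.println(
        resourceElement("http://host.example/folder/doc.ext", null));
  }
}
```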

Original comment by mar...@google.com on 17 Dec 2009 at 7:00