PILLUTLAAVINASH / google-enterprise-connector-manager

Automatically exported from code.google.com/p/google-enterprise-connector-manager

Change delimiter in DocPusher #147

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Considering GSA documentation and bug #231438, I suggest the delimiter for
repeating meta values in DocPusher class should be changed from ", " (comma
followed by space) to " " (single space).

As per the GSA documentation at
http://code.google.com/apis/searchappliance/documentation/52/xml_reference.html#request_meta_filter
"Use only space characters as separators for terms in meta tag content."

Additional reference: Support ticket #365842202
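
For illustration, the difference between the two delimiters can be sketched as
below. This is a minimal sketch, not the actual DocPusher code: the class name
MetaValueJoiner and the method join are hypothetical, and the sample values are
made up.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (assumption): flattens the values of a repeating
// attribute into the single string placed in a meta tag's content.
final class MetaValueJoiner {
    static String join(List<String> values, String delimiter) {
        return String.join(delimiter, values);
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("alpha", "beta", "gamma");
        System.out.println(join(values, ", ")); // current delimiter: comma + space
        System.out.println(join(values, " "));  // proposed delimiter: single space
    }
}
```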

Original issue reported on code.google.com by lightbends on 30 Apr 2009 at 10:48

GoogleCodeExporter commented 8 years ago
Marty, can you vet this? It seems reasonable to me based on a quick scan of the
ticket, bug, and documentation.
On the other hand, comma is in the term break list. Mahesh, do you have any
concrete reason to remove the comma, beyond it being unnecessary and not
recommended?

Original comment by jl1615@gmail.com on 30 Apr 2009 at 9:59

GoogleCodeExporter commented 8 years ago
Seems reasonable to me.  I'm not sure what you mean by 'term break list'?

Original comment by mar...@google.com on 1 May 2009 at 12:01

GoogleCodeExporter commented 8 years ago
Oh, you made me look it up! I mean the metatag_restrict_substring_separator
command line flag mentioned in bug #231438.

Original comment by jl1615@gmail.com on 1 May 2009 at 12:41

GoogleCodeExporter commented 8 years ago
Yes, it seems that, given the current restricted separator list, the comma would
be innocuous. It's not clear from the support ticket that the comma is causing
any problems, since the bug and support discussion seem related to embedded '.'
and '&' characters.

I would second John's question: do we have any specific case of the comma
causing a support issue?

Original comment by mar...@google.com on 1 May 2009 at 1:17

GoogleCodeExporter commented 8 years ago
After further research on this issue, I conclude that using either the current
delimiter (", ") or the proposed new delimiter (" ") will cause issues.

Issue with current delimiter:
-----------------------------
Let's say the "resources" field is a repeating attribute that stores person
names in the form FIRST_NAME, LAST_NAME.

For doc1, "resources" field contains only a single value.
Muhammad, Sheikh

Corresponding meta tag field in feed XML - 
<meta name="resources" content="Muhammad, Sheikh"/>

For doc2, "resources" field contains two values.
Adam, Muhammad
Sheikh, Abdullah

Corresponding meta tag field in feed XML - 
<meta name="resources" content="Adam, Muhammad, Sheikh, Abdullah"/>

With legacy CMS search tools, a search for {resources=Muhammad, Sheikh}
returns only doc1.
But the GSA returns both doc1 & doc2.

Moreover, comma is in the term break list of bug #231438.

Issue with proposed delimiter:
-----------------------------
Let's say the "location" field is a repeating attribute that stores location
names.

For doc3, "location" field contains only a single value.
Virginia Washington

Corresponding meta tag field in feed XML - 
<meta name="location" content="Virginia Washington"/>

For doc4, "location" field contains two values.
West Virginia
Washington D.C.

Corresponding meta tag field in feed XML - 
<meta name="location" content="West Virginia Washington D.C."/>

With legacy CMS search tools, a search for {location=Virginia Washington}
returns only doc3.
But the GSA returns both doc3 & doc4.
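
Both collisions can be reproduced with a small sketch. The class and method
names are hypothetical, and the substring check is only a naive stand-in for
phrase matching (an assumption; it does not model how the GSA tokenizes or
matches meta content). It shows that once values are flattened into one string,
the boundary between them is lost with either delimiter.

```java
import java.util.Arrays;
import java.util.List;

final class DelimiterAmbiguity {
    // Flatten repeating values the way a single meta tag would carry them.
    static String flatten(List<String> values, String delimiter) {
        return String.join(delimiter, values);
    }

    // Naive stand-in for a phrase match on the flattened content (assumption).
    static boolean phraseMatches(String content, String phrase) {
        return content.contains(phrase);
    }

    // Per-value comparison, as a legacy CMS search would do it.
    static boolean valueMatches(List<String> values, String query) {
        return values.contains(query);
    }

    public static void main(String[] args) {
        List<String> doc2 = Arrays.asList("Adam, Muhammad", "Sheikh, Abdullah");
        List<String> doc4 = Arrays.asList("West Virginia", "Washington D.C.");

        // Flattened content matches even though no single value does.
        System.out.println(phraseMatches(flatten(doc2, ", "), "Muhammad, Sheikh"));   // true
        System.out.println(valueMatches(doc2, "Muhammad, Sheikh"));                   // false
        System.out.println(phraseMatches(flatten(doc4, " "), "Virginia Washington")); // true
        System.out.println(valueMatches(doc4, "Virginia Washington"));                // false
    }
}
```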

My 2 Cents-
Consider the last example (doc4) above.
What if the connector generates feed XML in the following fashion for handling
repeating attribute values?

<meta name="location" content="West Virginia"/>
<meta name="location" content="Washington D.C."/>

Are there any caveats to this approach? Well, I leave it to you guys.
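
The one-tag-per-value idea could be sketched roughly as follows. This is not
the connector's feed code: MetaTagSketch, metaTags, and escapeAttribute are
hypothetical names, and whether the GSA treats repeated meta tags with the same
name as separate values is exactly the open question, so nothing here assumes
an answer.

```java
import java.util.Arrays;
import java.util.List;

final class MetaTagSketch {
    // Minimal XML attribute escaping; '&' must be replaced first.
    static String escapeAttribute(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Emit one <meta> element per value instead of one delimited string.
    static String metaTags(String name, List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (String value : values) {
            sb.append("<meta name=\"").append(escapeAttribute(name))
              .append("\" content=\"").append(escapeAttribute(value))
              .append("\"/>\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(metaTags("location",
                Arrays.asList("West Virginia", "Washington D.C.")));
    }
}
```

Since no values are concatenated, no delimiter has to be chosen, and embedded
commas or spaces stay inside their own tag.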

Original comment by lightbends on 6 May 2009 at 10:08

GoogleCodeExporter commented 8 years ago
Leaving this as is for now rather than affect our current customers.

Original comment by mgron...@gmail.com on 6 May 2009 at 11:05