PILLUTLAAVINASH / google-enterprise-connector-manager

Automatically exported from code.google.com/p/google-enterprise-connector-manager

Sending documents with a search url causes exception for smb paths. #100

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Write a connector that sends URL feeds (that is, documents with a
search URL).
2. Use smb URLs in the documents (something like smb://....)
3. Run the connector.

What is the expected output? What do you see instead?

Since the GSA supports smb URLs, I expected the connector manager to support
them too. Instead I got the following exception:

09.06.2008 12:52:04 com.google.enterprise.connector.pusher.DocPusher buildXmlData
WARNUNG: Supplied search url smb://(...) is malformed: unknown protocol: smb

What version of the product are you using? On what operating system?

Connector Manager 1.0.3 Rev. 806
GSA version 5.0.0

Please provide any additional information below.

I did some research on my own and found these lines in DocPusher:

      try {
        new URL(searchurl);
      } catch (MalformedURLException e) {
        LOGGER.warning("Supplied search url " + searchurl
            + " is malformed: " + e.getMessage());
        return null;
      }

So the java.net.URL class raises the MalformedURLException. To test whether
there is any reason for this check, I commented out these lines and tried
again. It works perfectly. So I don't see the point of restricting URL
feeds in this way. If there is any reason to forbid smb URLs, please
tell me.

Regards

Original issue reported on code.google.com by andree.j...@googlemail.com on 16 Jun 2008 at 8:04
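
The behavior described in the report can be reproduced outside the connector
manager with a few lines of Java. This is only a minimal sketch: the class
name SmbUrlRepro, the helper parseError, and the example host and path are
made up for illustration and are not part of DocPusher. On a stock Java
runtime with no smb handler registered, the second call should fail with the
same kind of "unknown protocol" message quoted in the log above.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper class, not part of the connector manager.
class SmbUrlRepro {
    /** Returns null if java.net.URL accepts the string, else the exception message. */
    static String parseError(String spec) {
        try {
            new URL(spec); // same constructor DocPusher uses for validation
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // An http URL has a built-in handler, so parsing succeeds.
        System.out.println(parseError("http://example.com/doc.html"));
        // No built-in handler exists for smb, so parsing is rejected.
        System.out.println(parseError("smb://fileserver/share/doc.txt"));
    }
}
```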

GoogleCodeExporter commented 8 years ago
There's no reason not to support SMB URLs. We don't want to eliminate the URL
validation in the connector manager, so there are a few implementation options:

1. Implement a mock URLStreamHandler that can parse SMB URLs. We don't need to
implement the openConnection methods, because we're just using the parsing
features of the URL class.

2. Separate HTTP and HTTPS URLs out, and only use the URL class to validate
their syntax. Let SMB URLs pass through as-is. This isn't preferred, but we
could go this route if there are unexpected issues with option 1 and time is
short.

Original comment by jl1615@gmail.com on 18 Jun 2008 at 10:58
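
For comparison, option 2 could look roughly like the sketch below. It is an
illustration only, not code from the connector manager: the class name
SearchUrlCheck and the method isAcceptable are hypothetical, and rejecting
every scheme other than http, https, and smb is an assumption made here to
keep the example small.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Locale;

// Hypothetical sketch of option 2; names are not from the connector manager.
class SearchUrlCheck {
    static boolean isAcceptable(String searchUrl) {
        if (searchUrl == null) {
            return false;
        }
        String lower = searchUrl.toLowerCase(Locale.ROOT);
        if (lower.startsWith("smb://")) {
            // SMB URLs pass through as-is, with no syntax validation at all.
            return true;
        }
        if (lower.startsWith("http://") || lower.startsWith("https://")) {
            try {
                new URL(searchUrl); // only HTTP(S) syntax is checked by java.net.URL
                return true;
            } catch (MalformedURLException e) {
                return false;
            }
        }
        // Assumption for this sketch: any other scheme is rejected.
        return false;
    }
}
```

The drawback mentioned above is visible here: a malformed smb URL is never
caught, which is why option 1 is the preferred route.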

GoogleCodeExporter commented 8 years ago

Original comment by jl1615@gmail.com on 10 Jul 2008 at 9:54

GoogleCodeExporter commented 8 years ago

Original comment by Brett.Mi...@gmail.com on 19 Aug 2008 at 6:48

GoogleCodeExporter commented 8 years ago
Fixed in revision r929

This set of changes addresses Connector Manager Issue 100 -
Sending documents with a search url causes exception for smb paths.

The problem is that the out-of-the-box Java runtime does not support
'smb:' scheme URLs.  The URL framework does, however, allow the 
application to supply its own URLStreamHandler subclass to support
additional schemes.  So, I wrote an SMB URLStreamHandler that parses
smb: URLs.  Despite the 'StreamHandler' name in the class, this
SmbURLStreamHandler throws an unsupported operation IOException
if you call openConnection().  For our needs this is fine, since
the class is used to validate URL syntax, not to stream content.

In addition to the 'normal' URL parsing that happens, SmbURLStreamHandler
also validates smb: URLs according to the slightly more stringent 
constraints set out here:
http://code.google.com/apis/searchappliance/documentation/50/admin/URL_patterns.html#SMB_patterns
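
A parse-only handler of the kind described above can be sketched as follows.
This is not the SmbURLStreamHandler from r929: the names SmbUrlParser,
ParseOnlySmbHandler, parse, and isValid are hypothetical, and the single
extra check (a non-empty host) is an assumption standing in for the stricter
GSA pattern rules, which are not reproduced here.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

// Hypothetical sketch; not the SmbURLStreamHandler committed in r929.
class SmbUrlParser {
    /** Lets java.net.URL parse smb: URLs but refuses to open connections. */
    static final class ParseOnlySmbHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL url) throws IOException {
            throw new IOException("smb: URLs are parsed only; streaming is not supported");
        }
    }

    /** Parses an smb: URL using the inherited URLStreamHandler parsing logic. */
    static URL parse(String spec) throws MalformedURLException {
        // Passing the handler explicitly avoids registering it JVM-wide.
        URL url = new URL(null, spec, new ParseOnlySmbHandler());
        if (!"smb".equalsIgnoreCase(url.getProtocol())) {
            throw new MalformedURLException("not an smb: URL: " + spec);
        }
        // Illustrative extra constraint (an assumption, not the GSA rule set).
        String host = url.getHost();
        if (host == null || host.isEmpty()) {
            throw new MalformedURLException("smb: URL requires a host: " + spec);
        }
        return url;
    }

    static boolean isValid(String spec) {
        try {
            parse(spec);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }
}
```

Supplying the handler through the three-argument URL constructor keeps the
change local to the validation code, so no system-wide protocol handler
registration is needed for this sketch.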

Change Log:
----------
A projects/connector-manager/source/java/com/google/enterprise/connector/pusher/SmbURLStreamHandler.java
   - New subclass of URLStreamHandler that parses smb: scheme URLs.

M projects/connector-manager/source/java/com/google/enterprise/connector/pusher/DocPusher.java
   - Uses SmbURLStreamHandler when constructing URLs from smb: scheme searchurls.

M projects/connector-manager/source/javatests/com/google/enterprise/connector/pusher/DocPusherTest.java
   - Regression test pushes a record with smb: searchurl.

A projects/connector-manager/testdata/mocktestdata/MockRepositoryEventLog5smb.txt
   - Text input for above test.

Original comment by Brett.Mi...@gmail.com on 3 Sep 2008 at 8:21

GoogleCodeExporter commented 8 years ago

Original comment by jl1615@gmail.com on 12 Jan 2009 at 3:32