Alfresco / alfresco-indexer

A custom way to index Alfresco changes.
Apache License 2.0

Text Extracting #1

Open maoo opened 9 years ago

maoo commented 9 years ago

Hi,

ManifoldCF uses Solr's extracting update handler to handle binary content. The binary content is sent to Solr, and Tika tries to extract the text content and some metadata (such as the MIME type).

For the Alfresco connector, Alfresco should be used to convert binary content to text, as the official Solr integration does (by calling NodeContentGet), because Alfresco already knows how to convert documents to text.

But the NodeContentGet webscript is protected by a certificate, so you have to clone this webscript.

(original issue - https://github.com/maoo/alfresco-webscript-manifold-connector/issues/21 by @alexist )

maoo commented 9 years ago

The Manifold Alfresco connector could invoke NodeContentGet (with http or https, both are available) during the manifold processDocument; this would imply:

alexist commented 9 years ago

But NodeContentGet is protected by a Solr-specific authentication mechanism (a certificate). Is there another way to call this webscript over HTTP, without a certificate?

maoo commented 9 years ago

You can run without SSL - https://wiki.alfresco.com/wiki/Alfresco_And_SOLR#Running_Without_SSL
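For illustration, a minimal sketch of what calling NodeContentGet over plain HTTP from a connector might look like once SSL is disabled. The webscript path (`/service/api/solr/textContent`), the `nodeId` parameter (taken here to be the node's database id), and the class and method names are assumptions, not something confirmed in this thread; check them against your Alfresco version.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ManifoldCF or alfresco-indexer.
final class NodeTextFetcher {

    // Assumed path of the NodeContentGet webscript; verify for your Alfresco version.
    static final String TEXT_CONTENT_PATH = "/service/api/solr/textContent";

    // Builds the request URI, e.g. for baseUrl "http://localhost:8080/alfresco".
    static URI textContentUri(String baseUrl, long nodeDbId) {
        String query = "nodeId="
                + URLEncoder.encode(Long.toString(nodeDbId), StandardCharsets.UTF_8);
        return URI.create(baseUrl + TEXT_CONTENT_PATH + "?" + query);
    }

    // Returns the extracted text, or null when the response status is not 200
    // (for example when no text is available for the node).
    static String fetchText(HttpClient client, String baseUrl, long nodeDbId)
            throws Exception {
        HttpRequest request = HttpRequest.newBuilder(textContentUri(baseUrl, nodeDbId))
                .GET()
                .build();
        HttpResponse<String> response = client.send(
                request, HttpResponse.BodyHandlers.ofString(StandardCharsets.UTF_8));
        return response.statusCode() == 200 ? response.body() : null;
    }
}
```

A connector could call `NodeTextFetcher.fetchText(HttpClient.newHttpClient(), baseUrl, nodeDbId)` from its document-processing step and send the returned text to Solr instead of the binary. Note that this only works when the webscript is reachable without the client certificate.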

alexist commented 9 years ago

When SSL is disabled, the Solr webscripts are accessible without any authentication. I'm not sure that's a good idea, and you need to protect these webscripts some other way. Furthermore, you have to patch web.xml in order to disable SSL, which is also not a good idea.

I think exposing this webscript with the standard authentication mechanism can solve these problems.

maoo commented 9 years ago

The all-in-one archetype is configured to use HTTP (no SSL) for Alfresco-Solr communication (in both directions):

https://artifacts.alfresco.com/nexus/content/repositories/alfresco-docs/alfresco-lifecycle-aggregator/latest/archetypes/alfresco-allinone-archetype/usage.html

alexist commented 9 years ago

The Maven SDK disables SSL during the development phase, not in a production environment ...

maoo commented 9 years ago

True, but it shows how you need to patch the Alfresco web.xml in order to disable SSL.