collective / collective.solr

Solr search engine integration for Plone
https://pypi.org/project/collective.solr/

[Question] How to index custom NamedBlobFile? #240

Open NicolasGoeddel opened 4 years ago

NicolasGoeddel commented 4 years ago

Hi,

what do I have to do when I have created my own content schema, derived from plone.supermodel.model.Schema, containing a plone.namedfile.field.NamedBlobFile that I want to index? In this case the file will be a PDF. Because the field name is simply pdf, I added the line

<field name="pdf"        type="text"     indexed="true"  stored="true" />

to schema.xml and restarted Solr. I thought that would be enough, and that whenever I change the content of the field, the PDF in it would be indexed. But it seems I am wrong here and do not really understand the relationship between schema.xml and the content fields of Plone.

Here is the simplified schema where I want the file in pdf to be globally searchable.

from plone.namedfile.field import NamedBlobFile
from plone.supermodel import model
from zope import schema


class IIndexedContent(model.Schema):
    title = schema.TextLine(
        title=u'Titel',
        required=True,
    )
    # The PDF stored here should become searchable through Solr.
    pdf = NamedBlobFile(
        title=u'PDF',
        required=False,
    )

Thank you!

I am using Plone 5.2-rc2, Python 3.6, collective.solr 8.0.0a1

NicolasGoeddel commented 4 years ago

I found out that collective.solr.solr.SolrConnection.add() creates the XML request for Solr and tries to include the pdf field, but the field ends up containing something like this:

<field name="pdf" update="set">&lt;plone.namedfile.file.NamedBlobFile object at 0x7f0c24891048 oid 0x1e33 in &lt;Connection at 7f0c27916b38&gt;&gt;</field>

How can I add a handler for this kind of file field so that it does the same thing as the BinaryAdder in indexer.py? I could hack something together that might work, but I hope there is already a way to register my own handlers or something similar.
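The field value shown above is consistent with the file object simply being converted to a string and XML-escaped. Here is a minimal sketch of that effect, not collective.solr's actual code: FakeBlobFile and to_field_xml are hypothetical names made up for illustration, and the assumption is that no file-specific handling happens before serialization.

```python
from xml.sax.saxutils import escape


class FakeBlobFile:
    """Hypothetical stand-in for plone.namedfile.file.NamedBlobFile."""

    def __init__(self, data):
        self.data = data


def to_field_xml(name, value):
    # Assumption: the value is stringified as-is and then XML-escaped,
    # with no special handling for file-like objects.
    return '<field name="%s" update="set">%s</field>' % (name, escape(str(value)))


xml = to_field_xml("pdf", FakeBlobFile(b"%PDF-1.4 ..."))
print(xml)
```

The printed field contains the escaped default repr of the object rather than any text from the file, which is why Solr has nothing useful to index for that field.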

NicolasGoeddel commented 4 years ago

I now understand how the DefaultAdder and its subclasses such as BinaryAdder work. They are chosen based on the portal_type of the content object that is being indexed. I think it would be a nice idea to have something similar for field types. Or would it be possible to create some kind of decorator that can be used in a Dexterity schema definition and that automatically picks the right data extractor if there is a binary field or similar?
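To illustrate the selection by portal_type described above, here is a simplified sketch in plain Python. It is not how collective.solr implements it: the ADDERS registry, get_adder, Content, and the extract_binary flag are all hypothetical, and the two adder classes are toy stand-ins that only borrow the names from indexer.py.

```python
class DefaultAdder:
    """Toy stand-in: passes the index data through unchanged."""

    def __init__(self, context):
        self.context = context

    def __call__(self, data):
        return dict(data)


class BinaryAdder(DefaultAdder):
    """Toy stand-in: flags the document for binary text extraction."""

    def __call__(self, data):
        result = dict(data)
        result["extract_binary"] = True  # hypothetical flag
        return result


# Hypothetical registry keyed by portal_type.
ADDERS = {"File": BinaryAdder}


def get_adder(context):
    # Fall back to the default adder when no adder is registered
    # under the object's portal_type.
    portal_type = getattr(context, "portal_type", None)
    return ADDERS.get(portal_type, DefaultAdder)(context)


class Content:
    """Minimal object carrying only a portal_type."""

    def __init__(self, portal_type):
        self.portal_type = portal_type


file_adder = get_adder(Content("File"))
custom_adder = get_adder(Content("IndexedContent"))
print(type(file_adder).__name__, type(custom_adder).__name__)
```

In this sketch a custom type falls through to the default adder until something is registered under its own portal_type, for example ADDERS["IndexedContent"] = BinaryAdder.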

WhiteDiamondz commented 3 years ago

Hey @NicolasGoeddel! I was having trouble with exactly the same thing and came upon your open issue. I was wondering what the best solution you found for this scenario was. Following up on what you said about indexer.py, I found the adapter declared for the Archetypes File type:

     <adapter
      factory=".indexer.DXFileBinaryAdder"
      for="Products.Archetypes.interfaces.IBaseObject"
      name="File"
      />

This makes sure that File is indexed with the BinaryAdder, so we don't get a field like <field name="pdf" update="set">&lt;plone.namedfile.file.NamedBlobFile object at 0x7f0c24891048 oid 0x1e33 in &lt;Connection at 7f0c27916b38&gt;&gt;</field>, which is something I had also noticed while trying to create a specific text field in my Solr schema.xml.

Is the best solution to declare an adapter in the configure.zcml file of collective.solr? In that case, since we want to tie it to our add-on, what would be the best way to proceed? I am currently trying something like the code below:

     <adapter
      factory=".indexer.DXFileBinaryAdder"
      for="My.AddOn.interfaces.ICustomContentType"
      name="CustomContentType"
      />

WhiteDiamondz commented 3 years ago

After declaring the snippet from my previous post in configure.zcml, it looks like I was able to get the desired result! Thanks for explaining and sharing your discoveries even though you hadn't gotten any answers, this helped a lot!