collective / collective.solr

Solr search engine integration for Plone
https://pypi.org/project/collective.solr/

Is there a way to index File metadata without indexing the actual content? #289

Closed dkh7m closed 2 years ago

dkh7m commented 2 years ago

When I run @@solr-maintenance/reindex on my site, my HTML content indexes fine, but we can't get our PDF, DOC, etc. content to come in because of blob storage permission issues. Is it possible for collective.solr to index just the ATFile metadata (title, description, etc.) and ignore the actual blob file? It's not imperative that we index the actual file content, but we need the metadata to be searchable. Thanks!

tisto commented 2 years ago

@dkh7m Solr indexes the metadata of PDFs and other files by default; it simply picks up that data. You need to install some system dependencies. I don't recall exactly, but I think you need "wv" and/or "poppler-utils". This is not a c.solr issue but a Plone issue.
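
A quick way to check whether those system dependencies are present is to look for their command-line tools on the PATH of the user running Plone. This is only a sketch: the executable names are assumptions (pdftotext is shipped by poppler-utils and wvText by wv), and find_converters is a hypothetical helper, not part of Plone or collective.solr.

```python
import shutil

# Assumed executable names: pdftotext ships with poppler-utils, wvText with wv.
CONVERTERS = {"poppler-utils": "pdftotext", "wv": "wvText"}


def find_converters(converters=CONVERTERS):
    """Map each package name to the resolved path of its binary, or None."""
    return {pkg: shutil.which(binary) for pkg, binary in converters.items()}


if __name__ == "__main__":
    # Print one line per package so missing tools are easy to spot.
    for pkg, path in find_converters().items():
        print(f"{pkg}: {path or 'not found on PATH'}")
```

If a tool is reported as not found, installing the corresponding package through the system package manager is the likely next step; the check itself says nothing about the blob storage permissions mentioned above.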