enonic / xp

Enonic XP
https://enonic.com
GNU General Public License v3.0

Make binary extractor max length configurable #7294

Closed: runarmyklebust closed this issue 4 years ago

runarmyklebust commented 5 years ago

BinaryExtractorImpl uses a default maximum of 100,000 characters for the text it extracts from a binary document for indexing.
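The thread does not show BinaryExtractorImpl's internals. The sketch below only illustrates what a max-length cap on extracted text means: text is accumulated until the limit is reached, and anything beyond it is dropped. The class LimitedTextBuffer is hypothetical and is not XP code.

```java
// Hypothetical sketch (not XP's actual BinaryExtractorImpl): accumulates extracted
// text, drops everything past maxLength characters, and records that it did so.
final class LimitedTextBuffer {
    private final int maxLength;
    private final StringBuilder text = new StringBuilder();
    private boolean truncated;

    LimitedTextBuffer(int maxLength) {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must be >= 0");
        }
        this.maxLength = maxLength;
    }

    // Appends a chunk of extracted text, keeping at most maxLength characters in total.
    void append(CharSequence chunk) {
        int remaining = maxLength - text.length();
        if (chunk.length() > remaining) {
            text.append(chunk, 0, remaining); // keep only the part that still fits
            truncated = true;
        } else {
            text.append(chunk);
        }
    }

    boolean isTruncated() {
        return truncated;
    }

    String text() {
        return text.toString();
    }
}
```

With maxLength set to 100_000, text beyond the first 100,000 characters of a document would never reach the index and so could not be matched by a search.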

This value should be configurable. The name and placement of the option need discussion: the binary extractor is available everywhere, but we only use it in the content (media) domain, so maybe an option specific to the content domain is the right choice?

Also, the default seems too low: 100 pages of text is about 500,000 characters, which seems like a more sensible default.
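Working backwards from that estimate (about 5,000 characters per page, an average implied by the comment rather than a measured figure), the current 100,000-character default covers only about the first 20 pages of a document:

```latex
\frac{500\,000\ \text{characters}}{100\ \text{pages}} = 5\,000\ \text{characters/page}
\qquad\Rightarrow\qquad
\frac{100\,000\ \text{characters}}{5\,000\ \text{characters/page}} = 20\ \text{pages}
```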

anatol-sialitski commented 5 years ago

Hi @runarmyklebust and @GlennRicaud

I suggest creating a com.enonic.xp.extractor.cfg file with a property extractor.body.size.limit, set to 500 000 by default, and implementing BinaryExtractorConfig as an OSGi component.

Any objections?
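Here is a minimal, JDK-only sketch of the proposal above. It parses the contents of a com.enonic.xp.extractor.cfg file in Java properties syntax and falls back to 500 000 when extractor.body.size.limit is missing. The class name ExtractorCfgReader is hypothetical. In a running XP server the file would more likely reach the component through OSGi Configuration Admin than be parsed by hand.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: reads the proposed extractor.body.size.limit property
// from com.enonic.xp.extractor.cfg content, falling back to 500 000.
final class ExtractorCfgReader {
    static final String KEY = "extractor.body.size.limit";
    static final int DEFAULT_LIMIT = 500_000;

    static int bodySizeLimit(String cfgContent) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(cfgContent));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen for a StringReader
        }
        String value = props.getProperty(KEY);
        if (value == null || value.isBlank()) {
            return DEFAULT_LIMIT; // property absent: use the proposed default
        }
        int limit = Integer.parseInt(value.trim());
        if (limit <= 0) {
            throw new IllegalArgumentException(KEY + " must be positive: " + limit);
        }
        return limit;
    }
}
```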

sigdestad commented 5 years ago

Maybe we could place this config inside a common cfg for CMS instead?

sigdestad commented 5 years ago

Extractor is only a part of the CMS API.

anatol-sialitski commented 5 years ago

@sigdestad @GlennRicaud Which configuration file is the common one for CMS?

If you mean the system.properties file, the property would be named xp.config.extractor.writeLimit. Please correct me if I'm wrong.
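Suppose the setting lived in system.properties instead, and its entries were exposed to the JVM as system properties. That second point is an assumption not confirmed in this thread. Under it, reading the value would take very little code. Integer.getInteger is standard JDK. The class name is hypothetical, and the 500 000 fallback is borrowed from the earlier proposal.

```java
// Hypothetical sketch: read the limit from the JVM system property
// xp.config.extractor.writeLimit, falling back to a default.
final class WriteLimitFromSystemProperty {
    static final String PROPERTY = "xp.config.extractor.writeLimit";
    static final int DEFAULT_LIMIT = 500_000;

    static int writeLimit() {
        // Integer.getInteger returns the default when the property is unset
        // or cannot be parsed as an integer.
        return Integer.getInteger(PROPERTY, DEFAULT_LIMIT);
    }
}
```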

GlennRicaud commented 5 years ago

File: com.enonic.xp.extractor.cfg
The class should be ExtractorConfigImpl, with an interface ExtractorConfig.

To be discussed:
Default value: 100 000? 500 000? ...
Property name: binary.buffer.limit? binary.write.limit?

sigdestad commented 5 years ago

I was hoping we could group CMS config in a common file like com.enonic.xp.cms.cfg

Is there a reason to have the extractor as its own thing? Or is it a separate OSGi bundle?

GlennRicaud commented 5 years ago

It is not a CMS thing. It is its own OSGi bundle in the core.

File: com.enonic.xp.extractor.cfg
The class should be ExtractorConfigImpl, with an interface ExtractorConfig.
Default value: 500 000
Property name: body.size.limit?
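The shape agreed on above could be sketched as follows. ExtractorConfig and ExtractorConfigImpl are the names from the comment. The bodySizeLimit() accessor, the activate(Map) method, and the validation are assumptions, and the name body.size.limit was still marked as tentative. In an actual OSGi bundle the implementation would typically be a Declarative Services component (@Component, with @Activate/@Modified methods receiving the configuration). Those annotations are left out so the sketch compiles with the JDK alone.

```java
import java.util.Map;

// Hypothetical sketch of the proposed config service.
interface ExtractorConfig {
    // Maximum number of characters to extract from a binary for indexing.
    int bodySizeLimit();
}

final class ExtractorConfigImpl implements ExtractorConfig {
    static final String BODY_SIZE_LIMIT = "body.size.limit";
    static final int DEFAULT_BODY_SIZE_LIMIT = 500_000;

    private volatile int bodySizeLimit = DEFAULT_BODY_SIZE_LIMIT;

    // Stand-in for an OSGi @Activate/@Modified method: receives the key/value
    // pairs from com.enonic.xp.extractor.cfg and applies them.
    void activate(Map<String, ?> properties) {
        Object raw = properties.get(BODY_SIZE_LIMIT);
        if (raw == null) {
            bodySizeLimit = DEFAULT_BODY_SIZE_LIMIT; // not configured: use default
            return;
        }
        int limit = Integer.parseInt(raw.toString().trim());
        if (limit <= 0) {
            throw new IllegalArgumentException(BODY_SIZE_LIMIT + " must be positive: " + limit);
        }
        bodySizeLimit = limit;
    }

    @Override
    public int bodySizeLimit() {
        return bodySizeLimit;
    }
}
```

Keeping the interface separate lets the extractor depend only on ExtractorConfig, so tests or other bundles can supply their own limit without going through the .cfg file.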

sigdestad commented 5 years ago

Ok