Closed runarmyklebust closed 4 years ago
Hi @runarmyklebust and @GlennRicaud
I suggest creating a com.enonic.xp.extractor.cfg file with the property extractor.body.size.limit, initialized to 500 000 by default, and implementing BinaryExtractorConfig as an OSGi component.
Any objections?
Maybe we could place this config inside a common cfg for CMS instead?
The extractor is only a part of the CMS API.
@sigdestad @GlennRicaud Which configuration file is common for CMS?
If you mean the system.properties file, then the property would be named in the following format: xp.config.extractor.writeLimit. Please correct me if necessary.
File: com.enonic.xp.extractor.cfg
The class should be ExtractorConfigImpl with an interface ExtractorConfig
To be discussed:
Default value: 100 000?, 500 000?, ...
Property name: binary.buffer.limit? binary.write.limit?
I was hoping we could group CMS config in a common file like com.enonic.xp.cms.cfg
No reason to have the extractor as its own thing? Or is it a separate OSGi bundle?
It is not a CMS thing. It is its own OSGi bundle in the core.
File: com.enonic.xp.extractor.cfg
The class should be ExtractorConfigImpl with an interface ExtractorConfig
Default value: 500 000
Property name: body.size.limit?
Ok
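For illustration, the agreed proposal (file com.enonic.xp.extractor.cfg, property body.size.limit, default 500 000) could be sketched roughly as below. This is a minimal sketch in plain Java, not the actual Enonic XP code: the class name ExtractorConfigSketch and the method names are hypothetical, and the OSGi wiring is left out. It assumes the key/value pairs from the cfg file reach the component as a map, and it assumes that a missing, non-numeric, or non-positive value should fall back to the default, which the thread does not specify.

```java
import java.util.Map;

// Hypothetical sketch: names follow the proposal in this thread, not the real XP source.
final class ExtractorConfigSketch {
    // Property name and default value as proposed in the thread.
    static final String BODY_SIZE_LIMIT_KEY = "body.size.limit";
    static final int DEFAULT_BODY_SIZE_LIMIT = 500_000;

    private final int bodySizeLimit;

    private ExtractorConfigSketch(int bodySizeLimit) {
        this.bodySizeLimit = bodySizeLimit;
    }

    // Builds the config from the key/value pairs of com.enonic.xp.extractor.cfg.
    // Assumption: in an OSGi component these pairs would be handed to an activation callback.
    static ExtractorConfigSketch fromProperties(Map<String, String> properties) {
        String raw = properties.get(BODY_SIZE_LIMIT_KEY);
        if (raw == null || raw.isBlank()) {
            return new ExtractorConfigSketch(DEFAULT_BODY_SIZE_LIMIT);
        }
        try {
            int value = Integer.parseInt(raw.trim());
            // Assumption: non-positive limits are treated as invalid and replaced by the default.
            return new ExtractorConfigSketch(value > 0 ? value : DEFAULT_BODY_SIZE_LIMIT);
        } catch (NumberFormatException e) {
            // Assumption: an unparsable value falls back to the default instead of failing startup.
            return new ExtractorConfigSketch(DEFAULT_BODY_SIZE_LIMIT);
        }
    }

    int getBodySizeLimit() {
        return bodySizeLimit;
    }
}
```

With an empty map, getBodySizeLimit() returns the default; with body.size.limit set to 100000 in the cfg file, it returns that value instead.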
The BinaryExtractorImpl uses a default value of 100 000 characters as the maximum length of the text extracted from a binary document for indexing.
This value should be configurable as an option. The name and placement of the option should be discussed; the binary extractor is available from everywhere, but we use it only in the content (media) domain, so maybe a content-domain-specific option is the right choice?
Also, the default value seems too low: 100 pages of text is about 500 000 characters, which seems like a more sensible default.
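To make the effect of the limit concrete: the estimate above works out to roughly 5 000 characters per page (500 000 / 100), so the current default covers about 20 pages. The snippet below is a minimal sketch of capping extracted text at a configurable character limit. The class BodyLimitSketch and the method limitBody are hypothetical and are not how BinaryExtractorImpl is implemented; the real extractor may well enforce the limit during extraction rather than afterwards. Characters are counted here as Java char values (UTF-16 code units), which is an assumption.

```java
// Hypothetical sketch: illustrates the effect of a body size limit, not the XP implementation.
final class BodyLimitSketch {
    // Returns at most bodySizeLimit chars of the extracted text.
    static String limitBody(String extractedText, int bodySizeLimit) {
        if (bodySizeLimit < 0) {
            throw new IllegalArgumentException("bodySizeLimit must not be negative");
        }
        if (extractedText.length() <= bodySizeLimit) {
            return extractedText;
        }
        int end = bodySizeLimit;
        // Avoid cutting a surrogate pair in half: drop a trailing high surrogate.
        if (end > 0 && Character.isHighSurrogate(extractedText.charAt(end - 1))) {
            end--;
        }
        return extractedText.substring(0, end);
    }
}
```

Text shorter than the limit is returned unchanged, so raising the default only affects documents whose extracted text exceeds the old limit.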