FasterXML / woodstox

The gold standard Stax XML API implementation. Now at Github.
Apache License 2.0
225 stars 81 forks

Why is the maximum character count for parsing limited to 4000 each time? Can't it support the caller to customize this setting? #206

Closed by heyingquan0030 5 months ago

heyingquan0030 commented 5 months ago

Why is the maximum number of characters parsed at a time limited to 4000? Could callers be allowed to customize this setting?

https://github.com/FasterXML/woodstox/blob/fb357a2674c831c3c34a33e0e321937b4605838b/src/main/java/com/ctc/wstx/api/ReaderConfig.java#L571

cowtowncoder commented 5 months ago

You may be misunderstanding what that number means: it is just an internal read buffer size. It does not limit handling of longer string values (for example). The only visible effect of this limit is that, unless coalescing mode is enabled (see, for example, https://stackoverflow.com/questions/66064967/what-is-the-property-is-coalescing-in-xmlinputfactory-for), character data content may be returned in multiple chunks.
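To illustrate the point about chunks, here is a minimal sketch using only the standard `javax.xml.stream` API. The class name `ChunkedTextDemo`, its `Result` holder, and the sample document are made up for illustration, and a CDATA section is used to provoke splitting because the exact chunk boundaries depend on the StAX implementation in use. The loop appends every text event, so the caller gets the full value whether or not coalescing is enabled.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; not part of Woodstox.
final class ChunkedTextDemo {

    /** Concatenated text plus the number of text events it arrived in. */
    static final class Result {
        final String text;
        final int chunks;

        Result(String text, int chunks) {
            this.text = text;
            this.chunks = chunks;
        }
    }

    /** Reads all character data from the document, accumulating every chunk. */
    static Result readText(String xml, boolean coalescing) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Standard StAX property: when true, adjacent character data is merged.
        factory.setProperty(XMLInputFactory.IS_COALESCING, coalescing);

        StringBuilder text = new StringBuilder();
        int chunks = 0;
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    // Text may arrive as several CHARACTERS/CDATA events; append them all.
                    if (event == XMLStreamConstants.CHARACTERS
                            || event == XMLStreamConstants.CDATA) {
                        text.append(reader.getText());
                        chunks++;
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Failed to parse sample document", e);
        }
        return new Result(text.toString(), chunks);
    }

    public static void main(String[] args) {
        String xml = "<a>foo<![CDATA[bar]]>baz</a>";
        Result plain = readText(xml, false);
        Result merged = readText(xml, true);
        System.out.println("non-coalescing: " + plain.chunks + " chunk(s), text=" + plain.text);
        System.out.println("coalescing:     " + merged.chunks + " chunk(s), text=" + merged.text);
    }
}
```

Accumulating in a loop like this (or calling `getElementText()`) keeps calling code independent of the buffer size, which is why the setting has little visible effect.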

So there is not really much benefit in exposing this internal setting to users. However, I guess we could add an alternative constructor for WstxInputFactory: currently there is

    public WstxInputFactory() {
        mConfig = ReaderConfig.createFullDefaults();
    }

but it seems we could add

    public WstxInputFactory(ReaderConfig cfg) {
        mConfig = cfg;
    }

(and similarly for WriterConfig/WstxOutputFactory)

if that is desirable? This could be added in version 6.7.0 (being an API addition).
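For readers who want to see the shape of that proposal in isolation, here is a standalone sketch of the two-constructor pattern. `SketchReaderConfig` and `SketchInputFactory` are hypothetical stand-ins, not the real Woodstox classes, and the `withInputBufferLength` method is an assumption made only for this example; 4000 is simply the value discussed in this thread.

```java
import java.util.Objects;

// Hypothetical stand-in for a reader configuration; not the Woodstox ReaderConfig.
final class SketchReaderConfig {
    private int inputBufferLength = 4000; // value discussed in the thread

    static SketchReaderConfig createFullDefaults() {
        return new SketchReaderConfig();
    }

    int getInputBufferLength() {
        return inputBufferLength;
    }

    // Assumed fluent setter, for illustration only.
    SketchReaderConfig withInputBufferLength(int length) {
        if (length <= 0) {
            throw new IllegalArgumentException("length must be positive: " + length);
        }
        this.inputBufferLength = length;
        return this;
    }
}

// Hypothetical stand-in for the factory; mirrors the two constructors in the comment above.
final class SketchInputFactory {
    private final SketchReaderConfig config;

    // Existing style: the factory builds its own default configuration.
    SketchInputFactory() {
        this(SketchReaderConfig.createFullDefaults());
    }

    // Proposed style: the caller hands in a pre-built configuration.
    SketchInputFactory(SketchReaderConfig cfg) {
        this.config = Objects.requireNonNull(cfg, "cfg");
    }

    int inputBufferLength() {
        return config.getInputBufferLength();
    }
}
```

With this shape, `new SketchInputFactory()` keeps the default, while `new SketchInputFactory(SketchReaderConfig.createFullDefaults().withInputBufferLength(8192))` lets the caller choose. One design point such a constructor raises is that the factory shares the caller's config instance, so later changes to that object would affect the factory unless it is copied.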

heyingquan0030 commented 5 months ago

> You may be misunderstanding what that number means: it is just an internal read buffer size. It does not limit handling of longer string values (for example).

Thanks for the answer, I understand.