INL / BlackLab

Linguistic search for large annotated text corpora, based on Apache Lucene
http://inl.github.io/BlackLab/
Apache License 2.0

option to stream complete document #54

Open eduarddrenth opened 6 years ago

eduarddrenth commented 6 years ago

No buffering, just stream a whole document when allowed

Performance- and memory-wise, it may be a good idea in general to reconsider the buffering mechanism in BlackLab Server.

jan-niestadt commented 6 years ago

I'm not sure what you mean by stream. It is already possible to request part of the XML (it will cut at the specified word boundary and make sure the result is still well-formed), but I guess you mean something else?

eduarddrenth commented 6 years ago

I mean writing directly to response.getOutputStream(). BlackLab Server always writes the response data to a string (in memory) first, and only then writes it to the response output stream.
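
To make the difference concrete, here is a minimal sketch in plain java.io. It is not BlackLab Server code: the class BufferedVsStreamed, the writeDocument producer and the chunk format are all hypothetical stand-ins, and a ByteArrayOutputStream plays the role of the response output stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class BufferedVsStreamed {

    /** Hypothetical stand-in for code that produces a document piece by piece. */
    static void writeDocument(Writer w, int chunks) throws IOException {
        for (int i = 0; i < chunks; i++) {
            w.write("<w>word" + i + "</w>");
        }
    }

    /** Buffered: build the whole body as a string, then copy it to the response. */
    static int buffered(OutputStream responseOut, int chunks) throws IOException {
        StringWriter buffer = new StringWriter();
        writeDocument(buffer, chunks);
        String body = buffer.toString();          // the entire document is in memory here
        responseOut.write(body.getBytes(StandardCharsets.UTF_8));
        responseOut.flush();
        return body.length();                     // number of chars held in memory
    }

    /** Streamed: write each piece through to the response as it is produced. */
    static void streamed(OutputStream responseOut, int chunks) throws IOException {
        // OutputStreamWriter only keeps a small fixed-size encoding buffer
        Writer w = new OutputStreamWriter(responseOut, StandardCharsets.UTF_8);
        writeDocument(w, chunks);
        w.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream a = new ByteArrayOutputStream();
        ByteArrayOutputStream b = new ByteArrayOutputStream();
        int held = buffered(a, 3);
        streamed(b, 3);
        System.out.println("buffered held " + held + " chars; both sent " + a.size() + " and " + b.size() + " bytes");
    }
}
```

Both paths send the same bytes. The difference is that the buffered path holds the complete document in memory before the first byte goes out, which is what matters for large documents.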

jan-niestadt commented 6 years ago

Ok, I see. We used to do that, but that caused problems if an error occurred halfway through generating the response. That would have to be taken into account when deciding on a new approach.
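
The failure mode can be simulated in a few lines. This is only a sketch under assumptions, not the old BlackLab Server code: MidStreamFailure, produce and the error format are hypothetical, and a StringBuilder stands in for what the client has already received.

```java
import java.io.IOException;

final class MidStreamFailure {

    /** Hypothetical producer that fails when it reaches chunk number failAt. */
    static void produce(StringBuilder out, int chunks, int failAt) throws IOException {
        for (int i = 0; i < chunks; i++) {
            if (i == failAt) {
                throw new IOException("simulated failure at chunk " + i);
            }
            out.append("<w>").append(i).append("</w>");
        }
    }

    /** Buffered: nothing reaches the client until the whole body was produced. */
    static String buffered(int chunks, int failAt) {
        StringBuilder client = new StringBuilder();
        StringBuilder buffer = new StringBuilder();
        try {
            produce(buffer, chunks, failAt);
            client.append(buffer);
        } catch (IOException e) {
            // The partial buffer is discarded, so the client gets a clean error
            client.append("<error>").append(e.getMessage()).append("</error>");
        }
        return client.toString();
    }

    /** Streamed: chunks reach the client immediately, so an error lands mid-document. */
    static String streamed(int chunks, int failAt) {
        StringBuilder client = new StringBuilder();
        try {
            produce(client, chunks, failAt);
        } catch (IOException e) {
            // Too late to replace the response: the error follows partial content
            client.append("<error>").append(e.getMessage()).append("</error>");
        }
        return client.toString();
    }

    public static void main(String[] args) {
        System.out.println("buffered: " + buffered(3, 1));
        System.out.println("streamed: " + streamed(3, 1));
    }
}
```

In a real servlet the situation is worse than in this simulation, because once the response is committed the status code and headers can no longer be changed either. Any streaming approach would need a way to signal the error inside the already-started body.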

eduarddrenth commented 6 years ago

Yes, I know, this will not be easy. If it ain't broke, and there are no complaints or figures that support such a change... Perhaps if we limit streaming to ../contents, the impact is smaller. I think streaming is especially beneficial there.

eduarddrenth commented 5 years ago

Java EE 8 (and maybe 7 as well) has very nice streaming support (running here: https://web2.fa.knaw.nl/standertwurdlist-ws/):

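    import java.io.IOException;
    import java.io.OutputStream;
    import javax.ws.rs.WebApplicationException;
    import javax.ws.rs.core.Response;
    import javax.ws.rs.core.StreamingOutput;

    private Response processRequest(.....) {
        // JAX-RS calls write() when it is ready to send the response body
        StreamingOutput output = new StreamingOutput() {
            @Override
            public void write(OutputStream out) throws IOException, WebApplicationException {
                try {
                    // Marshal straight to the response stream, no intermediate string
                    marshaller.marshal(....., out);
                } catch (Exception ex) {
                    throw new WebApplicationException(fault(ex));
                }
            }
        };

        return Response.ok(output).cacheControl(CACHE_CONTROL).build();
    }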