freme-project / pipelines

Apache License 2.0
0 stars 0 forks source link

include processing time information in the output #14

Closed m1ci closed 8 years ago

m1ci commented 8 years ago

Just an idea for further development: It can be useful if the client get information about the time required for execution of the whole pipeline and processing time of each service. This will help to find the bottleneck in the processing.

ghsnd commented 8 years ago

Yes, this might be useful. Then it is just question how the response should look like. Now it just returns what the last service returns. So options are:

Or maybe some other. Anyway, good idea!

m1ci commented 8 years ago

IMO the first option is better from practical point of view. Wrap the response in metadata part and content of the "real" response.

ghsnd commented 8 years ago

This is implemented. If you add the request parameter stats=true, the response is wrapped.

Example:

curl -vX POST --header "Content-Type: application/json" --header "Accept: application/json" -d "@pipeline.json" "http://api-dev.freme-project.eu/current/pipelining/chain?stats=true"

Where @pipeline.json is a pipeline description (see the chain method for an example).

The result looks like:

{
    "metadata": {
        "serviceToDuration": {
            "http://api-dev.freme-project.eu/current/e-entity/dbpedia-spotlight/documents":1461,
            "http://api-dev.freme-project.eu/current/e-link/documents/":4299,
            "http://api-dev.freme-project.eu/current/e-translation/tilde":1963
        },
        "totalDuration":7723
    },
    "content": {
        "body": ...
        ...
        }
}

Here the metadata contains a mapping serviceToDuration which contains how long each separate service takes to process its request. totalDuration is the time to process the complete pipeline. Times are in milliseconds.

content contains the original pipeline response.

If this is OK, I will post an issue to add this in the API documentation.

ghsnd commented 8 years ago

This also works on templates, E.g.:

curl -vX POST --header "Content-Type: text/plain" --header "Accept: application/json" -d "Brussels is the capital of Belgium." "http://api-dev.freme-project.eu/current/pipelining/chain/37?stats=true"
jnehring commented 8 years ago

I like that this is displayed only via parameter "stats=true" so it is optional. How is it implemented? I think it could be done as a general filter so it is automatically enabled on every endpoint.

ghsnd commented 8 years ago

I like that this is displayed only via parameter "stats=true" so it is optional. How is it implemented?

Right now it is pretty hard-coded. The pipelines code returns a result with statistics anyway, and in the Broker the statistics are filtered out or kept depending on the value of the parameter. See the method pipeline() (it's not many lines of code ;) ).

I think it could be done as a general filter so it is automatically enabled on every endpoint.

Indeed. Then we also need a generic way to gather statistics...

jnehring commented 8 years ago

Well actually when I think about it the general filter does not make so much sense. The execution time of a request is only of interest in the pipelines. When a user is interested in the execution time of a single request he can measure it himself and does not need the stats parameter.

ghsnd commented 8 years ago

"serviceToDuration" in the response is now called "serviceDurations".

@m1ci If you are OK with how it looks, can you close the issue, or comment if not?

m1ci commented 8 years ago

nice work. Small remark on the naming - instead of "serviceDurations" I would call it "executionTime"

ghsnd commented 8 years ago

Done. Example:

{"metadata": {
    "executionTime": {
        "http://api.freme-project.eu/current/e-entity/dbpedia-spotlight/documents":1701,
        "http://api.freme-project.eu/current/e-translation/tilde":5284
    },
    "totalExecutionTime":6985 },
"content":{ ... }
}
m1ci commented 8 years ago

great, fine by me. If there is nothing more to discuss just go ahead and close the issue.