Sage-Bionetworks / sage-monorepo

Where OpenChallenges, Schematic, and other Sage open source apps are built
https://sage-bionetworks.github.io/sage-monorepo/
Apache License 2.0

[Docs] Identify how to log all the requests sent to Elasticsearch #2138

Closed tschaffter closed 12 months ago

tschaffter commented 12 months ago

What product(s) is this documentation issue for?

OpenChallenges

Documentation issue

Description

I want to better understand the requests that Hibernate Search sends to Elasticsearch.

To avoid caching on the client side, I should send test requests with a tool that does not cache them, such as curl. Here is an example search request that targets the index openchallenges-challenge-000001:

curl -XGET "localhost:9200/openchallenges-challenge-000001/_search?q=description:plop&pretty"

The same request, with a header that attempts to tell the server not to cache:

curl -H "Cache-Control: no-cache, no-store" -XGET "localhost:9200/openchallenges-challenge-000001/_search?q=description:plop&pretty"

We can also tell ES not to cache the request:

curl -H "Cache-Control: no-cache, no-store" -XGET "localhost:9200/openchallenges-challenge-000001/_search?q=description:plop&pretty&request_cache=false"
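The three requests above differ only in the header and the request_cache parameter. As a minimal sketch, a small helper can assemble the uncached URL so that the index and query are easy to swap. The function name es_search_url and the ES_HOST variable are hypothetical and not part of the repo.

```shell
# Hypothetical helper: print a URI-search URL that bypasses the shard request cache.
# $1 = index or alias name, $2 = Lucene query string (e.g. description:plop).
# ES_HOST is an assumed variable; it defaults to the host used in this issue.
es_search_url() {
  printf '%s/%s/_search?q=%s&pretty&request_cache=false\n' \
    "${ES_HOST:-localhost:9200}" "$1" "$2"
}

# Usage (requires a running cluster):
#   curl -H "Cache-Control: no-cache, no-store" -XGET \
#     "$(es_search_url openchallenges-challenge-000001 description:plop)"
```

The helper only builds the string, so it can be checked without a cluster; the query is not URL-encoded, which is fine for simple terms like the one used here.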

There seem to be three ways to capture requests, as described here.

Notes:

Is there a specific documentation page you are reporting?

No response

Anything else?

No response


tschaffter commented 12 months ago

How to see requests sent to Elasticsearch

Run the command shown below to start printing requests in the logs of the Elasticsearch cluster:

curl -XPUT "http://localhost:9200/_all/_settings" -H "content-type: application/json" -d'
{
  "index.search.slowlog.threshold.query.warn": "0s",
  "index.search.slowlog.threshold.query.info": "0s",
  "index.search.slowlog.threshold.query.debug": "0s",
  "index.search.slowlog.threshold.query.trace": "0s",
  "index.search.slowlog.threshold.fetch.warn": "0s",
  "index.search.slowlog.threshold.fetch.info": "0s",
  "index.search.slowlog.threshold.fetch.debug": "0s",
  "index.search.slowlog.threshold.fetch.trace": "0s",
  "index.search.slowlog.level": "trace"
}'

Source: https://jolicode.com/blog/log-all-the-searches-going-through-elasticsearch

The above settings apply to search queries. Equivalent settings are also available for indexing operations.
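For indexing operations, a sketch of the equivalent settings body could look like the following. The index.indexing.slowlog.threshold.index.* names are the documented indexing slow log settings; the wrapper function name is hypothetical, and this variant was not tested in this issue.

```shell
# Sketch: print a settings body that lowers every indexing slow log
# threshold to 0s, so that each indexing operation is written to the
# indexing slow log. The function name is hypothetical.
indexing_slowlog_settings() {
  cat <<'EOF'
{
  "index.indexing.slowlog.threshold.index.warn": "0s",
  "index.indexing.slowlog.threshold.index.info": "0s",
  "index.indexing.slowlog.threshold.index.debug": "0s",
  "index.indexing.slowlog.threshold.index.trace": "0s"
}
EOF
}

# Usage (requires a running cluster):
#   indexing_slowlog_settings | curl -XPUT "http://localhost:9200/_all/_settings" \
#     -H "content-type: application/json" -d @-
```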

Follow Elasticsearch logs:

docker logs -f openchallenges-elasticsearch

Warning: While this is the commonly recommended solution, docker logs still fails to show all the queries sent to ES.

Disabling ES caching for the index does not help:

curl -X PUT "localhost:9200/openchallenges-challenge-000001/_settings?pretty" -H 'Content-Type: application/json' -d'
{ "index.requests.cache.enable": false }
'

I also tried to invalidate the cache for all indices:

curl -X POST "localhost:9200/_all/_cache/clear?request=true&pretty"

tschaffter commented 12 months ago

Docker log file

The path to the container's log file can be found with docker inspect. This file does not contain all the requests sent to ES either.

sudo tail -f /var/lib/docker/containers/c2aabd6b7fde7837c74be734df86a3b40a517ff778b9139b93b6d3c74a95828a/c2aabd6b7fde7837c74be734df86a3b40a517ff778b9139b93b6d3c74a95828a-json.log

tschaffter commented 12 months ago

Confirm that the ES cache is empty

curl -X GET "localhost:9200/_stats/request_cache?human&pretty"

tschaffter commented 12 months ago

Conclusion

The solution is to enable the HTTP tracer of ES for indices of interest with:

curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
   "transient" : {
      "logger.org.elasticsearch.http.HttpTracer" : "TRACE",
      "http.tracer.include" : [ "*openchallenges-challenge-000001*" ]
   }
}
'

Follow the ES logs:

docker logs -f openchallenges-elasticsearch
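With TRACE enabled the container log gets noisy. A minimal sketch for keeping only the tracer output is shown below; it assumes that each tracer line contains the logger name HttpTracer, and the function name is hypothetical.

```shell
# Hypothetical helper: keep only the log lines emitted by the HTTP tracer.
# Assumption: the logger name "HttpTracer" appears in every tracer line.
only_http_tracer() {
  grep 'HttpTracer'
}

# Usage (requires the running container):
#   docker logs -f openchallenges-elasticsearch 2>&1 | only_http_tracer
```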

The following request is always logged:

curl -XGET "localhost:9200/openchallenges-challenge-000001/_search?q=description:plop&pretty"

However, this request, copied from the Swagger UI and sent to the challenge service, returns a successful result but does not appear in the ES logs:

curl -X 'GET' \
  'http://localhost:8085/v1/challenges?searchTerms=dream' \
  -H 'accept: application/json'

The reason is that Hibernate Search sends the query to an alias rather than to the index name (see post below), namely openchallenges-challenge-read. Let's enable the HTTP tracer for this alias too.

curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
   "transient" : {
      "logger.org.elasticsearch.http.HttpTracer" : "TRACE",
      "http.tracer.include" : [ "*openchallenges-challenge-000001*", "*openchallenges-challenge-read*" ]
   }
}
'
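The tracer payloads used in this comment differ only in the include list. As a sketch, the body can be generated from a list of URI patterns, and the tracer can be switched off again by resetting both settings to null. Both function names are hypothetical.

```shell
# Hypothetical helper: build the cluster-settings body that enables the
# HTTP tracer for the given URI patterns (one argument per pattern).
# The body runs in a subshell so its variables do not leak to the caller.
http_tracer_settings() (
  patterns=""
  for p in "$@"; do
    patterns="${patterns:+$patterns, }\"$p\""
  done
  printf '{ "transient": { "logger.org.elasticsearch.http.HttpTracer": "TRACE", "http.tracer.include": [ %s ] } }\n' "$patterns"
)

# Hypothetical helper: body that resets both settings to their defaults.
http_tracer_reset() {
  printf '{ "transient": { "logger.org.elasticsearch.http.HttpTracer": null, "http.tracer.include": null } }\n'
}

# Usage (requires a running cluster):
#   http_tracer_settings '*openchallenges-challenge-000001*' '*openchallenges-challenge-read*' |
#     curl -X PUT "localhost:9200/_cluster/settings?pretty" \
#       -H 'Content-Type: application/json' -d @-
```

Quoting each pattern with single quotes on the command line keeps the shell from expanding the asterisks before they reach the helper.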

Now the requests sent from the VS Code extension humao.rest-client and from Swagger UI are always logged by ES.

tschaffter commented 12 months ago

List indices

$ curl -X GET "localhost:9200/_cat/indices/*?v=true&s=index&pretty"
health status index                                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .geoip_databases                                9eUZUAUPRL2_j4kMsagLig   1   1         43            0     81.6mb         40.8mb
green  open   openchallenges-challenge-000001                 TmTrCX4eQ-6nJK9mfMHLGw   1   1        232            0    607.1kb        310.5kb
green  open   openchallenges-challenge-input-data-type-000001 9WvoCT86Qli7IgdoiTKMcQ   1   1          4            0       21kb          8.6kb
green  open   openchallenges-challenge-platform-000001        iHPJZqPhTImBSA7p5TtQKQ   1   1         16            0     26.9kb          9.3kb

List aliases

$ curl -X GET "localhost:9200/_cat/aliases?v=true&pretty"
alias                                          index                                           filter routing.index routing.search is_write_index
openchallenges-challenge-input-data-type-read  openchallenges-challenge-input-data-type-000001 -      -             -              false
openchallenges-challenge-input-data-type-write openchallenges-challenge-input-data-type-000001 -      -             -              true
openchallenges-challenge-read                  openchallenges-challenge-000001                 -      -             -              false
openchallenges-challenge-write                 openchallenges-challenge-000001                 -      -             -              true
openchallenges-challenge-platform-read         openchallenges-challenge-platform-000001        -      -             -              false
openchallenges-challenge-platform-write        openchallenges-challenge-platform-000001        -      -             -              true
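Since requests may arrive under any of these alias names, it can help to list the aliases of an index before choosing tracer patterns. The sketch below filters the tabular output shown above; the function name is hypothetical, and it assumes the header row produced by v=true is present.

```shell
# Hypothetical helper: read `_cat/aliases?v=true` output on stdin and print
# the aliases that point at the index given as the first argument.
# NR > 1 skips the header row; column 1 is the alias, column 2 the index.
aliases_for_index() {
  awk -v idx="$1" 'NR > 1 && $2 == idx { print $1 }'
}

# Usage (requires a running cluster):
#   curl -s "localhost:9200/_cat/aliases?v=true" |
#     aliases_for_index openchallenges-challenge-000001
```

For the listing above, this would print openchallenges-challenge-read and openchallenges-challenge-write, the two names besides the index itself under which requests can reach it.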