davidkyle closed this issue 5 years ago.
Pinging @elastic/ml-core
I tried to reproduce this failure locally, with no success.
FTR, these are the steps I've taken:
I've run this test multiple times:
a) using -Dtests.iters:
./gradlew :x-pack:plugin:integtest -Dtests.class=org.elasticsearch.xpack.test.rest.XPackRestIT -Dtests.method="test {p0=ml/data_frame_analytics_memory_usage_estimation/*}" -Dtests.iters=10000
b) using a bash for-loop (as I was unsure whether -Dtests.iters really does what I thought it would):
for i in `seq 1 100`; do ./gradlew :x-pack:plugin:integtest -Dtests.class=org.elasticsearch.xpack.test.rest.XPackRestIT -Dtests.method="test {p0=ml/data_frame_analytics_memory_usage_estimation/*}" 1>/tmp/test_${i}.out 2>&1; done
I've also started the server and flooded it with requests:
a) start the server locally:
./gradlew run -Dtests.es.xpack.license.self_generated.type=trial -Dtests.es.xpack.security.enabled=false -Dtests.heap.size=4g | tee /tmp/es_logs_1
b) populate the index:
for i in `seq 1 1000`; do curl -X PUT -H Content-Type:application/json localhost:9200/my-index/_doc/${i} -d '{"a": 10, "b": 20 }'| json_pp; done
c) issue requests:
for i in `seq 1 10000`; do curl -X POST -H Content-Type:application/json localhost:9200/_ml/data_frame/analytics/_estimate_memory_usage -d '{ "source": { "index": ["my-index"] }, "analysis": { "outlier_detection" :{} } }' | json_pp --json_opt=canonical,pretty; done
All test runs and requests succeeded, and there were no errors in the logs.
Update:
I was able to reproduce the issue locally after adding Thread.sleep(10000); right before the process.isProcessAlive() check.
This means that the process exits as soon as it has produced its output; it does not wait until that output has been read by the Java code. Memory estimation is the only C++ process that behaves like this because it is the only one we do not provide with an input pipe.
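To make the race concrete, here is a minimal, hypothetical Java sketch. A thread stands in for the C++ estimation process and java.io piped streams stand in for the real pipes; the class and method names (ExitRaceSketch, startEstimator, aliveAfterRead) are illustrative and not taken from the Elasticsearch code. The simulated process writes its result and exits immediately; the Java side reads the result, waits (like the Thread.sleep(10000) experiment, only shorter), and then checks whether the process is still alive.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ExitRaceSketch {

    // Hypothetical stand-in for the C++ estimation process: it writes its
    // result to the output pipe and exits right away, because nothing
    // (no input pipe) tells it to wait for the Java side.
    static Thread startEstimator(PipedOutputStream stdout) {
        Thread process = new Thread(() -> {
            try (stdout) {
                stdout.write("{\"expected_memory\":\"1kb\"}".getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        process.start();
        return process;
    }

    // Reads the whole result, sleeps for delayMillis (mimicking the
    // Thread.sleep experiment), then reports whether the "process" is alive.
    static boolean aliveAfterRead(long delayMillis) throws IOException, InterruptedException {
        PipedInputStream fromProcess = new PipedInputStream();
        Thread process = startEstimator(new PipedOutputStream(fromProcess));
        fromProcess.readAllBytes(); // consume the result until EOF
        Thread.sleep(delayMillis);
        return process.isAlive();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("alive after 200 ms: " + aliveAfterRead(200));
    }
}
```

Running main prints alive after 200 ms: false: once a delay is inserted, the liveness check reliably sees a process that has already gone away. Without the delay the result depends on thread scheduling, which is consistent with a failure that was hard to reproduce.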
One possible solution would be to provide it with an input pipe, as every other C++ process has.
I'd like to explore other options, though: the memory estimation process does not need to receive any input from Java, so adding an input pipe only to control when the process stops feels overly complex.
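The input-pipe option can be sketched with hypothetical stand-ins as well (a thread instead of the real process, java.io piped streams instead of the real pipes; InputPipeSketch, startEstimator and run are illustrative names). After writing its result, the simulated process blocks reading its input pipe and exits only when it sees end-of-file, so the Java side decides when it stops.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class InputPipeSketch {

    // Hypothetical estimator that is also given an input pipe: after writing
    // its result it blocks on stdin and exits only once it sees EOF.
    static Thread startEstimator(PipedOutputStream stdout, PipedInputStream stdin) {
        Thread process = new Thread(() -> {
            try {
                stdout.write("{\"expected_memory\":\"1kb\"}".getBytes(StandardCharsets.UTF_8));
                stdout.close(); // the result is complete
                while (stdin.read() != -1) {
                    // Ignore any input; only EOF matters here.
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        process.start();
        return process;
    }

    // Returns {alive at the delayed check, alive after the input pipe is closed}.
    static boolean[] run(long delayMillis) throws IOException, InterruptedException {
        PipedInputStream fromProcess = new PipedInputStream();
        PipedInputStream processStdin = new PipedInputStream();
        PipedOutputStream toProcess = new PipedOutputStream(processStdin);
        Thread process = startEstimator(new PipedOutputStream(fromProcess), processStdin);

        fromProcess.readAllBytes();   // consume the result until EOF
        Thread.sleep(delayMillis);    // artificial delay before the liveness check
        boolean aliveAtCheck = process.isAlive();

        toProcess.close();            // EOF on the input pipe lets the process exit
        process.join();
        return new boolean[] {aliveAtCheck, process.isAlive()};
    }

    public static void main(String[] args) throws Exception {
        boolean[] r = run(200);
        System.out.println("alive at check: " + r[0] + ", alive after close: " + r[1]);
    }
}
```

Here the process is still alive at the delayed check and exits only after the input pipe is closed. The sketch also shows the cost mentioned above: the input pipe carries no data and exists purely to control the process lifetime.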
I'll monitor the build-stats console to verify that the failures do not recur: https://build-stats.elastic.co/app/kibana#/discover?_g=(refreshInterval:(pause:!t,value:0),time:(from:now-6M,mode:quick,to:now))&_a=(columns:!(test.failed-testcases),index:b646ed00-7efc-11e8-bf69-63c8ef516157,interval:auto,query:(language:lucene,query:'memory_usage_estimation_*'),sort:!(process.time-start,desc))
The failures have not recurred since the 30th of August, which suggests the fix worked.
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob+fast+part2/866/console https://gradle-enterprise.elastic.co/s/2w3kg7bljpnlc
Does not reproduce: