Open kilin-s opened 6 years ago
Sorry you are having trouble. Can you describe what you mean by getting stuck? What is the behavior you observe before you restart it?
I see errors in the prometheus-sql log:
[metric_name] 2018/01/25 09:37:56 Post http://sql-agent.sql-agent.renv-0043.rancher-test.domain.ru:5000: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
[metric_name] 2018/01/25 09:37:56 Backing off for 5m0s
When I try to send a request with curl, I don't receive any response.
Have you tried curling against sql-agent directly? Can you confirm it is not a DNS/port issue?
Yes, I tried curling it directly and got no response, but the port is open and I can connect to it with telnet. I think the problem is between sql-agent and postgres.
Requests from prometheus-sql reach sql-agent but do not reach postgres.
Sometimes sql-agent starts working again without a container restart.
And I forgot to say that postgres and prometheus-sql are in one network, and sql-agent is in another.
Just to relay the request flow, prometheus-sql sends a request to sql-agent with db connection info and the query. sql-agent then attempts to open a connection to the database. So if sql-agent can't talk to postgres then it should still respond with an error in the response noting the problem, such as a connection issue. Here is the request handler. In case you only tried curling with GET, it only takes POST requests.
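The flow described above can be sketched with a small stand-in service. This is not sql-agent's actual handler: the names `AgentHandler`, `open_db`, `start_agent` and `post_json`, the JSON field names, and the status codes are all assumptions chosen for illustration. It only shows the expected behavior, namely that non-POST requests are rejected and that a failed database connection still produces an HTTP response carrying the error.

```python
import http.client
import json
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def open_db(host, port, timeout=1.0):
    """Hypothetical stand-in for opening a database connection (plain TCP connect)."""
    return socket.create_connection((host, port), timeout=timeout)


class AgentHandler(BaseHTTPRequestHandler):
    """Illustrative handler: accepts POST only and always answers, even on DB failure."""

    def _reply(self, status, payload):
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Assumed behavior: anything other than POST is rejected.
        self._reply(405, {"error": "only POST is accepted"})

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        target = request["connection"]
        try:
            open_db(target["host"], target["port"]).close()
        except OSError as exc:
            # The database is unreachable, but the client still gets an answer.
            self._reply(500, {"error": f"database connection failed: {exc}"})
            return
        self._reply(200, {"sql": request["sql"], "rows": []})

    def log_message(self, *args):
        pass  # keep the example quiet


def start_agent():
    """Start the stand-in agent on a free local port and return the server object."""
    server = HTTPServer(("127.0.0.1", 0), AgentHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_json(host, port, payload, timeout=5.0):
    """POST a JSON payload and return (status, decoded body)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("POST", "/", body=json.dumps(payload),
                     headers={"Content-Type": "application/json"})
        resp = conn.getresponse()
        return resp.status, json.loads(resp.read())
    finally:
        conn.close()
```

With a service that behaves like this sketch, an unreachable database shows up as an error response rather than as a request that never completes.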
I see a miscommunication here. I'm sorry, English is not my native language.
I know that "prometheus-sql sends a request to sql-agent with db connection info and the query. sql-agent then attempts to open a connection to the database." and that it "only takes POST requests".
So, once more:
1 When the applications start, everything is fine, but after a random time interval sql-agent stops sending responses to prometheus-sql. In the prometheus-sql logs I see
[metric_name] 2018/01/25 09:37:56 Post http://sql-agent.sql-agent.renv-0043.rancher-test.domain.ru:5000: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
[metric_name] 2018/01/25 09:37:56 Backing off for 5m0s
In the sql-agent logs I see only that the application has started.
2 To get it working again I restart the sql-agent container, but sometimes sql-agent recovers without a container restart.
3 Normally when I curl it I get a response instantly; when I curl it while the problem is occurring I get nothing:
* About to connect() to sql-agent.sql-agent.renv-0043.rancher-test.domain.ru port 5000 (#0)
* Trying 10.6.109.231... connected
* Connected to sql-agent.sql-agent.renv-0043.rancher-test.domain.ru (10.6.109.231) port 5000 (#0)
> POST / HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: sql-agent.sql-agent.renv-0043.rancher-test.domain.ru:5000
> Accept: */*
> Content-Length: 253
> Content-Type: application/x-www-form-urlencoded
>
^C
and there are no errors.
4 While the problem is occurring, tcpdump shows no packets from sql-agent to postgres.
5 postgres and prometheus-sql are in one network; sql-agent is in another.
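The symptom in points 3 and 4 (the TCP connection is accepted, the request is sent, and nothing comes back) can be told apart from an unreachable port with a small probe. This is a minimal sketch using only the Python standard library; the function name `probe` and its return labels are hypothetical, and the two-second timeout is an arbitrary choice.

```python
import http.client
import socket


def probe(host, port, body, timeout=2.0):
    """Send one POST and classify the outcome.

    Returns "responded:<status>" if the server answered, "hung" if the TCP
    connection was accepted but no response arrived before the timeout,
    "unreachable" if the connection could not be opened, and "broken" if
    the connection failed after it was opened.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.connect()
    except OSError:
        return "unreachable"
    try:
        conn.request("POST", "/", body=body)
        resp = conn.getresponse()
        resp.read()
        return f"responded:{resp.status}"
    except socket.timeout:
        # Connected and sent the request, but nothing came back in time.
        return "hung"
    except (OSError, http.client.HTTPException):
        return "broken"
    finally:
        conn.close()
```

Running such a probe periodically from the prometheus-sql host would record when the service switches from responding to hanging, which may help correlate the stalls with network events.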
Could you create a release for sql-agent? Maybe when the applications are in one network everything will work fine.
Hi.
I use sql-agent on Docker with prometheus-sql. Periodically sql-agent gets stuck, and only a restart helps. There are no errors in the log.
docker-compose.yml