Closed GoogleCodeExporter closed 9 years ago
Couple of follow up questions:
1) Do you see a successful completion of the insertAll requests? Internally
we'll time out any requests that take longer than a few seconds, so a 15-16
minute delay is quite unexpected. If you're seeing successful completion, it's
possible the delay is somewhere else in the system. I took a look at our
request logs over the last few days and don't see requests (success or failure)
that took in those time ranges.
2) Are you generating insertion IDs? This should atleast allow for a fast
retry? We will deduplicate in this case. You can then set a timeout on your end
and attempt fast retries.
Original comment by seanc...@google.com
on 3 Dec 2014 at 11:03
For some context, this is what our system sees for insertions to the above
mentioned table over the last two days. The times are in milliseconds:
+--------------+---------------+----------+-------+
| ResponseCode | average_time | max_time | count |
+--------------+---------------+----------+-------+
| 500 | 3550.0 | 5519 | 2 |
| 200 | 173.256051641 | 1601 | 3718 |
| 401 | 9.72727272727 | 36 | 22 |
+--------------+---------------+----------+-------+
Original comment by seanc...@google.com
on 3 Dec 2014 at 11:06
Hi,
Thanks for the reply.
1) Yes, I do see a successful completion of all the insertAll requests. I have
tested other dependencies and all of them look correct to me. I will observe
the network behavior while it happens to see if the rows are actually being
sent to BigQuery.
2) I am generating insertIds, but that doesn't help much in this scenario. I
tried timing out the queries after 30 seconds and retry them with the same
insertIds. The following requests also take that long (930-940 seconds) and
that makes me believe it might just duplicate respective rows.
The above problem had resurfaced around 3:00 pm PST (12-03) for the respective
dataset - table combinations:
amp_127890 : 127890_action_impression
amp_127890 : 127890_action_read
Thanks.
Original comment by nir...@amplitude.com
on 3 Dec 2014 at 11:15
Our logging near the edge continues to indicate that we're returning fairly
quickly (the pXXX_ms are the 50th, 95th, and 99.9th percentile latencies in ms,
over the last 2 days):
+-------------------------------------------------------------------------------
------------------------------------------+--------------+--------+--------+----
-----+-------+
| request
| ResponseCode | p50_ms | p95_ms |
p999_ms | count |
+-------------------------------------------------------------------------------
------------------------------------------+--------------+--------+--------+----
-----+-------+
| POST
/bigquery/v2/projects/513990544911/datasets/amp_127890/tables/127890_action_clic
k/insertAll?alt=json HTTP/1.1 | 200 | 120 | 168 | 708 |
1581 |
| POST
/bigquery/v2/projects/513990544911/datasets/amp_127890/tables/127890_action_clic
k/insertAll?alt=json HTTP/1.1 | 401 | 8 | 306 | 306 |
9 |
| POST
/bigquery/v2/projects/513990544911/datasets/amp_127890/tables/127890_action_impr
ession/insertAll?alt=json HTTP/1.1 | 200 | 148 | 248 | 1396 |
3032 |
| POST
/bigquery/v2/projects/513990544911/datasets/amp_127890/tables/127890_action_impr
ession/insertAll?alt=json HTTP/1.1 | 401 | 8 | 36 | 41 |
36 |
| POST
/bigquery/v2/projects/513990544911/datasets/amp_127890/tables/127890_action_impr
ession/insertAll?alt=json HTTP/1.1 | 500 | 1581 | 1581 | 1581 |
1 |
| POST
/bigquery/v2/projects/513990544911/datasets/amp_127890/tables/127890_action_read
/insertAll?alt=json HTTP/1.1 | 200 | 121 | 164 | 1571 |
2536 |
| POST
/bigquery/v2/projects/513990544911/datasets/amp_127890/tables/127890_action_read
/insertAll?alt=json HTTP/1.1 | 401 | 8 | 11 | 11 |
12 |
| POST
/bigquery/v2/projects/513990544911/datasets/amp_127890/tables/127890_action_read
/insertAll?alt=json HTTP/1.1 | 503 | 65 | 73 | 73 |
2 |
+-------------------------------------------------------------------------------
------------------------------------------+--------------+--------+--------+----
-----+-------+
I took a look at our logs from 2:30 - 3:10, and I do notice an interesting gap
of ~15 minutes (starts at 2:44:45PM, ends at 3:00:23PM) where we receive no
requests for any of the above targets. The times we log *should* be the first
point at which we see the requests.
Is your system single threaded, such that one long call will block all
subsequent calls? If not, and the problem is on our end, I wouldn't expect to
see a gap like this.
Original comment by seanc...@google.com
on 4 Dec 2014 at 5:22
Hi,
The problem hasn't resurfaced since yesterday.
Very likely reason : I wasn't refreshing the bigquery_service object I was
creating and the BigQuery Python API seems to be accepting the request, holding
it for 15 odd minutes and then forwarding the request to BigQuery. A relevant
message by the Python API would have helped narrow down the issue.
I will update this issue if I see such behavior again.
Thanks for the help.
Original comment by nir...@amplitude.com
on 5 Dec 2014 at 1:39
Ok. Closing out this issue for the time being. Definitely re-ping if you start
to see issues again.
Original comment by seanc...@google.com
on 5 Dec 2014 at 8:00
Original issue reported on code.google.com by
nir...@amplitude.com
on 3 Dec 2014 at 10:46