Open aborschev opened 1 year ago
That's a good comment, thanks. Most of the data mentioned is not available yet in pgcopydb though, and require extra setup and computations to fetch:
It's not trivial to guess the exact impact of computing all those operations. As timings are already captured, I could also add a command to get a summary of the average times to build indexes (and rates), and I suspect adding the buffer size capture and computation to our COPY code would not slow it down too much.
Contributions welcome, with an eye to do the easy bits (in terms of perf impact) first!
Hmm, Your proposal with counting bytes during copy-in-copy-out is precise, but looks too complicated. I meant much simpler way: like accumulate pg_relation_size() for each copied table, divide by time (COPY cumulative metric) and get average data transfer rate. One figure for the whole copying session would be enough here. Same logic - with indexes.
P.S.
The practical question behind this: If we've got for example 1 Gbit network, which % of bandwidth we've utilized?
Do we need to increase\decrease number of copy jobs to utilize bandwidth, but avoid contention?
But it does not work that way. The amount of bytes sent through the network is not the same as the pg_relation_size()
on the source, because of bloat (tuples on-disk that are not visible to our transaction). And the size sent over the network also includes network protocol metadata, whereas the pg_relation_size()
on the target includes storage metadata of the target Postgres version. So transfer rate can NOT be computed from the on-disk size of the relation, which is going to be different on source and target even with the same exact data set.
Then on-top of that, because of the concurrency aspects of pgcopydb, if your question is one of bandwidth capacity, it becomes very hard to answer that with numbers computed by individual processes. You'd be better off with OS-level tooling here, such as dstat for instance.
It would be nice to add to "Finals" table below the information about:
This is low-priority wish, just for convenience.