Open nvanbenschoten opened 4 years ago
I agree that this isn't ideal. The jobs infrastructure currently doesn't support reporting an unknown fraction progressed. @spaskob do you think this is something we could support? Do you know how other types of jobs handle this when the fraction progressed is unknown?
I actually have not worked on the fraction progressed reporting. It may be better to direct this question to the bulk-io folks, who probably have similar problems with reliably reporting progress on import and backup jobs. Please talk to @dt and/or @pbardea.
We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!
The `sampleAggregator` does not report a fraction completed if the number of rows it expects to find is 0, based on its previous set of table statistics. See
https://github.com/cockroachdb/cockroach/blob/565ffce1fa582164ac99331691f3d3c80b15c918/pkg/sql/execinfrapb/processors_table_stats.proto#L141-L144
and
https://github.com/cockroachdb/cockroach/blob/565ffce1fa582164ac99331691f3d3c80b15c918/pkg/sql/rowexec/sample_aggregator.go#L216-L224
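For readers who don't want to follow the links, the behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from `sample_aggregator.go`: the function name `fractionCompleted`, its parameters, and the clamp to 1.0 are assumptions made for the example.

```go
package main

import "fmt"

// fractionCompleted is a hypothetical helper (not the CockroachDB
// implementation). It returns the fraction of rows processed and whether
// that fraction is meaningful. When the previous table statistics say the
// table has 0 rows, there is no denominator, so nothing can be reported.
func fractionCompleted(rowsProcessed, rowsExpected uint64) (float32, bool) {
	if rowsExpected == 0 {
		// No prior row-count estimate: progress is unknown, not zero.
		return 0, false
	}
	f := float32(rowsProcessed) / float32(rowsExpected)
	if f > 1 {
		// Assumption for this sketch: the estimate is stale and the table
		// may have grown, so clamp rather than report more than 100%.
		f = 1
	}
	return f, true
}

func main() {
	for _, expected := range []uint64{0, 1000} {
		f, ok := fractionCompleted(500, expected)
		fmt.Printf("expected=%d fraction=%.2f reported=%t\n", expected, f, ok)
	}
}
```

With an expected row count of 0, the helper reports nothing at all, which is what leaves the job sitting at 0% for its entire run.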
I just ran into this and was very confused, and I expect users will be too. It was confusing because I was running stats immediately after a large `IMPORT INTO` (which seems quite common). As a result, all but the largest table already had a set of non-zero table statistics. However, the largest table, which took the longest to import, did not. So all stats creations completed quickly except for the one on the last table, which took the longest because it was the largest, and it never had any progress reported. For a while, I thought the job was stuck. I spent some time digging into stack traces trying to find where it was stuck before realizing what was going on.
Here's how the jobs page looked about an hour in, after all but the last two tables had completed.
Eventually, the stats completed, but without ever giving me progress, which I guess is expected based on the code.
We should be able to do something better here. Can we mark the progress as indeterminate instead of leaving it at zero, or say "unknown remaining"? If I saw this as a customer, I would have thought this was a bug and filed a support issue.
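One way to picture the suggestion is a progress value that carries an explicit "known" flag, so the display layer can tell "0% done" apart from "no estimate available". This is only a sketch under stated assumptions: `jobProgress`, `progressFor`, and the rendered strings are hypothetical and are not part of the jobs infrastructure.

```go
package main

import "fmt"

// jobProgress is a hypothetical type for this sketch. known distinguishes
// a real 0% from a job whose total amount of work cannot be estimated.
type jobProgress struct {
	fraction float32
	known    bool
}

// String renders the progress the way a jobs page might show it.
func (p jobProgress) String() string {
	if !p.known {
		return "unknown remaining"
	}
	return fmt.Sprintf("%.0f%%", p.fraction*100)
}

// progressFor builds a jobProgress from a row count and a (possibly stale)
// row estimate. An estimate of 0 yields an indeterminate progress value.
func progressFor(rowsProcessed, rowsExpected uint64) jobProgress {
	if rowsExpected == 0 {
		return jobProgress{known: false}
	}
	f := float32(rowsProcessed) / float32(rowsExpected)
	if f > 1 {
		f = 1 // stale estimate; never show more than 100%
	}
	return jobProgress{fraction: f, known: true}
}

func main() {
	fmt.Println(progressFor(250, 1000)) // table with prior statistics
	fmt.Println(progressFor(250, 0))    // freshly imported table, no statistics
}
```

The point of the separate flag is that a zero fraction stays a legitimate value, while the "no estimate" case gets its own rendering instead of looking like a stuck job.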
cc. @rytaft @awoods187
Jira issue: CRDB-4163