Is your feature request related to a problem? Please describe.
It's very hard to tell what's happening when an index is being created.
First, the progress bar on the jobs table does not appear to be very useful. We observed a customer case where the index creation was kicked off, and storage steadily increased for 8 hours while the job was running, from 8GiB total to 13+GiB. During this time, the index creation job stayed at 0% progress.
Second, I went to the logs to confirm exactly what the index creation job was doing and whether it was behaving properly. There appeared to be no logging describing what the job was actually doing. These are the only logs:
Nov 19 08:14: queued new schema-change job <x> for table <a>, mutation <b>
Nov 19 08:14: job <x>: resuming execution
Nov 19 08:14: SCHEMA CHANGE job <x>: stepping through state running
Nov 19 09:36: waited for 3 [<x> <y> <x>] queued jobs to complete 1h21m40.615066558s
Nov 19 15:02: job <x>: pause requested recorded with reason
Nov 19 15:02: job <x>: adoption completed with error ‹×›
Nov 19 15:02: job <x>, session <z>: paused
I'll note that the "waited for 3 jobs to complete" log appeared to be self-referential. Two of the job IDs listed were the ID of the index creation job itself; the third was another SCHEMA CHANGE job, which had the exact same "waited for 3 jobs to complete" log referencing this original schema change job and itself. I don't know what this log is trying to tell us.
And then we don't get any progress output until the job is paused at 15:02.
The "adoption completed with error" message also appears misleading: the job hasn't errored out or failed; it is simply paused.
Describe the solution you'd like
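A better progress bar, one that reflects the work the backfill has actually done instead of staying at 0% for hours while storage grows.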
More logging output. If the concern is that it would produce too much information, it could be gated behind a verbosity level. At that level, you could see whenever the index backfill writes to the ranges belonging to the new index, along with whatever else the index creation job does from start to finish.
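To make the request concrete, here is a rough, language-neutral sketch of the behavior I have in mind, written in Python purely for illustration. It is not CockroachDB code: the class `BackfillProgress`, the `verbosity` parameter, the chunk counts, and the log format are all hypothetical. The idea is that the job reports a fraction that moves as chunks are written, and emits a per-chunk log line only when the verbosity level is raised.

```python
import logging

logger = logging.getLogger("index_backfill")


class BackfillProgress:
    """Tracks completed chunks of a hypothetical index backfill job."""

    def __init__(self, job_id, total_chunks, verbosity=0):
        if total_chunks <= 0:
            raise ValueError("total_chunks must be positive")
        self.job_id = job_id
        self.total_chunks = total_chunks
        self.verbosity = verbosity  # hypothetical knob; 2+ enables per-chunk logs
        self.done_chunks = 0

    def fraction_completed(self):
        # Value a progress bar could display instead of a constant 0%.
        return self.done_chunks / self.total_chunks

    def chunk_written(self, span):
        """Record one finished chunk and return the updated fraction."""
        self.done_chunks = min(self.done_chunks + 1, self.total_chunks)
        fraction = self.fraction_completed()
        # Detailed output is gated so default logs stay quiet.
        if self.verbosity >= 2:
            logger.info(
                "job %s: backfilled span %s (%d/%d chunks, %.0f%%)",
                self.job_id, span, self.done_chunks, self.total_chunks,
                100 * fraction,
            )
        return fraction


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    progress = BackfillProgress("job-1", total_chunks=4, verbosity=2)
    for span in ["span-a", "span-b", "span-c", "span-d"]:
        progress.chunk_written(span)
```

With something like this, both the jobs table and the logs would show the backfill advancing, rather than leaving storage growth as the only signal.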
Describe alternatives you've considered
With no logging and no reliable progress indicator, I'm not aware of any alternatives except watching the capacity-used metric going up without context.
Jira issue: CRDB-44773