matburt closed this issue 5 years ago.
Hit this bug:
awx_1 | 13:25:22 celeryd.1 | 2018-08-27 13:25:22,366 ERROR awx.main.tasks Task awx.main.scheduler.tasks.run_task_manager encountered exception.
awx_1 | 13:25:22 celeryd.1 | Traceback (most recent call last):
awx_1 | 13:25:22 celeryd.1 | File "/venv/awx/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
awx_1 | 13:25:22 celeryd.1 | R = retval = fun(*args, **kwargs)
awx_1 | 13:25:22 celeryd.1 | File "/venv/awx/lib/python2.7/site-packages/celery/app/trace.py", line 438, in __protected_call__
awx_1 | 13:25:22 celeryd.1 | return self.run(*args, **kwargs)
awx_1 | 13:25:22 celeryd.1 | File "/awx_devel/awx/main/scheduler/tasks.py", line 31, in run_task_manager
awx_1 | 13:25:22 celeryd.1 | TaskManager().schedule()
awx_1 | 13:25:22 celeryd.1 | File "/awx_devel/awx/main/scheduler/task_manager.py", line 693, in schedule
awx_1 | 13:25:22 celeryd.1 | finished_wfjs = self._schedule()
awx_1 | 13:25:22 celeryd.1 | File "/awx_devel/awx/main/scheduler/task_manager.py", line 678, in _schedule
awx_1 | 13:25:22 celeryd.1 | self.spawn_workflow_graph_jobs(running_workflow_tasks)
awx_1 | 13:25:22 celeryd.1 | File "/awx_devel/awx/main/scheduler/task_manager.py", line 197, in spawn_workflow_graph_jobs
awx_1 | 13:25:22 celeryd.1 | job.name = "{} - {}".format(job.name, spawn_node.ancestor_artifacts['job_shard'] + 1)
awx_1 | 13:25:22 celeryd.1 | UnicodeEncodeError: 'ascii' codec can't encode character u'\ud007' in position 27: ordinal not in range(128)
Confirmed that the bug will be resolved by:
diff --git a/awx/main/scheduler/task_manager.py b/awx/main/scheduler/task_manager.py
index 4bb0c03d70..c1f71f18d1 100644
--- a/awx/main/scheduler/task_manager.py
+++ b/awx/main/scheduler/task_manager.py
@@ -194,7 +194,7 @@ class TaskManager():
kv = spawn_node.get_job_kwargs()
job = spawn_node.unified_job_template.create_unified_job(**kv)
if 'job_shard' in spawn_node.ancestor_artifacts:
- job.name = "{} - {}".format(job.name, spawn_node.ancestor_artifacts['job_shard'] + 1)
+ job.name = six.text_type("{} - {}").format(job.name, spawn_node.ancestor_artifacts['job_shard'] + 1)
job.save()
spawn_node.job = job
spawn_node.save()
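For context, a minimal sketch of the failure mode, assuming a job name containing a non-ASCII character (the name below is made up). Under Python 2, str.format() on a byte-string template encodes unicode arguments with the default ascii codec, which is exactly the UnicodeEncodeError in the traceback; formatting with a unicode template (what six.text_type provides) avoids the implicit encode.

```python
# -*- coding: utf-8 -*-
# Python 2 sketch of the bug and the fix; "name" is a hypothetical job name.
import six

name = u"caf\u00e9 deploy"  # a job name containing a non-ASCII character

try:
    "{} - {}".format(name, 2)  # byte-string template: implicit ascii encode
except UnicodeEncodeError as exc:
    print(exc)  # 'ascii' codec can't encode character u'\xe9' ...

print(u"{} - {}".format(name, 2))                # unicode template: works
print(six.text_type("{} - {}").format(name, 2))  # equivalent, as in the patch
```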
@wenottingham @kialam I'm staging the changes for the shard->split rename. I would like to cover these in a single commit (updating the QE tests at the same time), and IMO it makes the most sense to update the UI in the same pass. The plan is to do the manual work first, then a fairly automated rename of the rest, then verify that the tests pass and the UI works.
Field or text | new value |
---|---|
job_template plus help_text | split_job_template |
job_shard_count help_text and minimum value | job_split_count or split_job_count, minimum of 1 ✅ |
internal_limit help_text | some help text |
sharded_jobs related link | split_jobs ✅ |
internal limit syntax shard0of3 | split1of3 (note: changing to 1 for first) ✅ |
UI: Edit the shard job template | Edit the split job template (*splitting job template??) ✅ |
Shard Template | Split Template / Split Job Template / Splitlate ✅ |
A few relatively minor decisions remain, but I still want to reach a firm conclusion on those.
After discussing with others, the intermediate consensus is that 'split' is not the greatest terminology either.
Current suggestions:
Second attempt at an agreeable rename:
Field or text | new value | even newer value |
---|---|---|
job_template w/o help_text | split_job_template | job_template with help_text |
job_shard_count help_text and minimum value | job_split_count or split_job_count, minimum of 1 ✅ | job_slice_count, minimum of 1 |
internal_limit help_text | some help text | replaced by job_slice_count (WJ & J) and job_slice_number (J only) |
sharded_jobs related link | split_jobs ✅ | slice_workflow_jobs ?? (maybe remove) |
internal limit syntax shard0of3 | split0of3 ✅ (change to 1 for first) ❌ | slice1of3 |
UI: Edit the shard job template | Edit the split job template ✅ (*splitting job template??) | Edit the slice job template |
Shard Template | Split Template / Split Job Template / Splitlate ✅ | Slice Job Template |
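To make the settled convention concrete, a tiny illustrative helper (slice_limit is a made-up name, not AWX code) that builds the 1-based internal limit strings from the table above:

```python
def slice_limit(slice_number, slice_count):
    """Build the internal limit string, e.g. 'slice1of3' (1-based)."""
    return "slice{}of{}".format(slice_number, slice_count)

assert slice_limit(1, 3) == "slice1of3"
assert slice_limit(3, 3) == "slice3of3"
```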
Scheme for the job & workflow job field names:
# job serialization
{
"id": 11,
"type": "job",
"url": "/api/v2/jobs/11/",
"related": {...},
"summary_fields": {...},
...
"job_slice_number": 2,
"job_slice_count": 5
},
# workflow job serialization
{
"id": 11,
"type": "workflow_job",
"url": "/api/v2/workflow_jobs/11/",
"related": {...},
"summary_fields": {...},
...
"job_slice_count": 5
}
The related link (previously sharded_jobs, then split_jobs) would presumably have existed for the UI to use in parallel to the job template RECENT JOBS tab. That list is still obtainable through a query filter, and to my knowledge the UI has no plans to add such a view. Maybe we just delete this.
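For illustration, a client-side sketch of that query-filter approach; the filter parameter name below is an assumption for the example, not a confirmed AWX API field:

```python
# Hypothetical sketch: list the jobs sliced out of one workflow job via a
# query filter instead of a dedicated related link. The filter name
# "unified_job_node__workflow_job" is assumed, not confirmed.
import requests

resp = requests.get(
    "https://awx.example.com/api/v2/jobs/",        # assumed AWX host
    params={"unified_job_node__workflow_job": 11},
    auth=("admin", "password"),                    # placeholder credentials
)
resp.raise_for_status()
for job in resp.json()["results"]:
    print(job["id"], job["name"], job["job_slice_number"])
```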
"replaced by job_slice_count (WJ & J) job_slice_number (J only)"
I thought we were not having the count on the individual slices?
What is the field/naming for how it is split - does the template define the number of slices, or the size of any one slice?
Oh, right, I was forgetting about the confusion of non-sliced workflow jobs having an extra integer that would confusingly be 1. The other option we discussed for workflow jobs was a boolean field, is_sliced_job. That's fine with me, but increasingly I want the count to be on the job record as well.
I've identified a bug where hosts that are in groups in an inventory will be targeted in every s(lice|plit|hard).
Steps to reproduce:
I figured this out while doing some exploratory testing with user-supplied limits: random hosts were failing when the job was sharded, and I used this playbook to identify them:
---
- hosts: all
  gather_facts: false
  tasks:
    - fail:
      when: ansible_connection != 'local'
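As for the underlying slicing behavior, a hedged sketch of what correct slicing has to do (slice_hosts is a made-up helper, not AWX's implementation): deduplicate the host list before partitioning, so a host reachable through several groups still lands in exactly one slice.

```python
def slice_hosts(host_names, slice_count):
    """Partition a host list into round-robin slices.

    Deduplicate first: a host that belongs to several groups must still
    appear in exactly one slice.
    """
    unique = sorted(set(host_names))
    return [unique[i::slice_count] for i in range(slice_count)]

# "web1" is in two groups, so it shows up twice in the flattened list,
# but after deduplication it is targeted by exactly one slice.
flattened = ["web1", "web2", "db1", "web1"]
print(slice_hosts(flattened, 3))  # [['db1'], ['web1'], ['web2']]
```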
Update: I will keep to the plan of using the is_sliced_job field. This means that orphaned sliced workflow jobs cannot be relaunched, which seems sensible now that I've thought it through.
@kialam It looks like we need to figure out how to integrate these commits into here.
https://github.com/matburt/awx/pull/3
The things still on my agenda with relevance to this branch:
A couple of pre-merge checks...
bash-4.2$ awx-manage makemigrations
No changes detected
Good with migrations.
Tests are looking good.
Will push the new tooltip text and rebase when I have agreement on that.
Notes for docs impact:
This is a work in progress; see https://github.com/ansible/awx/issues/1283