Closed ghukill closed 5 years ago
Progress:
core.models and core.spark.jobs
Need to address unique_in_published field in Record
Fix "published" column in Organization page
Logistically, mostly complete.
What remains unresolved are slow MySQL queries for setting Records as published and setting their publish_set_id. For most Jobs it's very quick, but for large Jobs (at least one Job with 3.5 million Records) it was prohibitively slow.
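One common way to keep a very large UPDATE manageable is to run it in primary-key ranges and commit each range separately. The following is a minimal sketch of that idea only, not what Combine does: it uses Python's built-in sqlite3 as a stand-in for MySQL, and the table layout, the column names, and the function publish_records_in_batches are hypothetical. Whether this would have helped with the 3.5 million Record Job is untested.

```python
import sqlite3


def make_record_table(conn):
    # Hypothetical, pared-down stand-in for the Record table.
    conn.execute(
        "CREATE TABLE record ("
        " id INTEGER PRIMARY KEY,"
        " job_id INTEGER NOT NULL,"
        " published INTEGER NOT NULL DEFAULT 0,"
        " publish_set_id TEXT)"
    )


def publish_records_in_batches(conn, job_id, publish_set_id, batch_size=10000):
    """Mark a Job's Records as published in primary-key ranges.

    Each range is committed separately. Returns the number of rows updated.
    """
    lo, hi = conn.execute(
        "SELECT MIN(id), MAX(id) FROM record WHERE job_id = ?", (job_id,)
    ).fetchone()
    if lo is None:  # the Job has no Records
        return 0
    updated = 0
    start = lo
    while start <= hi:
        end = start + batch_size - 1
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "UPDATE record SET published = 1, publish_set_id = ?"
                " WHERE job_id = ? AND id BETWEEN ? AND ?",
                (publish_set_id, job_id, start, end),
            )
        updated += cur.rowcount
        start = end + 1
    return updated
```

Batching does not reduce the total work. It only bounds the size of each transaction, so an index covering job_id would still matter for the range filter.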
Keeping issue open.
Needs updates to documentation...
Done.
Testing in the 1, 3, 6, and 12 million Record ranges has shown some scaling issues at various locations:
job_details
published index?
publish_set_id
publish_set_id to Record level
Furthermore, we have an instance of Combine with very different types of Jobs / Records. Where the "Published" Records section used to feel like the place to view all Records that were on the way out, and that had some affinity, it now feels artificial to have an ES index where they are mixed. If Jobs 1-4 are for purpose Foo, and Jobs 5-8 are for purpose Bar, why have an ES index combining the mapped fields of these completely unrelated Jobs? It doesn't scale logistically, and it doesn't scale conceptually.
All this suggests reworking publishing and, in many ways, simplifying it. The goal will be to publish at the Job level, not the RecordGroup level. Unsure if the JobPublish model, which formally united RecordGroup and Job, is still needed. It might be sufficient to just set a flag on the Job.
When a Job is published:
Set publish_set_id for each Record (does Job need one?)
Set published flag for Job
When that's done, it's easy to see what Jobs are published for a Record Group by looking for Jobs with published == True.
Then, for all published Jobs, do the same.
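The Job-level flow described above can be sketched roughly as follows. This is an illustration under assumptions, not Combine's code: it uses sqlite3 from the standard library in place of the Django models and MySQL, and the schema and the helpers make_db, publish_job, and published_jobs are hypothetical names.

```python
import sqlite3


def make_db():
    # Hypothetical schema; names only loosely mirror Job and Record.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE job (
            id INTEGER PRIMARY KEY,
            record_group_id INTEGER NOT NULL,
            published INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE record (
            id INTEGER PRIMARY KEY,
            job_id INTEGER NOT NULL REFERENCES job(id),
            publish_set_id TEXT
        );
        """
    )
    return conn


def publish_job(conn, job_id, publish_set_id):
    """Set publish_set_id on the Job's Records, then flag the Job."""
    with conn:  # one transaction for both statements
        conn.execute(
            "UPDATE record SET publish_set_id = ? WHERE job_id = ?",
            (publish_set_id, job_id),
        )
        conn.execute("UPDATE job SET published = 1 WHERE id = ?", (job_id,))


def published_jobs(conn, record_group_id):
    """Return ids of published Jobs in a Record Group."""
    rows = conn.execute(
        "SELECT id FROM job WHERE record_group_id = ? AND published = 1"
        " ORDER BY id",
        (record_group_id,),
    )
    return [row[0] for row in rows]
```

With the flag on the Job, finding the published Jobs for a Record Group is a single filtered query, and no separate linking table is needed for that lookup.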
Published Records page will show:
Planning on removing PublishJobs entirely then, as being published will become a "state" of other Jobs. Might consider turning the Job blue if published, but not much beyond that.