alphagov / content-publisher

Publishing application for GOV.UK
https://docs.publishing.service.gov.uk/apps/content-publisher.html
MIT License
33 stars 11 forks source link

Correct timestamp on "First Published" change note for a backdated document #1147

Open kevindew opened 5 years ago

kevindew commented 5 years ago

We have learnt that we will hit an issue with our work on the backdating functionality.

In Whitehall when you backdate a document the initial change note is set to match the first_published_at date of the document. For example, if I publish a document on Whitehall backdated to 9th June 2016 the change_history is: [{"public_timestamp":"2016-06-09T10:00:00.000+01:00","note":"First published."}]

Full Content Store output ```json { "analytics_identifier":null, "base_path":"/government/news/testing-backdating", "content_id":"2f46cfcd-5006-4b99-a1ae-bb421a896bd8", "document_type":"news_story", "first_published_at":"2016-06-09T09:00:00.000+00:00", "locale":"en", "phase":"live", "public_updated_at":"2016-06-09T09:00:00.000+00:00", "publishing_app":"whitehall", "publishing_scheduled_at":null, "rendering_app":"government-frontend", "scheduled_publishing_delay_seconds":null, "schema_name":"news_article", "title":"Testing backdating", "updated_at":"2019-06-19T10:21:38.135Z", "withdrawn_notice":{ }, "publishing_request_id":"13600-1560939697.689-10.1.1.184-1985", "links":{ "organisations":[ { "analytics_identifier":"OT1056", "api_path":"/api/content/government/organisations/government-digital-service", "base_path":"/government/organisations/government-digital-service", "content_id":"af07d5a5-df63-4ddc-9383-6a666845ebe9", "description":"We help government work better for everyone by leading digital transformation. GDS is part of the Cabinet Office .", "document_type":"organisation", "locale":"en", "schema_name":"organisation", "title":"Government Digital Service", "withdrawn":false, "details":{ "logo":{ "formatted_title":"Government Digital Service", "crest":"single-identity" }, "brand":"cabinet-office", "default_news_image":null }, "links":{ }, "api_url":"https://www.integration.publishing.service.gov.uk/api/content/government/organisations/government-digital-service", "web_url":"https://www.integration.publishing.service.gov.uk/government/organisations/government-digital-service" } ], "original_primary_publishing_organisation":[ { "analytics_identifier":"OT1056", "api_path":"/api/content/government/organisations/government-digital-service", "base_path":"/government/organisations/government-digital-service", "content_id":"af07d5a5-df63-4ddc-9383-6a666845ebe9", "description":"We help government work better for everyone by leading digital transformation. GDS is part of the Cabinet Office .", "document_type":"organisation", "locale":"en", "schema_name":"organisation", "title":"Government Digital Service", "withdrawn":false, "details":{ "logo":{ "formatted_title":"Government Digital Service", "crest":"single-identity" }, "brand":"cabinet-office", "default_news_image":null }, "links":{ }, "api_url":"https://www.integration.publishing.service.gov.uk/api/content/government/organisations/government-digital-service", "web_url":"https://www.integration.publishing.service.gov.uk/government/organisations/government-digital-service" } ], "primary_publishing_organisation":[ { "analytics_identifier":"OT1056", "api_path":"/api/content/government/organisations/government-digital-service", "base_path":"/government/organisations/government-digital-service", "content_id":"af07d5a5-df63-4ddc-9383-6a666845ebe9", "description":"We help government work better for everyone by leading digital transformation. GDS is part of the Cabinet Office .", "document_type":"organisation", "locale":"en", "schema_name":"organisation", "title":"Government Digital Service", "withdrawn":false, "details":{ "logo":{ "formatted_title":"Government Digital Service", "crest":"single-identity" }, "brand":"cabinet-office", "default_news_image":null }, "links":{ }, "api_url":"https://www.integration.publishing.service.gov.uk/api/content/government/organisations/government-digital-service", "web_url":"https://www.integration.publishing.service.gov.uk/government/organisations/government-digital-service" } ], "taxons":[ { "api_path":"/api/content/business-and-industry", "base_path":"/business-and-industry", "content_id":"495afdb6-47be-4df1-8b38-91c8adb1eefc", "description":"", "document_type":"taxon", "locale":"en", "public_updated_at":"2018-09-03T15:41:02Z", "schema_name":"taxon", "title":"Business and industry", "withdrawn":false, "details":{ "internal_name":"Business and industry", "notes_for_editors":"", "visible_to_departmental_editors":true }, "phase":"live", "links":{ "root_taxon":[ { "api_path":"/api/content/", "base_path":"/", "content_id":"f3bbdec2-0e62-4520-a7fd-6ffd5d36e03a", "description":"", "document_type":"homepage", "locale":"en", "public_updated_at":"2019-06-18T14:52:01Z", "schema_name":"homepage", "title":"GOV.UK homepage", "withdrawn":false, "links":{ }, "api_url":"https://www.integration.publishing.service.gov.uk/api/content/", "web_url":"https://www.integration.publishing.service.gov.uk/" } ] }, "api_url":"https://www.integration.publishing.service.gov.uk/api/content/business-and-industry", "web_url":"https://www.integration.publishing.service.gov.uk/business-and-industry" } ], "available_translations":[ { "title":"Testing backdating", "public_updated_at":"2016-06-09T09:00:00Z", "document_type":"news_story", "schema_name":"news_article", "base_path":"/government/news/testing-backdating", "description":"Testing backdating", "api_path":"/api/content/government/news/testing-backdating", "withdrawn":false, "content_id":"2f46cfcd-5006-4b99-a1ae-bb421a896bd8", "locale":"en", "api_url":"https://www.integration.publishing.service.gov.uk/api/content/government/news/testing-backdating", "web_url":"https://www.integration.publishing.service.gov.uk/government/news/testing-backdating", "links":{ } } ] }, "description":"Testing backdating", "details":{ "body":"\u003cdiv class=\"govspeak\"\u003e\u003cp\u003eTesting backdating\u003c/p\u003e\n\u003c/div\u003e", "emphasised_organisations":[ "af07d5a5-df63-4ddc-9383-6a666845ebe9" ], "change_history":[ { "public_timestamp":"2016-06-09T10:00:00.000+01:00", "note":"First published." } ], "image":{ "high_resolution_url":"https://assets.integration.publishing.service.gov.uk/government/assets/placeholder-a6e35fc15d4bfcc9d0781f3888dc548b39288f0fe10a65a4b51fef603540b0c5.jpg", "url":"https://assets.integration.publishing.service.gov.uk/government/assets/placeholder-a6e35fc15d4bfcc9d0781f3888dc548b39288f0fe10a65a4b51fef603540b0c5.jpg", "caption":null, "alt_text":"placeholder" }, "first_public_at":"2016-06-09T10:00:00.000+01:00", "political":false, "government":{ "title":"2015 Conservative government", "slug":"2015-conservative-government", "current":true }, "tags":{ "browse_pages":[ ], "policies":[ ], "topics":[ ] } } } ```

However, if you send the Publishing API a change_note rather than a change_history it will create the change note to be the current time and not acknowledge the backdating. This would mean that for a document backdated to 9th June 2016 that is published today the change_history would be: [{"public_timestamp":"2019-06-19T10:21:38.135Z","note":"First published."}].

The expected/desired behaviour is as follows: when an edition is published with a backdate the first change note is expected to match the date of that backdating. There should not be change notes with dates preceding the "First published" one .

This issue is related to other problems we have with change history and backdating. One problem is that the Publishing API approach to change history makes it impossible to backdate after the first edition without making the change history inconsistent (the "First published." change note would have a different time). Another is that not all Whitehall document history is captured in Publishing API change notes, nor is it compatible, which means that in migrating content from Whitehall to Content Publisher we will have to find a means to use the Whitehall change history. Finally, a common second line problem is editing a change note entered or published incorrectly, it would be good if this could be resolved by publishers.

Given these other issues I consider the solution to this to be on different levels:

Options identified as approaches to solve this are:

1. Generation of a change note in Publishing API uses some timestamp from the payload

This has an advantage of simplicity and a downside that it is only a first level solution.

2. Use Whitehall style approach to pass whole change history

This has the advantage that it is a third level solution. It has a downside that the publishing application is responsible for sending all the timestamps which means that a put_content Publishing API request must be done immediately before a publish.

3. Hybrid where we can pass a change_history and a change note

This has the advantage that it is a third level solution and that it can avoid a put_content before a publish request. It has downsides that it would be a new pattern in the Publishing API and may be difficult to consolidate with the existing patterns for change_note/change_history

4. Asking should a backdated document even have a change note of "First published."

This is more about questioning whether a "First published." change note for a backdated document even makes sense since all other changes on the original publishing are lost. This has risks that it will increase the scope of the problem, lead to inconclusive results, change a GOV.UK frontend pattern and/or delay the completion of this feature.

benthorner commented 5 years ago

Thanks @kevindew for the clear and thorough description of the problem. I notice we don't mention the final problem - of editing incorrect change notes - in the proposed solutions. The main issue here seems to be that, unless we manually pass all the change notes, the app has no way of 'grabbing hold' of an old change note in order to correct it.

An interesting question here is: what time do we expect a change note to have in general? At the moment, it looks like it's meant to be the time at the point of publish, which certainly makes sense for 'first published'. Otherwise, I guess we would like it to be roughly consistent with the time when email notifications are sent out, and the other times for the edition.

benthorner commented 5 years ago

The main issue here seems to be that, unless we manually pass all the change notes, the app has no way of 'grabbing hold' of an old change note in order to correct it.

Perhaps a similar-ish approach as for topics would work?

kevindew commented 5 years ago

An interesting question here is: what time do we expect a change note to have in general? At the moment, it looks like it's meant to be the time at the point of publish, which certainly makes sense for 'first published'. Otherwise, I guess we would like it to be roughly consistent with the time when email notifications are sent out, and the other times for the edition.

The change note should be dated exactly the same as the public_updated_at value for a particular edition. Having pondered that a while I don't think there is a realistic scenario that this should not be the case, as it would always cause an inconsistent page.

Therefore I'm pretty confident that we could resolve the first level by updating https://github.com/alphagov/publishing-api/blob/master/app/models/change_note.rb#L29-L37 to use public_updated_at in Publishing API, and that could be considered a bug fix.


Thanks @kevindew for the clear and thorough description of the problem. I notice we don't mention the final problem - of editing incorrect change notes - in the proposed solutions. The main issue here seems to be that, unless we manually pass all the change notes, the app has no way of 'grabbing hold' of an old change note in order to correct it.

Yes, I consider that last problem mostly out of scope for something we have to solve now but something to bear in mind in modelling.

What I anticipate is that in Content Publisher a document could have a collection of change notes for past editions which is sent to Publishing API and appended to with each publishing. One thing we'd also need to be able to do this is know the date that Publishing API publishes something. I'm not sure if that's returned from a publish response or not.


One thing I'm currently considering is that Publishing API doesn't need to create a change note from change history as I doubt anything makes use of that model. If we were to remove that aspect of it we could combine the presence of a change_note and of change_history at presentation layer rather than doing an or

benthorner commented 5 years ago

I have just one more approach to suggest here, which would be a third-level solution and also extends to editing arbitrary change notes in the future. Bullet points seems the best way to explain:

Assuming the above can be made to work, backdating the first change note can be achieved by sending it in the change history hash, with a matching 'ID', but the backdated time.

The overwriting mechanism could do with some clarification, as there are a few scenarios. For example, let's say we had the following existing change notes on a document.

  1. public_timestamp: 2019-01-03, note: First published, ID: id1
  2. public_timestamp: 2019-01-04, note: Blah, ID: id2

If we wanted to overwrite the timestamp for the first note, we could send { public_timestamp: 1989-01-01, ID: id1 } in the change history for the current edition. Other scenarios:

The mechanism - whether using the bulk details hash or as a single note - should support the same fields. If both mechanisms are used and the same ID appears in each, then it's a 4xx error.

It's worth pointing out that we wouldn't need to store any additional data about change notes in Content Publisher in order to use this proposal; we only need to be able to 'grab hold' of the first change note, which can be done by re-using a hard-coded ID from when we generate it. We could support changing arbitrary change notes if we sent and stored an ID for each one in the app.

Pros: no extra requests, no impact to existing apps, support for migration, support for future editing, low coupling to Publishing API behaviour

Cons: optional IDs might be hard to understand, time required to make the changes, edge cases I haven't thought of

benthorner commented 5 years ago

Correction: by 'create', I mean 'present to the Content Store as'. I'm not really sure if we want to actually update the change notes in the Publishing API DB once they've been published; this is more of a presentation override mechanism, really.

kevindew commented 5 years ago

Thanks @benthorner - really interesting take on it 👍it's presented well and the use of an id could be a useful consolidation means between Content Publisher and Publishing API. I have a few questions to understand it further. As discussed earlier we are going to proceed with fixing public_updated_at on change note as an immediate step but it's good to keep this conversation going as it's very likely necessary to resolve this for Whitehall attachments.


So what I understand that the main benefit of this suggestion is that we send less verbose messaging to the Publishing API of what the change history is. I'm a bit unsure about:

It's worth pointing out that we wouldn't need to store any additional data about change notes in Content Publisher in order to use this proposal; we only need to be able to 'grab hold' of the first change note, which can be done by re-using a hard-coded ID from when we generate it. We could support changing arbitrary change notes if we sent and stored an ID for each one in the app.

I'm not sure how that means we don't save additional data. We'd need to store the users edit to a change note somewhere and it probably doesn't make sense to update a superseded editions revision.

If we send a change history to Publishing API of { note: Bar, ID: id3 } is that in a put_content request and using the change_history field? Is that saved on the model as the change_history field of details verbatim or does a transformation occur?

If I changed a change note on edition 2 and then changed the change note on edition 3 would the put change history payload include both edited changes?

benthorner commented 5 years ago

@kevindew thanks - it's a tricky one to think about how this works with the old and the proposed behaviour. To answer your first question: the comment you've quoted is talking specifically about backdating, and how we could support editing it in subsequent editions. Once we've generated the initial "First published" change note, we've got a problem that there's currently no way to tell the Publishing API to update that specific change note. If we were only concerned with 'grabbing hold' of this change note, it would be sufficient to hard-code an ID on the one we send initially, so that we can later send an updated version in the change_history that would overwrite the original date at the point of presentation - as above, still not sure if it should actually modify the one stored in the DB.

I think that mostly answers your second question. My thought was that we support this behaviour solely at the point of presentation, leaving the stored data untouched. This means an overwrites would have to be sent in any subsequent editions, which seems fine for backdating, and matches the behaviour of Whitehall for any other edit way might do.

That somewhat answers your third question: no, it would only include the most recent version of the change note in the change history payload, assuming we had sufficient modelling to store the ID of the original change note, so that we had a way to 'grab hold' of it in order to make the change in edition 2 and edition 3 in your example.

One issue I can see with this approach would be the current behaviour of extracting a change note out of the change history. If we're planning to intelligently merge stored notes with the change history, without an ID to disambiguate them, we would end up with duplicates. While I agree we should stop doing this, as it does indeed look pointless, we would need to provide some 'legacy support' for this, perhaps by just detecting such duplicates and removing them.

One other decision with editing arbitrary change notes is whether the Publishing API should automatically update the timestamp of edited change notes (at the point of publishing). I'm guessing not, but it's something else to bear in mind.

A simple rule of thumb that may be helpful to understand all this: we should only use change history to override (or possibly supplement) change notes created with the normal process. In that way, we would expect all (most) change notes in the change history to have an ID, since their purpose is to override another change note.

kevindew commented 5 years ago

Thanks for the explanation @benthorner - as discussed face-to-face we're moving with a basic level one fix for now and will resume work on this as part of the Whitehall migration

On a sidenote, one thing I thought about the other day is that if we want to allow users the ability to remove a backdate on a subsequent edition Content Publisher would need to store a timestamp of when the document was first_published. We have a first_published_at field that can cover this however it wouldn't be exactly identical to what Publishing API produces.

benthorner commented 5 years ago

@kevindew that's annoying. On the one hand, it looks like our local field is only used for the link to Content Data, but on the other hand, populating this field using the first publish response would be more accurate and also solve the problem here; what seems harder is knowing when we explicitly need to send it - if we use the aforementioned approach, then we could send it with every edition without it changing between them. Annoying, the Publishing API does seem to store the actual publish time in a field called 'temporary_first_published_at', but there's no way to make use of it and indeed the field itself doesn't seem to be used anywhere in the codebase :-|.

kevindew commented 5 years ago

@benthorner Yup that field is a piece of work I was involved in that was paused due to a mission shutdown and has remained in an incomplete state 😭 One day It will be finished.