HEPData / hepdata

Repository for main HEPData web application
https://hepdata.net
GNU General Public License v2.0
40 stars 11 forks source link

records: make `do_finalise` more robust and tolerate multiple commit messages #761

Open GraemeWatt opened 4 months ago

GraemeWatt commented 4 months ago

A problem was recently reported with version 2 of a record which failed to display because of multiple (10!) entries in the record_version_commit_message database table with the same recid and version. There were new Sentry events: MultipleResultsFound: Multiple rows were found when exactly one was required from the line:

https://github.com/HEPData/hepdata/blob/ef2ea51e490856f27daf13b55f0f5f1f0483a0c3/hepdata/modules/records/api.py#L327

This situation could conceivably arise if do_finalise fails (for example, due to problems connecting to the database) after the commit message has been added to the database, but before setting hep_submission.overall_status = "finished". Eventually, after 10 duplicate commit messages had been added, it seems that the do_finalise function failed when reindexing.

The do_finalise function should be made more robust to catch possible exceptions due to temporary problems connecting to the database or OpenSearch, for example, by using db.session.rollback() to rollback changes to the database and by catching reindexing exceptions (similar to the try/except used for the admin_indexer immediately below). An exception should return an error to the user requesting that they try to finalise the record later. This is better than partially finalising the record with some steps missing.

Although it should not happen, the get_commit_message function could also be made more robust to tolerate multiple commit messages with the same recid and version, for example, by replacing the lines:

https://github.com/HEPData/hepdata/blob/ef2ea51e490856f27daf13b55f0f5f1f0483a0c3/hepdata/modules/records/api.py#L323-L327

with:

        commit_message_query = RecordVersionCommitMessage.query \
            .filter_by(version=ctx["version"], recid=recid).order_by(RecordVersionCommitMessage.id.desc())

        if commit_message_query.count() > 0:
            commit_message = commit_message_query.first()

to get the most recent commit message in case there are multiple entries in the database.