CDLUC3 / ezid

Rationalize updateTime timestamps for SearchIdentifier #730

Open sfisher opened 2 months ago

sfisher commented 2 months ago

I think this is low priority.

See https://github.com/CDLUC3/ezid/issues/618 for context.

It seems that updateTime isn't updated when the database records for SearchIdentifier are updated by the linkChecker.

We can analyze how this field is used and decide whether it should be updated, whether we should store real updates elsewhere, or whether to just forget about it.

The original problem will pretty much go away after a reindex. Going forward, the linkChecker code in the OpenSearch release updates both the DB and OpenSearch, whereas before that release it updated only the DB.

jsjiang commented 2 months ago

The updateTime field in the SearchIdentifier table may still be used by the OpenSearch index-building tool to update the index by update time. Modify the proc-link-checker-update.py script to set the updateTime field whenever the linkIsBroken and hasIssues fields are updated.

The related code is in the run() function:

    si2.linkIsBroken = newValue
    si2.computeHasIssues()
    si2.save(update_fields=["linkIsBroken", "hasIssues"])
    open_s = OpenSearchDoc(identifier=si2)
    open_s.update_link_issues(link_is_broken=si2.linkIsBroken, has_issues=si2.hasIssues)

The updateTime field can be set to the current Unix time in seconds:

    t = int(time.time())
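Putting the two pieces together, the change might look roughly like the sketch below. It is only an illustration, not the actual script: `FakeSearchIdentifier` and `apply_link_check_result` are hypothetical names standing in for the real model and the body of run(), `computeHasIssues()` is simplified here, and the OpenSearch call is left out. The point is that updateTime has to be assigned and also listed in `update_fields`, otherwise the save will not write it.

```python
import time


class FakeSearchIdentifier:
    """Hypothetical stand-in for the SearchIdentifier model (illustration only)."""

    def __init__(self):
        self.linkIsBroken = False
        self.hasIssues = False
        self.updateTime = 0
        self.saved_fields = None  # records what the last save() was asked to write

    def computeHasIssues(self):
        # Simplified assumption: the real method may consider more conditions.
        self.hasIssues = self.linkIsBroken

    def save(self, update_fields=None):
        # Mimics the update_fields argument of a Django model save().
        self.saved_fields = list(update_fields or [])


def apply_link_check_result(si2, new_value, now=None):
    """Set the link status and bump updateTime in the same save."""
    si2.linkIsBroken = new_value
    si2.computeHasIssues()
    # Current Unix time in seconds; `now` is only a hook for testing.
    si2.updateTime = int(time.time()) if now is None else int(now)
    # updateTime must be listed here or it will not be persisted.
    si2.save(update_fields=["linkIsBroken", "hasIssues", "updateTime"])
    return si2
```

In the real script, the OpenSearch document update would still follow the save; whether the OpenSearch document also needs the new update time is a separate question not covered here.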