NHSDigital / DataDictionaryPublication

Apache License 2.0

Broken Link Checker #310

Open AngelaFaulding opened 2 years ago

AngelaFaulding commented 2 years ago

A broken link checker used to be run on the NHS Data Model and Dictionary website. Any broken links could then be updated. The broken link checker no longer runs so there is currently no way of identifying broken links until they are spotted by a member of the team or via the helpdesk. There may be a large number of broken links this year following the NHS Digital / NHS X / NHS England merge. James said he may be able to incorporate a broken link checker into Mauro.

AngelaFaulding commented 2 years ago

To be looked at Post Migration.

AngelaFaulding commented 2 years ago

James has been able to do this but it wouldn't work in the meeting on 18/1/22. Next meeting planned for 25/1/22.

GSChandel commented 2 years ago

James - It's working. DD Team to test and confirm.

AngelaFaulding commented 2 years ago

@jamesrwelch - there are 93 items listed with broken links. Some of these link to the data set menus. Do you need to do something else with this? image

@PaulChapmanPM - if this needs fixing, it will need a dev label.

AngelaFaulding commented 2 years ago

This will only be resolved when #392 is fixed.

jamesrwelch commented 2 years ago

#392 is fixed, and after a quick tidy, I've removed all except one of the broken links. The last page with broken links is the "Supporting Definitions Menu" in the Supporting Information - we can discuss this one later.

AngelaFaulding commented 2 years ago

The latest release now has only one failure in the Orchestrator which relates to #451

AngelaFaulding commented 3 months ago

The Broken Link Checker has been removed from the latest version of the Orchestrator. Can this please be reinstated, as it was originally agreed and working?

July 2024:

image

Previous version sorted by James:

image

pjmonks commented 3 months ago

For clarification, System C are not responsible for a broken link checker integrity check. This is not mentioned in our statement of work to complete, so we consider this out of scope.

However, this may be partially related (unless the broken link checker refers to something else). We do have a criterion to uphold here:

ED10.1 Ability created to rename a data item and have that change be reflected in any links recorded in other data item descriptions.

By renaming an item in a branch, there could then be the potential for hyperlinks in other description fields to become broken since the names in item paths have changed. This is in progress, currently in a testing state, as described in this pull request: MauroDataMapper-NHSD/mdm-plugin-nhs-data-dictionary#50

For this use case, System C are working on maintaining links to other items so there are no broken links. Anything beyond this is out of scope.
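
As a rough illustration of the ED10.1 behaviour described above (not System C's implementation), renaming an item means rewriting every description that links to the old path. The sketch below assumes descriptions are held as HTML strings and that links carry the item path in an `href`. The path format, the `rename_item` function and the sample data are all hypothetical.

```python
def rename_item(descriptions, old_path, new_path):
    """Rewrite links to a renamed item and report which descriptions changed.

    descriptions: dict mapping item name -> HTML description text.
    old_path / new_path: the link target before and after the rename.
    The href format used here is an assumption made for illustration only.
    """
    old_href = f'href="{old_path}"'
    new_href = f'href="{new_path}"'
    updated = {}
    changed = []
    for name, text in descriptions.items():
        if old_href in text:
            # Only the link target is rewritten; the visible link text is left alone.
            updated[name] = text.replace(old_href, new_href)
            changed.append(name)
        else:
            updated[name] = text
    return updated, changed


# Hypothetical sample data: one description links to the renamed item, one does not.
descriptions = {
    "PATIENT": 'See <a href="te:Terms|tm:Old Name">Old Name</a>.',
    "PERSON": "No links here.",
}
updated, changed = rename_item(
    descriptions, "te:Terms|tm:Old Name", "te:Terms|tm:New Name"
)
```

Returning the list of changed items alongside the updated text makes it easy to review what a rename touched before committing it to a branch.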

AngelaFaulding commented 3 months ago

16/07/2024 - James has tested this as part of the Orchestrator and has sent it to System C to check. Peter will deploy if it works on the server.

AngelaFaulding commented 3 months ago

When clicking on an item identified as having a broken link, the following is displayed. This appears to be just for Terms, i.e. the NHS Business Definitions. The links to the Data Elements and Attributes work.

image

The text is:

Not Found

We're sorry, but the server returned a 'Not Found' error

Details

{
  "headers": {
    "normalizedNames": {},
    "lazyUpdate": null
  },
  "status": 404,
  "statusText": "OK",
  "url": "https://mauro.dev.dataproducts.nhs.uk/api/terminologies/null",
  "ok": false,
  "name": "HttpErrorResponse",
  "message": "Http failure response for https://mauro.dev.dataproducts.nhs.uk/api/terminologies/null: 404 OK",
  "error": {
    "path": "/api/terminologies/null",
    "resource": "Terminology",
    "id": "null"
  }
}
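
The request path in these details ends in `null`, which suggests the link was built without a terminology identifier rather than pointing at an item that no longer exists. A minimal sketch of spotting that pattern in a captured error payload follows. `missing_id_in_error` is a hypothetical helper, and treating `undefined` the same way as `null` is an assumption.

```python
import json
from urllib.parse import urlparse


def missing_id_in_error(details_json):
    """Return True when a 404 was caused by a URL ending in a literal 'null'.

    details_json: the JSON text shown under "Details" in the error page.
    """
    details = json.loads(details_json)
    path = urlparse(details["url"]).path
    last_segment = path.rstrip("/").split("/")[-1]
    # 'undefined' is included as an assumption; the report above only shows 'null'.
    return details.get("status") == 404 and last_segment in ("null", "undefined")


# Trimmed copy of the payload reported above.
sample = """{
  "status": 404,
  "url": "https://mauro.dev.dataproducts.nhs.uk/api/terminologies/null"
}"""
is_missing_id = missing_id_in_error(sample)
```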

Also, I have checked a few of the items and they do not contain a broken link, for example: https://www.datadictionary.nhs.uk/nhs_business_definitions/british_hiv_association.html

pjmonks commented 3 months ago

Please could you identify which item you were intending to view in Mauro? The error message is not descriptive enough to pinpoint how the error happened.

I cannot speak to the broken link checker as that is the responsibility of @jamesrwelch. However, it is possible that some links to external sites work only because they are redirected: you may be using an old link that happens to work because the target website automatically redirects you to the updated page. I suppose it is up to you whether you consider those links "broken" or at least maintained.
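
To make the redirect case concrete, here is a small sketch that separates links which resolve directly from links which only resolve after a redirect. It is not how the integrity checker works; `classify` and `check` are hypothetical names, and comparing the requested URL with the final URL is just one simple way to detect a redirect.

```python
import urllib.error
import urllib.request


def classify(requested_url, final_url, status):
    """Label a link as 'broken', 'redirected' or 'ok' from one fetch result."""
    if status >= 400:
        return "broken"
    if final_url != requested_url:
        # The link still works, but only because the site forwarded the request.
        return "redirected"
    return "ok"


def check(url, timeout=10):
    """Fetch url (urllib follows redirects by default) and classify the outcome.

    Network failures such as DNS errors raise URLError and are not handled here.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return classify(url, response.geturl(), response.status)
    except urllib.error.HTTPError as err:
        return classify(url, url, err.code)
```

Links reported as `redirected` could then be reviewed and updated to their new address at the team's discretion, without being treated as failures.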

AngelaFaulding commented 3 months ago

I just clicked from the top down and all NHS Business Definitions failed.

jamesrwelch commented 3 months ago

Also worth noting that links can be broken temporarily - it checks for any 4** error response code which can include time-outs, rate-limits, etc.
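
A minimal sketch of how a 4xx result might be triaged before anyone edits a link. The checker itself flags any 4xx response; which codes count as temporary here (408 Request Timeout and 429 Too Many Requests) and the `triage` function are assumptions for illustration.

```python
# Standard HTTP codes that usually clear on their own: timeout and rate limit.
TRANSIENT_STATUSES = {408, 429}


def triage(status):
    """Suggest a follow-up for an HTTP status returned while checking a link."""
    if 400 <= status < 500:
        if status in TRANSIENT_STATUSES:
            return "retry later"
        return "likely broken"
    # Anything outside the 4xx range is not flagged in this sketch.
    return "not flagged"


results = {code: triage(code) for code in (200, 404, 408, 429)}
```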

AngelaFaulding commented 3 months ago

Thanks @jamesrwelch - I have seen that before. So we would check the link and if it is OK, we wouldn't have to do anything with it.

The NHS Business Definitions not displaying is more of an issue.

jamesrwelch commented 3 months ago

Yes, given the age of some of the websites that are being pointed at, I wouldn't be surprised if you got a few that fail every once in a while. If you get one that persistently fails in the integrity checker but works when you load it manually (or vice-versa), then we can take a closer look - for speed reasons the way we check for broken links is slightly different to loading it in a browser. The links to business definitions being incorrect should be an easy fix that I'll look into.

pjmonks commented 3 months ago

> Please could you identify which item you were intending to view in Mauro? The error message is not descriptive enough to pinpoint how the error happened.

I mean which item from integrity check list were you clicking to visit in Mauro. I'm assuming you are:

  1. Running the integrity checks.
  2. They complete and show the "Broken links" section with a list of dictionary items that supposedly contain broken links.
  3. You then clicked on one of them to view it and could not find it in Mauro. Which of those items did you click on?

AngelaFaulding commented 3 months ago

I ran the integrity check for CR1815 and this is the Orchestrator page:

image

I clicked on each NHS Business Definition and got:

image

pjmonks commented 3 months ago

I found that the issue is not the broken link checker itself; the problem is the hyperlinks in the Orchestrator UI, which will be simple to fix. I've created this issue for tracking progress:

pjmonks commented 3 months ago

The hyperlinks in the integrity checker have been fixed now and deployed to test.