Closed: github-actions[bot] closed this issue 5 months ago
Starting to follow this up a bit. Some questions:
I see there are 2 options of interest, `exclude` and `timeout`. The default timeout is 20s, which is already fairly long, and I tend to find it reasonable considering what I believe to be normal human user behaviour. For the moment I'll try excluding the domains in question. Then I think, beyond the question of what stance we adopt toward those domains, the only remaining "real" issue would be the link in constitution.md.
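For reference, a minimal sketch of what those two options look like in a lychee configuration file; the timeout value and the excluded pattern below are illustrative, not a proposal:

```toml
# Hypothetical lychee.toml fragment -- values are illustrative only.

# Per-request timeout in seconds (lychee's default is 20).
timeout = 30

# URL patterns to skip entirely instead of checking.
exclude = [
    "https://example.org/slow-domain",
]
```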
Dear Daniel @erget
Thanks for following this up. Yes, we imported the mailman archive. Our copy is linked on the discussion page on the same line, and in front of, the UCAR original. We can fix those missing links by pointing to the appropriate place in our copy, but unfortunately this will take a bit of work because our copy is not grouped into years.
I don't know why the link to the CEDA cf-checker isn't working, but @RosalynHatcher probably could advise.
I don't know why the CEDA editor isn't working. It has worked in the past, although it's always rather slow to answer that query. Maybe Alison @japamment could comment?
Best wishes for 2023
Jonathan
I agree that the weekly repetition of the same broken links has been a nuisance and thanks for stopping it. But we mustn't forget to fix them! We should keep this issue open until it's done.
@JonathanGregory I agree in principle. I've now fixed what I think is everything in #320. What this does:

- runs the link checker on `main` and periodically via crontab
- ignores the known-broken links via a `.lycheeignore` file

So I propose we merge that PR, close this issue plus the other one related to link checking, and open one to address the longer-term issues of migrating the mailman archive and figuring out what's up with the domains that are timing out. What do you think?
Never mind... Something's not right: the job isn't failing when I expect it to. I'll need some help to finalise this, and will request it in the requisite PR.
I've closed https://github.com/cf-convention/cf-convention.github.io/issues/330 because it reported the same errors as this one.
In issue 345, which is the most recent output of the cron job, @DocOtak wrote
@JonathanGregory @erget Do we want to disable the cron tasks for this?
I've been closing the new ones (like 345) every Monday morning, as a human cron daemon. I don't mind doing that, but equally I don't think it helps to have a new one every week until we've fixed the missing links identified by this edition.
@JonathanGregory @DocOtak I agree, actually we could disable this until we get it fixed - we've made progress on it but slowly ;) @DocOtak do you have the rights to disable the cron task, and could it be executed manually in that case?
As we have probably fixed all the recurrent broken links on the website, we don't need to disable the link-checker, as discussed in this issue. I will therefore close this issue, and we will see what the link-checker has to say when it next executes.
I'm reopening this because Antonio @cofinoa has reinitiated work on it in https://github.com/cf-convention/cf-convention.github.io/pull/320. Thanks, Antonio and others. It'll be good not to have to close an issue every Monday morning. :smile: Also, I'm closing https://github.com/cf-convention/cf-convention.github.io/issues/447, which deals with the same issue.
@erget @JonathanGregory and @sadielbartholomew I have come back to the link checker errors and problems. Sorry if I miss some related issue that is already open.
I will merge #320 .... but that doesn't fix all missing/timed-out/vanished (broken) links.
I'm preparing a PR to fix/improve the link checker, and to try to silence some permanently broken links and some old documents with wrong UTF-8 encoding where the link checker fails with an error.
Thanks for working on this, @cofinoa.
For reference I am copying here various comments from https://github.com/cf-convention/cf-convention.github.io/pull/486
@cofinoa https://github.com/cf-convention/cf-convention.github.io/pull/486#issuecomment-2075566036
Dear @cf-convention/info-mgmt team, this PR relates to a long-standing issue with the link checker; see #318 and #320.
This is a first step to fix the issues with the link checker when PRs are made.
The action is triggered when a PR is opened/reopened. It first checks that all `**/*.md` files in the repo have no broken links; if broken links are found, the action fails. The resulting output can be seen in the summary of the action run, including when "everything" is "green". From https://github.com/cf-convention/cf-convention.github.io/actions/runs/8821086583?pr=486:
Summary

| Status | Count |
| --- | ---: |
| 🔍 Total | 593 |
| ✅ Successful | 495 |
| ⏳ Timeouts | 0 |
| 🔀 Redirected | 0 |
| 👻 Excluded | 98 |
| ❓ Unknown | 0 |
| 🚫 Errors | 0 |
The PR also changes links to `.html` files to the actual `.md` files, which is what should be referenced in `.md` files instead; see commit 468dece. `vocabularies.md` has been tricky because the `.md` file uses HTML tags. If the `.md` files are OK, the action checks that building the site with Jekyll works and, if so, it uploads the artifact.

Please take a moment to review and let me know if this fits. If so, I will continue with the PR to incorporate the link check of the site on a regular basis (i.e. a cron job every Monday), or we can just merge this PR and open a new one for that.
PS: Annotating the PR with a comment containing the link-check report is a challenge due to security issues with PRs from forks. If PRs come from the same repo (not forks), then commenting on PRs and issues is possible.
PS2: Checking links to GitHub may raise an issue with the rate limit on GitHub HTTP requests.
@larsbarring https://github.com/cf-convention/cf-convention.github.io/pull/486#issuecomment-2075610871
Hi Antonio,
> the change at `vocabularies.md` has been tricky because the `.md` file uses HTML tags.

I was recently adding some minor changes to this file and noticed that there is actually very little markdown and a lot of repetitive html links. I thought it might be possible to generate this file dynamically during the build process: something like a small [python] script looking through the relevant `../Data/` directories for which versions exist and then assembling the file from that and md text fragments, either read from file(s) or stored within the script. Could this be something to look further into (I'm afraid it's beyond my skill set)?
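A minimal sketch of that idea (the function name, the `Data/` layout, and the link format are all assumptions that would need checking against the real repository):

```python
from pathlib import Path


def version_links(data_dir: str, table: str) -> str:
    """Scan Data/<table>/ for numeric version subdirectories and render
    one Markdown link per version, newest first.

    The link target path below is a guess, not the site's real layout.
    """
    root = Path(data_dir) / table
    versions = sorted(
        (p.name for p in root.iterdir() if p.is_dir() and p.name.isdigit()),
        key=int,
        reverse=True,
    )
    return "\n".join(
        f"* [Version {v}](./Data/{table}/{v}/build/{table}.html)"
        for v in versions
    )
```

Non-numeric entries (such as a stray `current` directory) are simply skipped, which sidesteps one of the pitfalls mentioned later in this thread.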
@cofinoa https://github.com/cf-convention/cf-convention.github.io/pull/486#issuecomment-2076073653
@larsbarring I have made a new PR at #487 with your suggestion to refactor vocabularies.md
Thanks for https://github.com/cf-convention/cf-convention.github.io/pull/487, Antonio. I don't fully understand this. Is this problem for the link-checker caused by a link to a markdown page from HTML, which is itself wrapped up as a markdown page? This seems rather convoluted. If the whole page is put in markdown instead, does that resolve it?
From @cofinoa https://github.com/cf-convention/cf-convention.github.io/pull/487
@larsbarring, as you suggested at PR #486, I have refactored `vocabularies.md` to actual Markdown instead of HTML, and I have also generated some of the lists automatically.
Jekyll is quite limited for managing data and/or strings; a Jekyll plugin would be needed to improve that, but my Ruby skills are also quite limited.
From @larsbarring https://github.com/cf-convention/cf-convention.github.io/pull/487#issuecomment-2076711737
> The lists of links will be automatically generated from existing version subdirectories under the `../Data/` directory

I think that the workflow will be simplified for publishing a new version of the Standard Name Table and the Area Type Table (the versions of the Standardized Regions List are hardcoded?).
Hence pinging @japamment, @efisher008
@JonathanGregory

> Is this problem to the link-checker caused by a link to a markdown page from HTML, which is itself wrapped up as a markdown page?

It's a problem of linking to an `.html` page which is built from a `.md` page. Because the content is HTML, we cannot link to the `.md` page as we do in other `.md` pages with Markdown content.
I have rewritten the HTML content of `vocabularies.md` as Markdown content in commit f7f145afd4f236539e67e2c661ffce38594f285a, along with commit 468decec7eac95081bb02b61f3ce3d3447bbe942, which fixes links to `.html` pages that are really `.md` pages.
@larsbarring

> The lists of links will be automatically generated from existing version subdirectories under the `../Data/` directory. I think that the workflow will be simplified for publishing a new version of the Standard Name Table, Area Type Table (the versions of the Standardized Regions List is hardcoded?).

I have closed PR #487 because automatic generation of links to `./Data` content has some pitfalls that need to be overcome by refactoring some directories in `Data/`. For example, the directory with v84 of standard-names also has a `current` directory, and I think this is a bug that needs to be fixed.
@JonathanGregory

> As we have probably fixed all the recurrent broken links on the website, we don't need to disable the link-checker, as discussed in this issue. I will therefore close this issue, and we will see what the link-checker has to say when it next executes.

There are some temporary issues with some links; for example, I have had to exclude https://mmisw.org/ont in the `lychee.toml` configuration file (see 65da6b29fc8e01a8435e73b1df174b71cd34f786).
@cofinoa OK, I agree that having a closer look at the directory structure under `./Data` is worthwhile. I came to the same conclusion when working on the old versions of the standard name table. It also has a bearing on this discussion. So, let's come back to this idea in a while.
There was no new broken links report this morning, I am very pleased to see. Thanks for suppressing it, @cofinoa! Shall we close this issue, or is it still a work in progress?
@JonathanGregory, it's still in progress.
If it's OK, I would like to merge PR #486, which is an intermediate step, before solving this issue.
That's fine. Let's leave it open then. Thanks.
I have created 2 workflows/actions:

1. `check_jekyll_build.yml`: an action with 2 main jobs, triggered when a PR is created:
   - A. check links in the Markdown files (`./**/*.md`)
   - B. check that Jekyll can build the website
2. `check_links_cron.yml`: the other action, which runs on Mondays and also has 2 main jobs:
   - C. check that Jekyll can build the website
   - D. check links on the site built in job C; if it fails, a new issue is opened: #490

The exclusion rules are at `.lychee/config.toml`, which is used by both actions (1 and 2), but we can create different ones for each action in case it's needed.
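For readers unfamiliar with the setup, a stripped-down sketch of what the cron workflow's trigger and link-check step could look like; the cron expression, step names, and paths are assumptions, not the repo's actual workflow (`lycheeverse/lychee-action` is the action published by the lychee project):

```yaml
# Hypothetical fragment of check_links_cron.yml -- illustrative only.
on:
  schedule:
    - cron: "0 6 * * 1"   # every Monday morning (UTC)
  workflow_dispatch: {}    # also allow manual runs

jobs:
  check-links:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check links in the built site
        uses: lycheeverse/lychee-action@v1
        with:
          args: --config .lychee/config.toml --no-progress '_site/**/*.html'
```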
Currently, I have excluded the following URLs:

```toml
exclude = [
# Data/cf-standard-names/
"http://glossary.ametsoc.org/wiki",
"https://www.unidata.ucar.edu/software/udunits/udunits-current/doc/udunits",
"https://www.unidata.ucar.edu/software/udunits/udunits-2.2.28/udunits2.html",
"https://www.sciencedirect.com/science/article/pii/0967063793901018",
"https://www.ipcc.ch/ipccreports/tar/wg1/273.htm",
"http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata",
"http://gcmd.nasa.gov/Resources/valids",
#
"cfeditor.ceda.ac.uk", # standard_name_rule, vocabularies, discussion
"https://mailman.cgd.ucar.edu/pipermail/cf-metadata", # discussion, governance
"http://mmisw.org/ont", # faq (TIMEOUT)
"https://mmisw.org/ont", # faq (TIMEOUT)
"http://www.cgd.ucar.edu/cms/eaton/cf-metadata/clivar_article.pdf", # Data/cf-documents/cf-governance/cf2_whitepaper_final.html
"http://www.cgd.ucar.edu/cms/eaton/cf-metadata/CF-current.html", # Data/cf-documents/requirements-recommendations
"https://www.usbr.gov/lc/socal/reports/SMappend_C.pdf", # Data/area-type-table/**/build/area-type-table.html
"https://cf-trac.llnl.gov/trac/", # 2018-Workshop, 2019-Workshop
"http://mailman.cgd.ucar.edu/pipermail/cf-metadata", # 2019-Workshop
"https://www.wonder.me", # 2021-Workshop
"https://figshare.com/account/articles/24633939", # 2023-Workshop
"https://figshare.com/account/articles/24633894", # 2023-Workshop
]
```
Some of the excluded URLs are spurious broken links, which are only temporarily broken.
Others are permanently broken, and we need to decide what to do about them [1].
Also, I have excluded some paths from checking, mainly because they contain documents with invalid encoding or many broken relative links (i.e. Trac tickets):
```toml
exclude_path = [
"_site/Data/cf-standard-names/docs/guidelines.html",
"_site/Data/cf-conventions/",
"_site/Data/Trac-tickets/",
]
```
regards
[1] For example, for https://www.ipcc.ch/ipccreports/tar/wg1/273.htm we could link to a capture from the Wayback Machine: https://web.archive.org/web/20181104000136/http://www.ipcc.ch/ipccreports/tar/wg1/273.htm
@JonathanGregory
I have improved the weekly cron workflow for the link checker (`check_links_cron.yml`). Now, if there is an open issue with the labels `defect`, `link-checker`, `report` and `automated issue`, the action will insert a new comment into it. If no issue with those labels is open, a new issue is opened.
You can see a sample at issue #493
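The comment-or-create decision can be sketched as pure logic; this is a hypothetical helper, where `issues` stands for issue metadata already fetched from the GitHub API:

```python
# Labels the workflow looks for, per the description above.
REQUIRED_LABELS = {"defect", "link-checker", "report", "automated issue"}


def pick_target_issue(issues):
    """Return the number of the first open issue carrying all required
    labels, or None, meaning a fresh issue should be created instead."""
    for issue in issues:
        names = {label["name"] for label in issue.get("labels", [])}
        if issue.get("state") == "open" and REQUIRED_LABELS <= names:
            return issue["number"]
    return None
```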
That's a very useful improvement. Thanks, Antonio.
@JonathanGregory et al., the issue #493 with the broken-link report has been updated, and a new comment has been added to the issue for today's checker cron job: https://github.com/cf-convention/cf-convention.github.io/issues/493#issuecomment-2106970778
I have re-run the checker manually, and "new" errors appear while others disappear. The issue has been updated with the report for this "manual" check: https://github.com/cf-convention/cf-convention.github.io/issues/493#issuecomment-2107398673
IMO, there are 2 pending actions that we need to discuss:
It might be useful to add this to the agenda of the next meeting of the Information Management Team @cf-convention/info-mgmt
I'm closing this to continue discussion at https://github.com/orgs/cf-convention/discussions/320
This issue was opened automatically but led to a discussion by humans
Errors were reported while checking the availability of links: Issues found in 6 inputs. Find details below.
[faq.md]:
- ✗ [404] https://mailman.cgd.ucar.edu/pipermail/cf-metadata/2009/047768.html | Failed: Network error: Not Found
- ⧖ [TIMEOUT] http://coastwatch.pfeg.noaa.gov/erddap/convert/units.html | Timeout
- ✗ [ERR] http://kitt.llnl.gov/trac/wiki/SatelliteData | Failed: Network error: dns error: no record found for name: kitt.llnl.gov.coi3uxiffnlergb4vem53tdisf.gx.internal.cloudapp.net. type: AAAA class: IN
- ✗ [404] https://mailman.cgd.ucar.edu/pipermail/cf-metadata/2012/055875.html | Failed: Network error: Not Found
- ✗ [404] https://mailman.cgd.ucar.edu/pipermail/cf-metadata/2010/053657.html | Failed: Network error: Not Found
- ✗ [404] https://mailman.cgd.ucar.edu/pipermail/cf-metadata/2008/052705.html | Failed: Network error: Not Found
- ✗ [404] https://mailman.cgd.ucar.edu/pipermail/cf-metadata/2010/048064.html | Failed: Network error: Not Found
- ✗ [404] https://mailman.cgd.ucar.edu/pipermail/cf-metadata/2008/052334.html | Failed: Network error: Not Found

[standard_name_rules.md]:
- ⧖ [TIMEOUT] http://cfeditor.ceda.ac.uk/proposals/1?status=active&namefilter=&proposerfilter=&descfilter=&filter+and+display=filter | Timeout

[discussion.md]:
- ⧖ [TIMEOUT] http://cfeditor.ceda.ac.uk/proposals/1?status=active&namefilter=&proposerfilter=&descfilter=&filter+and+display=filter | Timeout
- ⧖ [TIMEOUT] http://cfeditor.ceda.ac.uk/proposals/1?status=inactive&namefilter=&proposerfilter=&descfilter=&filter+and+display=filter | Timeout

[software.md]:
- ✗ [ERR] http://wps-web1.ceda.ac.uk/submit/form?proc_id=CFChecker | Failed: Network error: dns error: no record found for name: wps-web1.ceda.ac.uk.coi3uxiffnlergb4vem53tdisf.gx.internal.cloudapp.net. type: AAAA class: IN

[vocabularies.md]:
- ⧖ [TIMEOUT] http://cfeditor.ceda.ac.uk/proposals/1?status=inactive&namefilter=&proposerfilter=&descfilter=&filter+and+display=filter | Timeout
- ⧖ [TIMEOUT] http://cfeditor.ceda.ac.uk/proposals/1?status=active&namefilter=&proposerfilter=&descfilter=&filter+and+display=filter | Timeout

[constitution.md]:
- ✗ [ERR] file:///github/workspace/(https:/github.com/cf-convention/cf-conventions/blob/master/CODE_OF_CONDUCT.md) | Failed: Cannot find file
🔍 350 Total ✅ 335 OK 🚫 9 Errors (HTTP:9|Timeouts:6)