Closed dviererbe closed 1 month ago
I agree that it makes sense to handle that in the starter pack.
I'm a bit doubtful though if we can do anything except ignoring manpage.ubuntu.com. I just ran the check locally, and it didn't report the link broken. So either this is a problem that doesn't occur consistently, or it's something that only occurs on GitHub and not locally - either way, why would curl work better?
> either way, why would curl work better?
To be honest, I don't know. I just tested it and it worked.
> To be honest, I don't know. I just tested it and it worked.
On GitHub? And consistently? I'm not opposed to this solution, but it would be good to at least have a suspicion on why one works and the other doesn't.
note: I found out why curl works. curl does not treat an HTTP 404 as an error: without the --fail flag it still exits 0.
$ curl https://manpages.ubuntu.com/manpages/en/man1/diff.1.html
<script language="JavaScript">
var q = location.href;
if (q.search(/\/manpages\/.*\/man[0-9]\/.*[^0-9]\.html$/) >= 0) {
// Current location matches a legacy link, with just a plain .html filename
// Try to redirect to the new location, which has a .[0-9].html filename
location.replace(q.replace(/\/man([0-9])\/(.*)\.html/, "/man$1/$2.$1.html"));
} else {
// Try redirecting to a page of search results
q = q.replace(/.*\//, "");
q = q.replace(/\.html$/, "");
location.replace('/cgi-bin/search.py?cx=003883529982892832976%3A5zl6o8w6f0s&cof=FORID%3A9&ie=UTF-8&titles=404&lr=lang_en&q=' + q);
}
</script>
<!doctype html>
<html lang="en" dir="ltr">
<head>
[...]
<title>Ubuntu Manpage:
Not Found
</title>
[...]
</head>
<body>
[...]
</body>
</html>
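The behaviour can be reproduced without any network access. In this sketch, a local server that answers everything with 404 stands in for manpages.ubuntu.com: the transfer itself succeeds, and the failure exists only in the HTTP status code, which plain curl ignores.

```python
import http.client
import http.server
import threading

# Tiny local server that answers every request with 404 "Not Found",
# standing in for manpages.ubuntu.com serving its JavaScript 404 page.
class NotFoundHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(404)
        self.end_headers()

    def log_message(self, *args):  # silence request logging
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), NotFoundHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The TCP transfer and HTTP exchange both complete normally, so curl
# (without --fail) would exit 0 here; only the status code says 404.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/manpages/en/man1/diff.1.html")
status = conn.getresponse().status
print(status)  # 404, received without any error being raised

server.shutdown()
```

This is why the bash script's curl invocation "worked": with --fail (or -f), curl instead exits with code 22 on HTTP errors, which is what a link check needs.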
The manpages site performs this redirect client-side via JavaScript. The tests started failing because we recently switched to
https://manpages.ubuntu.com/manpages/en/man{section}/{page}.{section}.html
from something like
https://manpages.ubuntu.com/manpages/{dist}/en/man{section}/{page}.{section}.html
That sounds like the clean solution would be to update the links?
> That sounds like the clean solution would be to update the links?
suggestion: Yes, we could do something like this in custom_conf.py:

import subprocess

manpages_distribution = subprocess.check_output("distro-info --stable",
                                                text=True, shell=True).strip()
manpages_url = ("https://manpages.ubuntu.com/manpages/"
                f"{manpages_distribution}/en/"
                "man{section}/{page}.{section}.html")
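For illustration, here is how that template would expand, assuming `distro-info --stable` printed a release codename such as `noble` (the codename here is hypothetical):

```python
# Hypothetical expansion: assume `distro-info --stable` printed "noble".
manpages_distribution = "noble"
manpages_url = ("https://manpages.ubuntu.com/manpages/"
                f"{manpages_distribution}/en/"
                "man{section}/{page}.{section}.html")

# A :manpage:`diff(1)` reference would then be checked against:
print(manpages_url.format(page="diff", section="1"))
# https://manpages.ubuntu.com/manpages/noble/en/man1/diff.1.html
```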
In case there is ever a need to parse the :manpage: directives: I rewrote the bash script as a Python script before I found out why curl was working :'(
import os
import re
import sys

import requests

sys.path.append('./')
from custom_conf import *


class ManPage:
    def __init__(self, name: str, section: str) -> None:
        self.Name = name
        self.Section = section
        self.FullName = f"{name}({section})"

    def __str__(self) -> str:
        return self.FullName

    def __hash__(self) -> int:
        return hash(self.FullName)

    def __eq__(self, value: object) -> bool:
        return isinstance(value, ManPage) and self.FullName == value.FullName

    def __ne__(self, value: object) -> bool:
        return not self.__eq__(value)


def FindReStructuredTextFiles(root: str) -> list[str]:
    rstFiles = []
    directories = [root]
    while len(directories) > 0:
        directory = directories.pop()
        for entry in os.listdir(directory):
            # ignore dotfiles (e.g. .sphinx)
            if entry.startswith("."):
                continue
            fullPath = os.path.join(directory, entry)
            if os.path.isfile(fullPath):
                if entry.endswith('.rst'):
                    rstFiles.append(fullPath)
            else:
                directories.append(fullPath)
    return rstFiles


def ParseUsedManpages(filePaths: list[str]) -> set[ManPage]:
    manpages = set()
    pattern = re.compile(
        pattern=r':manpage:`(?P<page>[a-z0-9-.]+)\((?P<section>\w+)\)`',
        flags=re.DOTALL | re.IGNORECASE)
    for filePath in filePaths:
        with open(filePath, 'r') as file:
            for line in file:
                for match in pattern.findall(line):
                    manpages.add(ManPage(name=match[0], section=match[1]))
    return manpages


def IsAccessible(url: str) -> tuple[bool, Exception | None]:
    try:
        response = requests.get(url, allow_redirects=True, timeout=5)
        response.raise_for_status()
        return (True, None)
    except Exception as error:
        return (False, error)


def VerifyThatAllManPagesAreAccessible() -> None:
    docsFolder = os.path.join(os.getcwd(), "docs")
    files = FindReStructuredTextFiles(docsFolder)
    usedManpages = ParseUsedManpages(files)
    allPagesAccessible = True
    for manpage in usedManpages:
        manpageUrl = manpages_url.format(page=manpage.Name,
                                         section=manpage.Section)
        isAccessible, error = IsAccessible(manpageUrl)
        if isAccessible:
            print(f"INFO: Fetching {manpage} (URL='{manpageUrl}') succeeded.")
        else:
            allPagesAccessible = False
            print(f"ERROR: Fetching {manpage} (URL='{manpageUrl}') failed!")
            print("DEBUG: ", error)
    if allPagesAccessible:
        print("All man pages are accessible :)")
    else:
        print("One or more man pages are NOT accessible :'(")
        sys.exit(1)


if __name__ == "__main__":
    VerifyThatAllManPagesAreAccessible()
The linkchecks for manpages.ubuntu.com URLs recently started failing for the Ubuntu Packaging Guide (see an example log), despite the links working fine on manual inspection in Firefox and Chrome.
@s-makin also mentioned that she had a similar experience with the Ubuntu Pro documentation. That's why I am escalating this to a general Canonical Sphinx documentation issue.
I propose that we configure linkcheck to ignore manpages.ubuntu.com URLs and run a script that greps all :manpage: occurrences to check for working manpages using curl.
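The first half of the proposal can be sketched with Sphinx's standard `linkcheck_ignore` option in `conf.py`, a list of regular expressions matched against each URL:

```python
# conf.py -- make `sphinx-build -b linkcheck` skip manpages.ubuntu.com,
# since its 404 pages rely on a client-side JavaScript redirect that the
# checker cannot follow.
linkcheck_ignore = [
    r"https://manpages\.ubuntu\.com/.*",
]
```

The curl-based (or Python-based) script would then cover those URLs separately, treating HTTP 404 as a failure.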