Open coke opened 1 year ago
The current deployment artifact is a container with nginx. So let's assume that is the parsing target.
The deployment environment is Portainer. It allows shared volumes between containers. So, one way to achieve this with minimal changes to our current deployment artifact:
Look into .htaccess support for existing mappings
I've fetched the production logs from the past 24 hours:
journalctl -u raku-doc-website --since '24 hours ago' --no-pager > logs.txt
I will parse out the 404s. I'd paste them here, but I don't want to reveal any potential PII.
UPDATE: the logs are very truncated. I think this server's default journalctl configuration might be to aggressively limit the size of logs. Or it might be a podman thing. I'll keep looking.
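One way to check the journald theory is to look for explicit size or rate limits in the journald configuration. This is only a sketch: `journald_limits` is a made-up helper name, and the default path `/etc/systemd/journald.conf` is an assumption about this server (drop-in files under `journald.conf.d` could override it, and a podman log driver setting would not show up here at all).

```shell
# Hypothetical helper: print the size and rate limits that are explicitly set
# (not commented out) in a journald config file.
# The default path is an assumption about this server's layout.
journald_limits() {
  conf="${1:-/etc/systemd/journald.conf}"
  grep -E '^[[:space:]]*(SystemMaxUse|RuntimeMaxUse|RateLimitIntervalSec|RateLimitBurst)=' "$conf"
}
```

If this prints nothing, journald is running with its built-in defaults for those keys. `journalctl --disk-usage` should also show how much journal data is actually being kept.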
Some of it is the usual randomness from the public internet, but there are legitimate clues to some missing stuff, too.
@finanalyst See the file I've linked in my previous comment for some counts of 404s per uri from production.
Our Caddy access logs give us JSON of the following form:
{"level":"info","ts":1677922396.2839906,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_ip":"REDACTED","remote_port":"8850","proto":"HTTP/1.1","method":"GET","host":"REDACTED","uri":"/","headers":{"Content-Length":["0"],"Connection":["close"],"User-Agent":["HCLB-HealthCheck"]}},"user_id":"","duration":0.000351099,"size":18097,"status":200,"resp_headers":{"Server":["Caddy"],"Etag":["\"rqxhwadyp\""],"Content-Type":["text/html; charset=utf-8"],"Last-Modified":["Fri, 03 Mar 2023 05:00:10 GMT"],"Accept-Ranges":["bytes"],"Content-Length":["18097"]}}
We can ask journalctl for just that JSON (omitting other journal metadata with --output cat):
journalctl --output cat -u raku-doc-website > logs.txt
Then with jq and awk we can do the counting:
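# Emit "status<TAB>uri" per request, then count the URIs that returned 404.
jq -r '"\(.status)\t\(.request.uri)"' logs.txt | \
awk -F'\t' '
  # Split on tabs only, so a URI containing spaces stays in one field.
  /^404/ { hist[$2]++ }
  END {
    for (item in hist) {
      printf "%s\t-> %s\n", hist[item], item
    }
  }
' > counts.txt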
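awk's `for (item in hist)` loop has no defined order, so counts.txt comes out unsorted. A small sketch for pulling out the most frequent 404s, assuming the `count<TAB>-> uri` line format written above (`top_404s` is just an illustrative name, not something in the repo):

```shell
# Hypothetical helper: show the N most frequently requested 404 URIs.
# Assumes each input line looks like "<count><TAB>-> <uri>".
top_404s() {
  counts_file="$1"
  limit="${2:-20}"   # default to the top 20 when no limit is given
  # Sort numerically on the first tab-separated field, highest count first.
  sort -t "$(printf '\t')" -k1,1nr "$counts_file" | head -n "$limit"
}
```

For example, `top_404s counts.txt 10` would list the ten URIs with the highest 404 counts, which is probably the quickest way to separate real missing pages from background noise.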
We could definitely use access to the web logs of whatever is serving the content, so we can (at least) track 404 requests, which probably indicate a rename or gap not addressed by the .htaccess mappings (or equivalent).
See also #104 #164 #181