allinurl / goaccess

GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.
https://goaccess.io
MIT License

What does %H parse? #2705

Open remino opened 3 months ago

remino commented 3 months ago

I have a standard NCSA log format in which every line has two extra fields: the scheme (http or https) and the virtual host name.

So I’m specifying a custom format:

goaccess \
     --date-format "%d/%b/%Y" \
     --time-format "%H:%M:%S" \
     --log-format "%h %^[%d:%t %^] \"%r\" %s %b \"%R\" \"%u\" %H %v" \
     ...

According to the manpage, that should be correct:

       %v     The canonical Server Name of the server serving the request (Virtual Host).
...
       %H     The request protocol.

Now in my generated reports, I see virtual hostnames, but they are formatted like this:

ttp example.com
ttps example.com
ttp example.net
ttps example.net

Am I using %H correctly? If not, can the manpage elaborate what it means by “The request protocol” by citing a few examples?

0bi-w6n-K3nobi commented 2 months ago

Hi @remino

If you take a line from an Apache or Nginx log, you will see something similar to:

1.2.3.4 - - [21/Sep/2024:18:09:08 +0000] "GET /Something-is-Here.html HTTP/1.1"

So the token HTTP/1.1 is the "request protocol" that %H identifies. You can read more about this in the Specifiers section of the manual. Nowadays the possible values are HTTP/1.0, HTTP/1.1 or HTTP/2.0, that is: the very old 1.0 protocol, HTTP/HTTPS over 1.1, and HTTP/HTTPS over 2. A newer HTTP/3.0 value will probably show up as well!
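
To make the distinction concrete, here is a small Python sketch (not GoAccess code) that pulls the protocol token out of the quoted request line of an NCSA-style entry. The regular expression and the helper name `request_protocol` are assumptions made up for illustration; GoAccess's own parser may well behave differently. It only shows that the "request protocol" is a token such as HTTP/1.1 inside the request line, which is a different thing from a bare scheme field like http or https.

```python
import re

# Hypothetical pattern for illustration only: method, path, then a
# protocol token such as HTTP/1.0, HTTP/1.1 or HTTP/2.0, all inside
# the double-quoted request line.
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>HTTP/\d(?:\.\d)?)"'
)


def request_protocol(line):
    """Return the protocol token of the request line, or None if absent."""
    match = REQUEST_RE.search(line)
    return match.group("protocol") if match else None


# Sample entry in the same shape as an Apache/Nginx access log line.
line = '1.2.3.4 - - [21/Sep/2024:18:09:08 +0000] "GET /Something-is-Here.html HTTP/1.1"'
print(request_protocol(line))

# A bare scheme plus host name carries no protocol token at all.
print(request_protocol("https example.com"))
```

Under these assumptions, the first call yields the HTTP/1.1 token, while the scheme-plus-host input yields nothing, since there is no quoted request line to read a protocol from.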