NICMx / FORT-validator

RPKI cache validator
MIT License
49 stars 24 forks source link

Separate validation logs from operation logs #31

Closed ydahhrk closed 4 years ago

ydahhrk commented 4 years ago

Some people have manifested annoyance over the large volume of complaints that Fort logs due to issues found in the RPKI tree:

(sample excerpt)
WRN: rsync://rpki.cnnic.cn/rpki/A9162E3D0000/2694/rXp9fDjcCAhxxza7mMddsMJ8uMc.roa: Serial number '661' is not unique. (Also found in 'rsync://rpki.cnnic.cn/rpki/A9162E3D0000/2694/5Rs-Pc07Y_zNasRoTeJVfORvmOE.roa'.)
WRN: rsync://rpki.cnnic.cn/rpki/A9162E3D0000/497/cOev3xoFWdNyUTYsMDI5Egl8PQ8.roa: Serial number '6BD' is not unique. (Also found in 'rsync://rpki.cnnic.cn/rpki/A9162E3D0000/497/4BqXb84YtfqpHX1W1oGcKGi_jYs.roa'.)
WRN: rsync://rpki.cnnic.cn/rpki/A9162E3D0000/561/R_8KCa9pc3fNftjCdSYROwMQB3c.roa: Serial number '6B9' is not unique. (Also found in 'rsync://rpki.cnnic.cn/rpki/A9162E3D0000/561/GqEqV9qIXXJK2ICJP14IjGwoQKQ.roa'.)
WRN: rsync://rpki.cnnic.cn/rpki/A9162E3D0000/561/URCkLU-hrbETGHt-zWzU-tZvi70.roa: Serial number '6B9' is not unique. (Also found in 'rsync://rpki.cnnic.cn/rpki/A9162E3D0000/561/GqEqV9qIXXJK2ICJP14IjGwoQKQ.roa'.)
ERR: rsync://rpki.cnnic.cn/rpki/A9162E3D0000/FE-4PMY9qqTI2aJ0xLDm7cD-fvw.cer: Could not open location 'tmp/rpki.cnnic.cn/rpki/A9162E3D0000/515/FE-4PMY9qqTI2aJ0xLDm7cD-fvw.mft'
ERR:   - No such file or directory
ERR: rsync://rpki.cnnic.cn/rpki/A9162E3D0000/FE-4PMY9qqTI2aJ0xLDm7cD-fvw.cer: Could not open location 'tmp/rpki.cnnic.cn/rpki/A9162E3D0000/515/FE-4PMY9qqTI2aJ0xLDm7cD-fvw.mft'
ERR:   - No such file or directory
WRN: rsync://rpki.cnnic.cn/rpki/A9162E3D0000/2329/01jyjWbE_vnmei3aHGlV_cCSXYA.roa: Serial number '732' is not unique. (Also found in 'rsync://rpki.cnnic.cn/rpki/A9162E3D0000/2329/-w8V5CMi6bHN88xgo8K9tma5MC8.roa'.)
WRN: rsync://rpki.cnnic.cn/rpki/A9162E3D0000/2329/70nRMOo1VzrcvxbocYKqzlcNbqU.roa: Serial number '732' is not unique. (Also found in 'rsync://rpki.cnnic.cn/rpki/A9162E3D0000/2329/-w8V5CMi6bHN88xgo8K9tma5MC8.roa'.)
WRN: rsync://rpki.cnnic.cn/rpki/A9162E3D0000/2329/9YAUV53TP8C8valffV_Pbl8k0Pc.roa: Serial number '732' is not unique. (Also found in 'rsync://rpki.cnnic.cn/rpki/A9162E3D0000/2329/-w8V5CMi6bHN88xgo8K9tma5MC8.roa'.)
WRN: rsync://rpki.cnnic.cn/rpki/A9162E3D0000/2329/Hi7elxud7qvaNilveqkgB0LmoHs.roa: Serial number '732' is not unique. (Also found in 'rsync://rpki.cnnic.cn/rpki/A9162E3D0000/2329/-w8V5CMi6bHN88xgo8K9tma5MC8.roa'.)
WRN: rsync://rpki.cnnic.cn/rpki/A9162E3D0000/2329/U2DxoZ7UrM_rju6WOJe1r5LpBgM.roa: Serial number '732' is not unique. (Also found in 'rsync://rpki.cnnic.cn/rpki/A9162E3D0000/2329/-w8V5CMi6bHN88xgo8K9tma5MC8.roa'.)
WRN: rsync://rpki.cnnic.cn/rpki/A9162E3D0000/2329/ZcvvMu_hxh0SyftR2-OPegdnTHA.roa: Serial number '732' is not unique. (Also found in 'rsync://rpki.cnnic.cn/rpki/A9162E3D0000/2329/-w8V5CMi6bHN88xgo8K9tma5MC8.roa'.)
WRN: rsync://rpki.cnnic.cn/rpki/A9162E3D0000/2329/yWD9OmpNgJA5rwfOF9JlXPfgHLw.roa: Serial number '732' is not unique. (Also found in 'rsync://rpki.cnnic.cn/rpki/A9162E3D0000/2329/-w8V5CMi6bHN88xgo8K9tma5MC8.roa'.)

The admin running this is unlikely to have the authority to correct these problems, and they tend to obfuscate the actual operation errors, which are much more important for them.

On the other hand, downgrading the priority of these messages to debug would be a bit extreme because they can be helpful for people trying to validate the compliance of their own RPKI objects.

It has been therefore proposed to separate these messages into a separate logging channel.

Update: This bug is independent of incidences:

ydahhrk commented 4 years ago

First draft by @pcarana (translated by me):

Requirements

Log types

Application/execution/operation logs

Validation logs

Proposal

Current:

{
    "log": {
        "color-output": true | false,
        "file-name-format": "global-url" | "local-path" | "file-name",
        "output": "console" | "syslog",
        "level": "error" | "warning" | "info" | "debug"
    }
}

Proposal:

{
    "log": {
        "enable": true | false,
        "color-output": true | false,
        "file-name-format": "global-url" | "local-path" | "file-name",
        "output": "console" | "syslog",
        "level": "error" | "warning" | "info" | "debug"
    },
    "validation-log": {
        "enable": true | false,
        "color-output": true | false,
        "file-name-format": "global-url" | "local-path" | "file-name",
        "output": "console" | "syslog",
        "level": "error" | "warning" | "info" | "debug"
    }
}

Definitions:

Default values:

ydahhrk commented 4 years ago

Communication (rsync/http) infos/errors

I think these should be moved to the validation logs section.

Proposal:

Consider adding a prefix option to each log set so the prefix is not hardcoded:

{
    "log": {
        "enable": true | false,
        (...)
        "prefix": "[Operation log] "
    },
    "validation-log": {
        "enable": true | false,
        (...)
        "prefix": "[Validation log] "
    }
}

These should probably default to empty strings.

pcarana commented 4 years ago

I think these should be moved to the validation logs section.

Maybe, but there's one reason that holds me for doing this: what if the "communication info/error" is the host's responsibility? e.g. the host where fort is running it's having network issues, or has a firewall that's blocking rsync/https. In that case, I believe it's important to notify the operator.

As of today, there are common errors at the global RPKI servers, so yes, this logs will be present at the operation logs even when this isn't the host fault; but I think that's better to notify on any error of this kind at such logs which will be enabled by default, rather than sending them to the validation logs (that will be disabled by default).

Consider adding a prefix option to each log set so the prefix is not hardcoded: {...} These should probably default to empty strings.

I like this, let the prefix to be configured rather than hardcoded. Using empty strings as default: I'm not so sure; I would prefer to use the default values just as you wrote: log.prefix="[Operation log]" and validation-log="[Validation log]".

What do you think regarding this comments?

ydahhrk commented 4 years ago

Maybe, but there's one reason that holds me for doing this: what if the "communication info/error" is the host's responsibility? e.g. the host where fort is running it's having network issues, or has a firewall that's blocking rsync/https. In that case, I believe it's important to notify the operator.

As of today, there are common errors at the global RPKI servers, so yes, this logs will be present at the operation logs even when this isn't the host fault; but I think that's better to notify on any error of this kind at such logs which will be enabled by default, rather than sending them to the validation logs (that will be disabled by default).

Each TAL maps to a very small amounts of servers.

For example, a standalone run using Lacnic's TAL currently downloads files from repository.lacnic.net and rpki-repo.registro.br, and nowhere else. A standalone run using Afrinic's TAL downloads files from rpki.afrinic.net, and nowhere else.

Therefore, it doesn't make sense to flood the operation logs with messages like

Hey, I couldn't download repository.lacnic.net/cert3.cer
Hey, I couldn't download repository.lacnic.net/cert54.cer
Hey, I couldn't download repository.lacnic.net/cert129.cer
Hey, I couldn't download repository.lacnic.net/cert11.cer

Those belong to the validation logs. The operation logs should at most have one message per iteration like

Hey, I couldn't download 4 files from repository.lacnic.net

But I'd say that, as long as you could download at least one file from a server, then you can tell that any problems with that server are unlikely (impossible?) to be the host's responsibility.

Personally, I would only notify the admin if the entire server is unavailable.

Using empty strings as default: I'm not so sure; I would prefer to use the default values just as you wrote: log.prefix="[Operation log]" and validation-log="[Validation log]".

The validation log is disabled by default. That means that there is no need for a prefix by default.

As a compromise, how about

pcarana commented 4 years ago

Personally, I would only notify the admin if the entire server is unavailable.

This seems the way, there's no need to flood the operation logs with individual messages. The operation logs will have one general message related to a communication error only if such error persists.

To achieve that, we could use a new argument to log this error (communication error) at the operation log only if the error has persisted after some time (e.g. if after 4 hours the download isn't successful). On the meanwhile, after that time hasn't passed yet, send this log the validation logs.

There's only one exception to consider: the first validation run: if there's an error during this run, log it as a warning at the operation logs.

The validation log is disabled by default. That means that there is no need for a prefix by default.

As a compromise, how about

  • Default operation prefix: ""
  • Default validation log prefix: "[Validation] "

True, I like your proposal. Let's do it that way

pcarana commented 4 years ago

Add also the log.facility and validation-log.facility properties (see comments at #26).

Iqbalmahmood99 commented 2 years ago

Can anyone help with the query: from which location of the server this log files can be viewed?

ydahhrk commented 2 years ago

Logs are sent to syslog. By default, my Ubuntu places them in /var/log/syslog.

Iqbalmahmood99 commented 2 years ago

I have not found anything on that location. Console is set for the logs. In that case, where to view the logs?

Thanks.

On Tue, Jun 28, 2022, 22:08 Alberto Leiva Popper @.***> wrote:

Logs are sent to syslog. By default, my Ubuntu places them in /var/log/syslog.

— Reply to this email directly, view it on GitHub https://github.com/NICMx/FORT-validator/issues/31#issuecomment-1168919650, or unsubscribe https://github.com/notifications/unsubscribe-auth/AZELP6C3DEXL6FZL2RSGKL3VRMPRJANCNFSM4MLCBTXQ . You are receiving this because you commented.Message ID: @.***>

ydahhrk commented 2 years ago

AFAIK, If log.output and validation-log.output are console and you're running Fort as a service, then the logs are getting lost. Either don't run it as a service, or change the outputs to syslog.

If it's still not writing to /var/log/syslog, find the logs with

cd /var/log
sudo grep fort . -r