NICMx / FORT-validator

RPKI cache validator
MIT License
49 stars 24 forks source link

Use previous valid SLURM configuration on SLURM error #9

Closed ydahhrk closed 4 years ago

ydahhrk commented 5 years ago

SLURM documentation:

Fort reloads the SLURM files during every validation cycle. If the new configuration is invalid, it is treated as nonexistent. Note that this means that an isolated mistake will temporarily drop all your SLURM overrides. This is intended to change in a future revision of Fort, in which the validator will fall back to the previous valid SLURM configuration on error.

To propose any other solutions, comment below please.

pcarana commented 4 years ago

Will be attended at the next release.

pcarana commented 4 years ago

Now under development ee5123439b34f91144017706aad308c3b33d511c.

pcarana commented 4 years ago

Here's how the solution was implemented:

ydahhrk commented 4 years ago

Whenever there's another error (e.g. file couldn't be read, or no files at directory), log that the whole SLURM is being discarded.

Could you please justify this?

If the configuration says that there should be a file there, but there isn't, shouldn't this be considered a problem? What about other I/O exceptions? For example, what if the file is supposed to be accessed through a network, but it's disconnected?

If the SLURM configuration is meant to be emptied, can't the user simply supply an empty file?

pcarana commented 4 years ago

Sure. If the validator already has a valid SLURM, it means that at least during one validation run the SLURM file(s) was ok (the file or files existed, could be read, and had a valid syntax). So, if there's an error during the next run (not a syntax error, since this kind of error means that the file can be accessed) this means that something has changed at the file or files (permissions, file deleted, etc) probably due to a user action; in that case, the validator is more strict, discarding all so that the user do something to fix the issue.

If the SLURM configuration is meant to be emptied, can't the user simply supply an empty file?

Not really, since an empty file can't be considered as an empty SLURM; currently, an empty file is treated as invalid, discarding the whole SLURM (if there was already a valid SLURM). There's an example at RFC 6418 section 3.2:

The following JSON structure with JSON members represents a SLURM file that has no filters or assertions:

{
  "slurmVersion": 1,
  "validationOutputFilters": {
    "prefixFilters": [],
    "bgpsecFilters": []
  },
  "locallyAddedAssertions": {
    "prefixAssertions": [],
    "bgpsecAssertions": []
  }
}
ydahhrk commented 4 years ago
{
  "slurmVersion": 1,
  "validationOutputFilters": {
    "prefixFilters": [],
    "bgpsecFilters": []
  },
  "locallyAddedAssertions": {
    "prefixAssertions": [],
    "bgpsecAssertions": []
  }
}

This is what I mean by an "empty" file though. This file clearly states that the SLURM configuration should be emptied. An inaccessible SLURM file does not.

(permissions, file deleted, etc) probably due to a user action; in that case, the validator is more strict, discarding all so that the user do something to fix the issue.

I think you're contradicting yourself. "Probably" means that you're actually unsure whether the file inaccessibility is intentional or not. A "strict" behavior would treat the anomaly as an error (which means do the same that happens when the file syntax is wrong), instead of treating it as valid empty configuration and create even more problems.

Telling the user "Hey, I can't find the file that's supposed to be here. I'm going to assume this is an error" is much more helpful than "Hey. 51721 routes are now invalid" (as a consequence of the stale but valid SLURM configuration being gone.)

pcarana commented 4 years ago

A "strict" behavior would treat the anomaly as an error (which means do the same that happens when the file syntax is wrong), instead of treating it as valid empty configuration and create even more problems.

Right, I didn't saw it that way, thanks for point it out. I will bounce this idea with the rest of the team, just to get a couple of opinions. The most likely thing to happen, is that the behaviour will be "strict" on any error, notify the user about it, and apply the last valid SLURM (if there's one already).

Thank you for the comments :+1:

pcarana commented 4 years ago

the behaviour will be "strict" on any error, notify the user about it, and apply the last valid SLURM (if there's one already).

Done (f39d02eca4733115cf31bdd20f8b7e4d01b8dff7), delivered to QA team.

ydahhrk commented 4 years ago

Thanks!

pcarana commented 4 years ago

Released with v1.2.0 :+1: