NICMx / FORT-validator

RPKI cache validator
MIT License

Route validate as invalid using fort 1.5.3 while cloudflare/routinator validate unknown/valid #98

Open beego89 opened 12 months ago

beego89 commented 12 months ago

hi,

I'm having an issue when using Fort: routes validate as invalid. The Fort records also differ from the whois database.

any help?

tq

lukastribus commented 12 months ago

An RPKI validator like Fort or Routinator does not decide whether a route is invalid or not (or unknown, for that matter). An RPKI validator just provides the list of validated prefixes; it is the BGP router that decides whether a BGP prefix is unknown, valid or invalid, based on the validated prefixes from the RPKI validator.
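To illustrate the division of labour described above, here is a minimal Python sketch of the origin check a router performs against the validated prefixes it receives, loosely following RFC 6811. The `VRP` class, the `validate_origin` function and the sample prefixes and AS numbers (taken from the documentation ranges) are hypothetical names for this example, not anything from Fort or from a router implementation, and details such as AS_SET handling are left out.

```python
from dataclasses import dataclass
from ipaddress import ip_network
from typing import Iterable


@dataclass(frozen=True)
class VRP:
    """One validated prefix as served by a validator (hypothetical type)."""
    prefix: str
    max_length: int
    asn: int


def validate_origin(route_prefix: str, origin_asn: int, vrps: Iterable[VRP]) -> str:
    """Classify a BGP route as 'valid', 'invalid' or 'unknown'.

    'unknown' is what RFC 6811 calls NotFound: no VRP covers the route.
    """
    route = ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ip_network(vrp.prefix)
        # A VRP covers the route if the route lies inside the VRP prefix.
        if route.version != vrp_net.version or not route.subnet_of(vrp_net):
            continue
        covered = True
        # A covering VRP matches if the length and origin AS also agree.
        if route.prefixlen <= vrp.max_length and origin_asn == vrp.asn:
            return "valid"
    return "invalid" if covered else "unknown"


vrps = [VRP("192.0.2.0/24", 24, 64496)]
print(validate_origin("192.0.2.0/24", 64496, vrps))     # covered and matched
print(validate_origin("192.0.2.0/25", 64496, vrps))     # covered, too specific
print(validate_origin("198.51.100.0/24", 64496, vrps))  # not covered at all
```

Under this model, a router fed an outdated or incomplete list of validated prefixes can reach a different verdict than one fed a current list, even though both apply the same rules.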

Which prefix are you having an issue with exactly? What is the prefix in the BGP routing table that shows as invalid?

beego89 commented 12 months ago

The total count of invalid route prefixes is more than 700. A random sample is compared in the attached file:

Fort.xlsx

lukastribus commented 12 months ago

I checked the first 3 or 4 prefixes in the Excel file and Fort's output matches that of other validators.

Perhaps you are running an old version of Fort or perhaps your Fort instance is serving stale data?

Did you check if you are running fort 1.5.4? When was the last time your running Fort instance completed a validation cycle?

beego89 commented 11 months ago

Checking the log, I see the error "ERR: poll() returned revents 32." Can you advise whether there is a mistake in my installation or configuration?

Fort Journalctl.txt

I'm not running Fort 1.5.4 yet; I will try updating if there is no other solution.

I have a minimal config:

{
    "tal": "/etc/fort/tal",
    "local-repository": "/var/lib/fort/repository",
    "slurm": "/etc/fort/slurm",
    "server": {
        "address": ["172.24.58.2"],
        "port": "8323"
    },
    "log": {
        "output": "syslog"
    }
}

tq

lukastribus commented 11 months ago

What release are you running currently? I would definitely suggest upgrading to latest.

ydahhrk commented 11 months ago

ERR: poll() returned revents 32.

Ok, this looks like a programming error. In my environment, "revents 32" is POLLNVAL, which means

Invalid request: fd not open

@beego89 Just to make sure: Can you please post the output of grep "#define\s\+POLL" /usr/include -r? I need to make sure if my 32 is the same as your 32.

It seems Fort is closing the File Descriptor before it's done sending the table to the router, although it's strange that the router is seemingly not dropping the table as a result. Will investigate.
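As a side illustration of the "fd not open" condition, the following Python sketch polls a file descriptor that was closed while still registered. It is not Fort's code; the function name `revents_after_close` is made up for this example, and it assumes a Unix-like system where `select.poll` is available.

```python
import os
import select


def revents_after_close() -> int:
    """Poll a descriptor that was closed while still registered.

    Returns the revents bitmask reported for that descriptor.
    """
    read_fd, write_fd = os.pipe()
    poller = select.poll()
    poller.register(read_fd, select.POLLIN)
    os.close(read_fd)  # the descriptor is now invalid, but still registered
    try:
        events = poller.poll(0)  # non-blocking poll
    finally:
        os.close(write_fd)
    return events[0][1]


revents = revents_after_close()
print(revents, bool(revents & select.POLLNVAL))
```

On Linux, the reported bitmask should have the POLLNVAL bit set, which is the same condition the log message refers to.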

beego89 commented 11 months ago

hi ydahhrk,

output as below:

[xxx@vkcprpkprdap200 ~]$ grep "#define\s\+POLL" /usr/include -r
/usr/include/asm-generic/poll.h:#define POLLIN 0x0001
/usr/include/asm-generic/poll.h:#define POLLPRI 0x0002
/usr/include/asm-generic/poll.h:#define POLLOUT 0x0004
/usr/include/asm-generic/poll.h:#define POLLERR 0x0008
/usr/include/asm-generic/poll.h:#define POLLHUP 0x0010
/usr/include/asm-generic/poll.h:#define POLLNVAL 0x0020
/usr/include/asm-generic/poll.h:#define POLLRDNORM 0x0040
/usr/include/asm-generic/poll.h:#define POLLRDBAND 0x0080
/usr/include/asm-generic/poll.h:#define POLLWRNORM 0x0100
/usr/include/asm-generic/poll.h:#define POLLWRBAND 0x0200
/usr/include/asm-generic/poll.h:#define POLLMSG 0x0400
/usr/include/asm-generic/poll.h:#define POLLREMOVE 0x1000
/usr/include/asm-generic/poll.h:#define POLLRDHUP 0x2000
/usr/include/asm-generic/poll.h:#define POLLFREE 0x4000 /* currently only for epoll */
/usr/include/asm-generic/poll.h:#define POLL_BUSY_LOOP 0x8000
/usr/include/asm-generic/siginfo.h:#define POLL_IN (SI_POLL|1) /* data input available */
/usr/include/asm-generic/siginfo.h:#define POLL_OUT (SI_POLL|2) /* output buffers available */
/usr/include/asm-generic/siginfo.h:#define POLL_MSG (SI_POLL|3) /* input message available */
/usr/include/asm-generic/siginfo.h:#define POLL_ERR (SI_POLL|4) /* i/o error */
/usr/include/asm-generic/siginfo.h:#define POLL_PRI (SI_POLL|5) /* high priority input available */
/usr/include/asm-generic/siginfo.h:#define POLL_HUP (SI_POLL|6) /* device disconnected */
/usr/include/bits/poll.h:#define POLLIN 0x001 /* There is data to read. */
/usr/include/bits/poll.h:#define POLLPRI 0x002 /* There is urgent data to read. */
/usr/include/bits/poll.h:#define POLLOUT 0x004 /* Writing now will not block. */
/usr/include/bits/poll.h:#define POLLERR 0x008 /* Error condition. */
/usr/include/bits/poll.h:#define POLLHUP 0x010 /* Hung up. */
/usr/include/bits/poll.h:#define POLLNVAL 0x020 /* Invalid polling request. */
[xxx@vkcprpkprdap200 ~]$
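For reference, the decimal value in the log message can be mapped back to flag names with a few lines of Python. This is only a sketch: the table copies the six basic flag values from the header output in this comment, and `POLL_FLAGS` and `decode_revents` are names invented for the example.

```python
# Flag values as listed in /usr/include/bits/poll.h in the output above.
POLL_FLAGS = {
    0x0001: "POLLIN",
    0x0002: "POLLPRI",
    0x0004: "POLLOUT",
    0x0008: "POLLERR",
    0x0010: "POLLHUP",
    0x0020: "POLLNVAL",
}


def decode_revents(revents: int) -> list:
    """Return the names of all known flags set in a revents bitmask."""
    return [name for bit, name in POLL_FLAGS.items() if revents & bit]


print(decode_revents(32))  # the value from "poll() returned revents 32"
```

With these header values, 32 is 0x0020, so the only flag set is POLLNVAL.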

beego89 commented 11 months ago

@lukastribus Sure, I will proceed with the upgrade to 1.5.4 to see if the problem is resolved. tq

ydahhrk commented 11 months ago

Report:

Thanks for the output; we're in sync.

The error messages you're getting in the log (such as ERR: poll() returned revents 32) are all harmless. They simply mean some router happened to get disconnected in the middle of a data transfer. I have reduced their severities and improved the strings in the latest commit so they stop confusing people.

On a different note, it's weird that you're getting so many of those error messages, even if they're scattered through several days. Is there something in your network that might be disconnecting Fort and the Routers every now and then?

~I will start investigating the discrepancies with Routinator and Cloudflare now. If you can provide the list of ROAs that don't match, or at least a reasonable subset of them, I should be able to figure it out faster.~

ydahhrk commented 11 months ago

Ok, sorry for taking so long. I agree with everything @lukastribus has said.

@beego89 I just realized I might be lagging in understanding the problem. In the title of the issue, you say Fort yields "invalid," but all the seemingly relevant records of your spreadsheet are marked "valid."

What's the problem?

beego89 commented 11 months ago

hi @ydahhrk, thanks for checking on the error. We have multiple upstream/peering gateway (GW) routers across regions, and the disconnections may be due to a faulty router.

@lukastribus The prefix sample in the Excel file is the "invalid" output from our GW router using Fort 1.5.3. When compared with Cloudflare and Routinator, it yields different validation results: "unknown" and "valid". Perhaps you can compare the output from your Fort validator with my Fort output in the Excel file to see if it's the same.

tq

lukastribus commented 11 months ago

I did that when you posted the excel file, as explained in my answer to that.

Again, I suggest you check when Fort last completed a validation cycle. Also perhaps you want to enable validation logging and post the full output.