NICMx / FORT-validator

RPKI cache validator
MIT License
50 stars 24 forks source link

Segmentation fault in x509_name_equals #86

Closed ggidofalvy-tc closed 1 year ago

ggidofalvy-tc commented 1 year ago

Hi folks,

I've just had a FORT validator crash due to a segmentation fault. Version 1.5.3.

    Process: 37734 ExecStart=/usr/bin/fort --configuration-file /etc/fort/config.json (code=killed, signal=SEGV)

Stack trace below. Sadly, I didn't have core dump generation enabled at the time of the crash, so I can't include one. Only one of two redundant nodes running FORT segfaulted.

/usr/bin/fort(+0x2173f)[0x5593ff76073f]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x13140)[0x7ff0464e5140]
/usr/bin/fort(x509_name_equals+0xf)[0x5593ff77296f]
/usr/bin/fort(x509stack_store_subject+0x58)[0x5593ff75d2e8]
/usr/bin/fort(certificate_validate_rfc6487+0x1e0)[0x5593ff770150]
/usr/bin/fort(signed_data_validate+0x74f)[0x5593ff7688ff]
/usr/bin/fort(roa_traverse+0x161)[0x5593ff772df1]
/usr/bin/fort(rpp_traverse+0x48)[0x5593ff765788]
/usr/bin/fort(certificate_traverse+0xafa)[0x5593ff7719ea]
/usr/bin/fort(+0x349bb)[0x5593ff7739bb]
/usr/bin/fort(+0x353c8)[0x5593ff7743c8]
/usr/bin/fort(+0x44458)[0x5593ff783458]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7)[0x7ff0464d9ea7]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7ff0463f9aef]

There are no other relevant log lines in syslog before or after the stack trace.

If I had to guess, I think it's a null pointer dereference.

ydahhrk commented 1 year ago

Hmmm. I'm not sure if I should continue looking.

The patch is... some might say rather drastic. I haven't really found a means to trigger the bug, but I realized I chopped all the corresponding code in the refactor branch a while ago, because it turns out the relevant validation doesn't actually need to be performed. Ever.

So I simply mirrored that patch in master.

There's a good chance the bug went down with the rest of the code, but I can't tell for sure.

I take it the error is very hard to reproduce? How long have you been running those instances?

ggidofalvy-tc commented 1 year ago

Actually, that FORT instance was restarted just a few days before. If I recall correctly, the overall uptime was around 5-6ish days. I'm not sure what route it was validating/request it was serving, though.

ydahhrk commented 1 year ago

Actually, that FORT instance was restarted just a few days before. If I recall correctly, the overall uptime was around 5-6ish days. I'm not sure what route it was validating/request it was serving, though.

Sure, but I mean, you were running the same code beforehand, right? And nobody else has reported this. So it's not trivial to reproduce.


Ok, review over. I'm sorry; I don't think I can do much more without a new stack trace.

The bad pointer is either cursor->name or subject.

subject was dereferenced shortly before the incident (so it can't be NULL), and cursor->name is initialized through subject. Rather than a null pointer, this looks like memory corruption. There's a rather reckless dereference chain nearby which might have lead the surrounding code to undefined behavior. Unfortunately, there's no way to tell for sure if this is really what happened.

All this code was deleted in 298a534652b695a3f443438149a4280514409b82. For now, I think we'll have to settle with this solution.

ggidofalvy-tc commented 1 year ago

Do you folks plan to carve a new release containing this fix any time soon?

ydahhrk commented 1 year ago

I can allocate time either today or tomorrow, but if possible, I'd like to receive some external feedback beforehand.

Can you confirm whether the new code still works properly for you?

ggidofalvy-tc commented 1 year ago

I can definitely try testing it out tomorrow! Tip of the issue86 branch, correct?

ydahhrk commented 1 year ago

Yes

ggidofalvy-tc commented 1 year ago

I compiled a version of FORT based on the issue86 branch, and haven't had issues running it in our lab (two lab routers with full tables act as clients to a FORT and a gortr-based RPKI validator).

I'll let you know how this commit version acts over the weekend next week.

ydahhrk commented 1 year ago

Ok

Just to clarify: I'm going to be traveling next week, so I will not actually be able to release next Monday.

In any case, I already pushed the patch to the main branch.

Assuming no problems are found, Fort 1.5.4 is now expected to release on December 12.

ydahhrk commented 1 year ago

Sorry. As it has been a while, I've run into some bumps in the way trying to release version 1.5.4. Rusty scripts, rusty VM, #87, new Debian standards version, trip fatigue, December crunch...

I need to get other projects finished by Friday, so I think I'm going to have to postpone the release until next week.

I know it's a pain, but I don't want to half-ass it either. Please hold.

ggidofalvy-tc commented 1 year ago

No rush!

We've been running this Fort commit in our lab and so far it's been working fine.

mzi77 commented 1 year ago

I was also hit by this bug on 1.5.3

fort.service: Failed with result 'core-dump'. fort.service: Main process exited, code=dumped, status=11/SEGV

/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f1e40312133] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f1e403ed609] /usr/bin/fort(+0x435d9)[0x55ec846165d9] /usr/bin/fort(+0x34a11)[0x55ec84607a11] /usr/bin/fort(+0x33f6b)[0x55ec84606f6b] /usr/bin/fort(certificate_traverse+0xa9a)[0x55ec84604faa] /usr/bin/fort(rpp_traverse+0x48)[0x55ec845f9608] /usr/bin/fort(roa_traverse+0x161)[0x55ec84606311] /usr/bin/fort(signed_object_validate+0xec)[0x55ec84606bcc] /usr/bin/fort(signed_data_validate+0x762)[0x55ec845fc892] /usr/bin/fort(certificate_validate_rfc6487+0x1b9)[0x55ec846037a9] /usr/bin/fort(x509stack_store_subject+0x5c)[0x55ec845f121c] /usr/bin/fort(x509_name_equals+0xf)[0x55ec84605eef] /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f1e403f9420] /usr/bin/fort(+0x2174e)[0x55ec845f474e]

ydahhrk commented 1 year ago

I was finally able to finish the release of 1.5.4 yesterday. Feel free to reopen if the problem persists.