Closed ggidofalvy-tc closed 1 year ago
Hmmm. I'm not sure if I should continue looking.
The patch is... some might say rather drastic. I haven't really found a means to trigger the bug, but I realized I chopped all the corresponding code in the refactor branch a while ago, because it turns out the relevant validation doesn't actually need to be performed. Ever.
So I simply mirrored that patch in master.
There's a good chance the bug went down with the rest of the code, but I can't tell for sure.
I take it the error is very hard to reproduce? How long have you been running those instances?
Actually, that FORT instance was restarted just a few days before. If I recall correctly, the overall uptime was around 5-6ish days. I'm not sure what route it was validating/request it was serving, though.
Actually, that FORT instance was restarted just a few days before. If I recall correctly, the overall uptime was around 5-6ish days. I'm not sure what route it was validating/request it was serving, though.
Sure, but I mean, you were running the same code beforehand, right? And nobody else has reported this. So it's not trivial to reproduce.
Ok, review over. I'm sorry; I don't think I can do much more without a new stack trace.
The bad pointer is either cursor->name
or subject
.
subject
was dereferenced shortly before the incident (so it can't be NULL
), and cursor->name
is initialized through subject
. Rather than a null pointer, this looks like memory corruption. There's a rather reckless dereference chain nearby which might have lead the surrounding code to undefined behavior. Unfortunately, there's no way to tell for sure if this is really what happened.
All this code was deleted in 298a534652b695a3f443438149a4280514409b82. For now, I think we'll have to settle with this solution.
Do you folks plan to carve a new release containing this fix any time soon?
I can allocate time either today or tomorrow, but if possible, I'd like to receive some external feedback beforehand.
Can you confirm whether the new code still works properly for you?
I can definitely try testing it out tomorrow! Tip of the issue86 branch, correct?
Yes
I compiled a version of FORT based on the issue86 branch, and haven't had issues running it in our lab (two lab routers with full tables act as clients to a FORT and a gortr-based RPKI validator).
I'll let you know how this commit version acts over the weekend next week.
Ok
Just to clarify: I'm going to be traveling next week, so I will not actually be able to release next Monday.
In any case, I already pushed the patch to the main branch.
Assuming no problems are found, Fort 1.5.4 is now expected to release on December 12.
Sorry. As it has been a while, I've run into some bumps in the way trying to release version 1.5.4. Rusty scripts, rusty VM, #87, new Debian standards version, trip fatigue, December crunch...
I need to get other projects finished by Friday, so I think I'm going to have to postpone the release until next week.
I know it's a pain, but I don't want to half-ass it either. Please hold.
No rush!
We've been running this Fort commit in our lab and so far it's been working fine.
I was also hit by this bug on 1.5.3
fort.service: Failed with result 'core-dump'. fort.service: Main process exited, code=dumped, status=11/SEGV
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f1e40312133] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f1e403ed609] /usr/bin/fort(+0x435d9)[0x55ec846165d9] /usr/bin/fort(+0x34a11)[0x55ec84607a11] /usr/bin/fort(+0x33f6b)[0x55ec84606f6b] /usr/bin/fort(certificate_traverse+0xa9a)[0x55ec84604faa] /usr/bin/fort(rpp_traverse+0x48)[0x55ec845f9608] /usr/bin/fort(roa_traverse+0x161)[0x55ec84606311] /usr/bin/fort(signed_object_validate+0xec)[0x55ec84606bcc] /usr/bin/fort(signed_data_validate+0x762)[0x55ec845fc892] /usr/bin/fort(certificate_validate_rfc6487+0x1b9)[0x55ec846037a9] /usr/bin/fort(x509stack_store_subject+0x5c)[0x55ec845f121c] /usr/bin/fort(x509_name_equals+0xf)[0x55ec84605eef] /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f1e403f9420] /usr/bin/fort(+0x2174e)[0x55ec845f474e]
I was finally able to finish the release of 1.5.4 yesterday. Feel free to reopen if the problem persists.
Hi folks,
I've just had a FORT validator crash due to a segmentation fault. Version 1.5.3.
Stack trace below. Sadly, I didn't have core dump generation enabled at the time of the crash, so I can't include one. Only one of two redundant nodes running FORT segfaulted.
There are no other relevant log lines in syslog before or after the stack trace.
If I had to guess, I think it's a null pointer dereference.