avermeer-tc closed this issue 10 months ago.
Are you, by any chance, using the argument --output.roa?
I don't believe so:
fort 1027306 11.0 8.5 2253252 343856 ? Ssl Jan26 623:54 /usr/bin/fort --configuration-file /etc/fort/config.json
{
  "tal": "/etc/fort/tal",
  "local-repository": "/var/lib/fort",
  "slurm": "/etc/fort/slurm/",
  "server": {
    "port": "323"
  },
  "log": {
    "output": "syslog"
  },
  "thread-pool": {
    "server": {
      "max": 45
    },
    "validation": {
      "max": 5
    }
  }
}
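The answer above comes from checking two places: the command line shown by ps and the JSON configuration file. Below is a minimal Python sketch of that check. It assumes that a dotted argument such as --output.roa corresponds to a nested JSON path, which is what keys in this config like server.port and thread-pool.validation.max suggest. The helper names option_is_set and flag_on_command_line are hypothetical and not part of FORT.

```python
import json

# Assumption: a dotted CLI argument "--a.b" corresponds to the nested
# JSON path {"a": {"b": ...}} in FORT's configuration file.


def option_is_set(config: dict, dotted_name: str) -> bool:
    """Return True if dotted_name (e.g. "output.roa") exists as a nested key path."""
    node = config
    for part in dotted_name.split("."):
        if not isinstance(node, dict) or part not in node:
            return False
        node = node[part]
    return True


def flag_on_command_line(argv: list, flag: str) -> bool:
    """Return True if argv contains "--flag" (value in the next arg) or "--flag=value"."""
    prefix = "--" + flag
    return any(arg == prefix or arg.startswith(prefix + "=") for arg in argv)


# The configuration and command line from this comment.
CONFIG = json.loads("""
{
  "tal": "/etc/fort/tal",
  "local-repository": "/var/lib/fort",
  "slurm": "/etc/fort/slurm/",
  "server": {"port": "323"},
  "log": {"output": "syslog"},
  "thread-pool": {"server": {"max": 45}, "validation": {"max": 5}}
}
""")
ARGV = ["/usr/bin/fort", "--configuration-file", "/etc/fort/config.json"]

if __name__ == "__main__":
    uses_roa_output = option_is_set(CONFIG, "output.roa") or flag_on_command_line(ARGV, "output.roa")
    print("output.roa set:", uses_roa_output)
```

This config does set log.output, but that is a different option from output.roa. The walk over nested keys keeps the two apart because it only matches the full path.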
Does this happen reliably?
And if so, may I have access to an environment where it happens? (I know this is a stretch.)
Hi @ydahhrk. It happens sporadically. Do you want me to compile a version and run it from the debug branch issue83? I can't give you access to our production machines, but I am not opposed to running with the extra debug patches.
Do you want me to compile a version and run it from the debug branch issue83?
Sure, it would help. The only reason I haven't asked is that I'm still trying to come up with ways to improve the output. But even as it currently is, there's a chance it'll help.
Though at this point I'm desperate enough to merge the debugging messages into the main branch. Put them on a release, even. Considering this, my failed attempt to release version 1.5.4 back in December might have turned out to be a blessing in disguise.
Another thing: Is your OS generating core dumps? If you have one from a crash, may I have it?
I didn't have core dumps enabled at the time of the crashes. I have enabled them now.
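Anyone trying to reproduce this can start by checking whether a process is even allowed to write a core file. Its RLIMIT_CORE soft limit answers that. The Python sketch below uses the standard resource module (Unix only). The helper names are hypothetical. The sketch can only inspect or raise the limit of the process it runs in and of that process's future children. For a daemon like fort, the limit therefore has to be raised where the service is started, for example with ulimit -c in the launching shell or LimitCORE= in a systemd unit. On Linux, where the core file ends up is further governed by /proc/sys/kernel/core_pattern.

```python
import resource


def core_dump_limits() -> tuple:
    """Return the (soft, hard) RLIMIT_CORE limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_CORE)


def core_dumps_possible() -> bool:
    """A soft limit of 0 means no core file is written for this process.

    Any other value (including RLIM_INFINITY) allows one, subject to the
    size cap and, on Linux, to the kernel's core_pattern setting.
    """
    soft, _hard = core_dump_limits()
    return soft != 0


def allow_core_dumps() -> None:
    """Raise the soft limit up to the hard limit.

    Unprivileged processes may do this, but they cannot exceed the hard
    limit. Only this process and children started afterwards are affected.
    """
    _soft, hard = core_dump_limits()
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


if __name__ == "__main__":
    print("before:", core_dump_limits(), "possible:", core_dumps_possible())
    allow_core_dumps()
    print("after:", core_dump_limits(), "possible:", core_dumps_possible())
```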
Thank you.
FYI, I finally managed to release version 1.5.4 12 hours ago, in case you want to update.
(It contains the debugging messages.)
As stated in the 1.6.0 release notes, I found and patched several instances of undefined behavior during the reviews. There is no way to prove that these caused this particular crash (especially since I never managed to reproduce it, and the OP has probably already left), but the code has changed so much that I now expect the bug to manifest in a completely different way, if at all.
Please upgrade to the latest version. If it crashes again, please open a new bug.
I've been running the patched version since Feb. I have not encountered the issue since then.
We are running the latest fort from the master branch. Today fort crashed with the following stack trace:
A couple of hours later, on another of our nodes: