NICMx / FORT-validator

RPKI cache validator
MIT License
50 stars 24 forks source link

1.5.3 crashing repeatedly starting about a hour ago #65

Closed sbr2004 closed 10 months ago

sbr2004 commented 2 years ago

fort 1.5.3 (installed from fort-1.5.3-1.el8.x86_64.rpm)

Started to crash repeatedly beginning about a hour ago. Unable to get it working. Tried to wipe /var/fort_cache, did not help

Stack trace: /usr/local/bin/fort(print_stack_trace+0x1a) [0x415c6a] /usr/local/bin/fort(pr_crit+0x7f) [0x417b7f] /usr/local/bin/fort() [0x412c25] /usr/local/bin/fort(deferstack_pop+0x3b) [0x412e0b] /usr/local/bin/fort() [0x423b70] /lib64/libpthread.so.0(+0x7ea5) [0x7f715b7abea5] /lib64/libc.so.6(clone+0x6d) [0x7f715b4d48dd] (Stack size was 7.)

ydahhrk commented 2 years ago

Ok.

Quick question: Do you know if this also happens with 1.5.2?

sbr2004 commented 2 years ago

Sorry, cant check it.

sbr2004 commented 2 years ago

Same on another box, with 1.4.2:

Stack trace: /usr/local/bin/fort(print_stack_trace+0x1f) [0x55632b9e482f] /usr/local/bin/fort(pr_crit+0x89) [0x55632b9e76d9] /usr/local/bin/fort(+0x1d2b3) [0x55632b9e12b3] /usr/local/bin/fort(deferstack_pop+0x2f) [0x55632b9e14cf] /usr/local/bin/fort(+0x31fba) [0x55632b9f5fba] /usr/local/bin/fort(+0x329f1) [0x55632b9f69f1] /lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3) [0x7f8dac6a9fa3] /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f8dac5da4cf] (Stack size was 8.)

fort -V fort 1.4.2

ydahhrk commented 2 years ago

Thank you.

sbr2004 commented 2 years ago

I have 3 boxes with fort. 1.5.3 and 1.4.2 crashed, seems only v1.5.1-10-gc69a140 is still working.

lukastribus commented 2 years ago

Not sure if there is a relation, but @job tweeted the following earlier today:

Today I published BGPsec Router Keys for AS 15562

https://twitter.com/JobSnijders/status/1458126836391686149

ties commented 2 years ago

Unfortunately I can not get fort to finish the initial sync when firewalling the hostname of the repo containing the P256/bgpsec certificate. This does however prevent the crash. Next step would be to serve another rsync repo at that IP and see what happens.

ydahhrk commented 2 years ago

Is the stack trace really the only output? It seems as though it should print some error message before that.

sbr2004 commented 2 years ago

Yes, it is. No error message.

sbr2004 commented 2 years ago

Started it with debug argument, last lines from syslog:

Nov 10 01:00:28 nuff fort: DBG: Client closed the socket. Nov 10 01:00:29 nuff fort: DBG: Thread Validation.1: Task '/var/tal/lacnic.tal' ended Nov 10 01:00:29 nuff fort: DBG: Thread Validation.1: Waiting for work... Nov 10 01:00:29 nuff fort: DBG: - Active workers: 0 Nov 10 01:00:29 nuff fort: DBG: - Task queue: Empty Nov 10 01:00:29 nuff fort: DBG: Pool 'Validation': All work has been completed. Nov 10 01:00:29 nuff fort: WRN: Validation from TAL '/var/tal/apnic.tal' yielded error, discarding any other validation results. Nov 10 01:00:29 nuff fort: INF: Validation finished: Nov 10 01:00:29 nuff fort: INF: - Valid ROAs: 0 Nov 10 01:00:29 nuff fort: INF: - Valid Router Keys: 0 Nov 10 01:00:29 nuff fort: INF: - No serial number. Nov 10 01:00:29 nuff fort: INF: - Real execution time: 331 secs. Nov 10 01:00:29 nuff fort: ERR: First validation wasn't successful. Nov 10 01:00:29 nuff fort: Stack trace: Nov 10 01:00:29 nuff fort: /usr/local/bin/fort(print_stack_trace+0x1f) [0x417e2f] Nov 10 01:00:29 nuff fort: /usr/local/bin/fort(pr_op_err+0x84) [0x4183f4] Nov 10 01:00:29 nuff fort: /usr/local/bin/fort(main+0x146) [0x413e26] Nov 10 01:00:29 nuff fort: /lib64/libc.so.6(libc_start_main+0xf5) [0x7f2e19722555] Nov 10 01:00:29 nuff fort: /usr/local/bin/fort() [0x413e65] Nov 10 01:00:29 nuff fort: (End of stack trace) Nov 10 01:00:29 nuff fort: DBG: Destroying thread pool 'Server'. Nov 10 01:00:29 nuff fort: DBG: Thread Server.1: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Server.2: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Server.3: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Server.4: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Server.5: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Server.6: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Server.7: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Server.8: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Server.10: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Server.9: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Server.11: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Server.12: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Server.13: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Server.14: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Server.15: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Server.16: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Server.17: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Server.18: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Server.19: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Server.20: Returning. Nov 10 01:00:29 nuff fort: DBG: Destroyed. Nov 10 01:00:29 nuff fort: DBG: Destroying thread pool 'Validation'. Nov 10 01:00:29 nuff fort: DBG: Thread Validation.4: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Validation.2: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Validation.3: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Validation.5: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Validation.1: Returning. Nov 10 01:00:29 nuff fort: DBG: Destroyed. Nov 10 01:00:29 nuff fort: DBG: Destroying thread pool 'Internal'. Nov 10 01:00:29 nuff fort: DBG: Thread Internal.1: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Internal.2: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Internal.4: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Internal.5: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Internal.6: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Internal.3: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Internal.7: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Internal.8: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Internal.9: Returning. Nov 10 01:00:29 nuff fort: DBG: Thread Internal.10: Returning. Nov 10 01:00:29 nuff fort: DBG: Destroyed.

ydahhrk commented 2 years ago

That stack trace looks very different, though. (It's not a fatal one.)

ties commented 2 years ago

1.5.0:

sudo fort --tal /etc/pki/tals --log.level=debug --rsync.retry.count=1 --rrdp.retry.count=1 --rsync.retry.interval=1 --rrdp.retry.interval=1
Nov 10 00:00:44 WRN: 'rrdp.retry.count' will be deprecated, use 'http.retry.count' instead.
Nov 10 00:00:44 WRN: 'rrdp.retry.interval' will be deprecated, use 'http.retry.interval' instead.
Nov 10 00:00:44 INF: Disabling validation logging on syslog.
Nov 10 00:00:44 INF: Disabling validation logging on standard streams.
Nov 10 00:00:44 INF: Console log output configured; disabling operation logging on syslog.
Nov 10 00:00:44 INF: (Operation Logs will be sent to the standard streams only.)
Nov 10 00:00:44 INF: Configuration {
Nov 10 00:00:44 INF: tal: /etc/pki/tals
Nov 10 00:00:44 INF: local-repository: /tmp/fort/repository
Nov 10 00:00:44 INF: sync-strategy: root-except-ta
Nov 10 00:00:44 INF: shuffle-uris: false
Nov 10 00:00:44 INF: maximum-certificate-depth: 32
Nov 10 00:00:44 INF: slurm: (null)
Nov 10 00:00:44 INF: mode: server
Nov 10 00:00:44 INF: work-offline: false
Nov 10 00:00:44 INF: daemon: false
Nov 10 00:00:44 INF: server.address:
Nov 10 00:00:44 INF:   <Nothing>
Nov 10 00:00:44 INF: server.port: 323
Nov 10 00:00:44 INF: server.backlog: 128
Nov 10 00:00:44 INF: server.interval.validation: 3600
Nov 10 00:00:44 INF: server.interval.refresh: 3600
Nov 10 00:00:44 INF: server.interval.retry: 600
Nov 10 00:00:44 INF: server.interval.expire: 7200
Nov 10 00:00:44 INF: rsync.enabled: true
Nov 10 00:00:44 INF: rsync.priority: 50
Nov 10 00:00:44 INF: rsync.strategy: root-except-ta
Nov 10 00:00:44 INF: rsync.retry.count: 1
Nov 10 00:00:44 INF: rsync.retry.interval: 1
Nov 10 00:00:44 INF: rsync.program: rsync
Nov 10 00:00:44 INF: rsync.arguments-recursive:
Nov 10 00:00:44 INF:   --recursive
Nov 10 00:00:44 INF:   --delete
Nov 10 00:00:44 INF:   --times
Nov 10 00:00:44 INF:   --contimeout=20
Nov 10 00:00:44 INF:   --timeout=15
Nov 10 00:00:44 INF:   $REMOTE
Nov 10 00:00:44 INF:   $LOCAL
Nov 10 00:00:44 INF: rsync.arguments-flat:
Nov 10 00:00:44 INF:   --times
Nov 10 00:00:44 INF:   --contimeout=20
Nov 10 00:00:44 INF:   --timeout=15
Nov 10 00:00:44 INF:   --dirs
Nov 10 00:00:44 INF:   $REMOTE
Nov 10 00:00:44 INF:   $LOCAL
Nov 10 00:00:44 INF: rrdp.enabled: true
Nov 10 00:00:44 INF: rrdp.priority: 60
Nov 10 00:00:44 INF: rrdp.retry.count: 1
Nov 10 00:00:44 INF: rrdp.retry.interval: 1
Nov 10 00:00:44 INF: http.enabled: true
Nov 10 00:00:44 INF: http.priority: 60
Nov 10 00:00:44 INF: http.retry.count: 1
Nov 10 00:00:44 INF: http.retry.interval: 1
Nov 10 00:00:44 INF: http.user-agent: fort/1.5.0
Nov 10 00:00:44 INF: http.connect-timeout: 30
Nov 10 00:00:44 INF: http.transfer-timeout: 0
Nov 10 00:00:44 INF: http.idle-timeout: 15
Nov 10 00:00:44 INF: http.ca-path: (null)
Nov 10 00:00:44 INF: log.enabled: true
Nov 10 00:00:44 INF: log.output: console
Nov 10 00:00:44 INF: log.level: debug
Nov 10 00:00:44 INF: log.tag: (null)
Nov 10 00:00:44 INF: log.facility: daemon
Nov 10 00:00:44 INF: log.file-name-format: global-url
Nov 10 00:00:44 INF: log.color-output: false
Nov 10 00:00:44 INF: validation-log.enabled: false
Nov 10 00:00:44 INF: validation-log.output: console
Nov 10 00:00:44 INF: validation-log.level: warning
Nov 10 00:00:44 INF: validation-log.tag: Validation
Nov 10 00:00:44 INF: validation-log.facility: daemon
Nov 10 00:00:44 INF: validation-log.file-name-format: global-url
Nov 10 00:00:44 INF: validation-log.color-output: false
Nov 10 00:00:44 INF: Custom incidences:
Nov 10 00:00:44 INF:   incid-hashalg-has-params (Signed Object's hash algorithm has NULL object as parameters): ignore
Nov 10 00:00:44 INF:   incid-obj-not-der-encoded (Object isn't DER encoded): ignore
Nov 10 00:00:44 INF:   incid-file-at-mft-not-found (File listed at manifest doesn't exist): error
Nov 10 00:00:44 INF:   incid-file-at-mft-hash-not-match (File hash listed at manifest doesn't match the actual file hash): error
Nov 10 00:00:44 INF:   incid-mft-stale (The current time is after the nextUpdate field at the manifest): error
Nov 10 00:00:44 INF:   incid-crl-stale (The current time is after the nextUpdate field at the CRL): error
Nov 10 00:00:44 INF: output.roa: (null)
Nov 10 00:00:44 INF: output.bgpsec: (null)
Nov 10 00:00:44 INF: asn1-decode-max-stack: 4096
Nov 10 00:00:44 INF: stale-repository-period: 43200
Nov 10 00:00:44 INF: init-tals: false
Nov 10 00:00:44 INF: thread-pool.server.max: 20
Nov 10 00:00:44 INF: thread-pool.validation.max: 5
Nov 10 00:00:44 INF: }
Nov 10 00:00:44 DBG: rpkiManifest registered. Its nid is 1205.
Nov 10 00:00:44 DBG: signedObject registered. Its nid is 1206.
Nov 10 00:00:44 DBG: rpkiNotify registered. Its nid is 1207.
Nov 10 00:00:44 DBG: id-cp-ipAddr-asNumber (RFC 6484) registered. Its nid is 1208.
Nov 10 00:00:44 DBG: id-cp-ipAddr-asNumber-v2 (RFC 8360) registered. Its nid is 1209.
Nov 10 00:00:44 DBG: id-pe-ipAddrBlocks-v2 registered. Its nid is 1210.
Nov 10 00:00:44 DBG: id-pe-autonomousSysIds-v2 registered. Its nid is 1211.
Nov 10 00:00:44 DBG: id-kp-bgpsec-router registered. Its nid is 1212.
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #0 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #1 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #2 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #3 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #4 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #5 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #6 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #7 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #8 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #9 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #0 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #1 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #2 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #3 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #4 spawned
Nov 10 00:00:44 INF: Attempting to bind socket to address 'any', port '323'.
Nov 10 00:00:44 INF: Success; bound to address 'any', port '323'.
Nov 10 00:00:44 DBG: Created server socket with FD 3.
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #0 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #1 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #2 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #3 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #4 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #5 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #6 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #7 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #8 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #9 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #10 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #11 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #12 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #13 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #14 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #15 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #16 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #17 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #18 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #19 spawned
Nov 10 00:00:44 WRN: First validation cycle has begun, wait until the next notification to connect your router(s)
Nov 10 00:00:44 INF: Starting validation.
Nov 10 00:00:44 DBG: Pushing a task to the pool
Nov 10 00:00:44 DBG: Pushing a task to the pool
Nov 10 00:00:44 DBG: Pulling a task from the pool
Nov 10 00:00:44 DBG: Working on task #1
Nov 10 00:00:44 DBG: Pulling a task from the pool
Nov 10 00:00:44 DBG: Working on task #2
Nov 10 00:00:44 DBG: Pushing a task to the pool
Nov 10 00:00:44 DBG: Pulling a task from the pool
Nov 10 00:00:44 DBG: Working on task #3
Nov 10 00:00:44 DBG: Pushing a task to the pool
Nov 10 00:00:44 DBG: Waiting all tasks from the pool to end
Nov 10 00:00:44 DBG: - Stop: false
Nov 10 00:00:44 DBG: - Working count: 3
Nov 10 00:00:44 DBG: - Thread count: 5
Nov 10 00:00:44 DBG: - Empty queue: false
Nov 10 00:00:44 DBG: Pulling a task from the pool
Nov 10 00:00:44 DBG: Working on task #4
Nov 10 00:00:56 DBG: Task ended
Nov 10 00:00:56 DBG: Thread waiting for work...
Nov 10 00:01:16 DBG: Task ended
Nov 10 00:01:16 DBG: Thread waiting for work...
Nov 10 00:02:25 DBG: Task ended
Nov 10 00:02:25 DBG: Thread waiting for work...
Nov 10 00:03:12 CRT: /etc/pki/tals/ripe.tal: Attempted to pop empty X509 stack
Stack trace:
  fort(print_stack_trace+0x1a) [0x41894a]
  fort(pr_crit+0x89) [0x41b829]
  fort() [0x4152e1]
  fort(deferstack_pop+0x2f) [0x4154ef]
  fort() [0x42a004]
  fort() [0x42a9b1]
  fort() [0x439131]
  /lib64/libc.so.6(+0x8db17) [0x7f226b144b17]
  /lib64/libc.so.6(+0x1126c0) [0x7f226b1c96c0]
(Stack size was 9.)

1.5.3 passes on my machine (now). Of course I do not know if the same content was served to this version: A repository operator could serve different content depending on the user-agent etc.

Curiously, valid router keys is 0 while there is one in the repo:

Nov 10 00:05:57 INF: Validation finished:
Nov 10 00:05:57 INF: - Valid ROAs: 246846
Nov 10 00:05:57 INF: - Valid Router Keys: 0
Nov 10 00:05:57 INF: - Serial: 1
Nov 10 00:05:57 INF: - Real execution time: 134 secs.
Nov 10 00:05:57 WRN: First validation cycle successfully ended, now you can connect your router(s)
ydahhrk commented 2 years ago

Nov 10 00:03:12 CRT: /etc/pki/tals/ripe.tal: Attempted to pop empty X509 stack

Ok. We do have the error message, though. Thanks.

ydahhrk commented 2 years ago

Curiously, valid router keys is 0 while there is one in the repo:

Normal. It's supposed to skip BGP certificates, since issue58-proper hasn't been merged.

ties commented 2 years ago

If I run

$ find .   -name \*.cer   -exec sh -c 'openssl x509 -in $1 -noout -text -inform DER > $1.txt' x {} \;
$ rg P256

In the cache dir after running 1.5.3 I do not see the P256/bgpsec certificate. With 1.5.0 I do. My routinator also sees two valid bgpsec certificates. It's late in my local timezone so I will not dive deeper - but I would expect the files to still be in the repo?

Could be fort cleaning the directory. Could be a split view being served depending on the user-agent.

ydahhrk commented 2 years ago

With 1.5.0 I do.

Does anybody happen to still have a copy of the cache from when the crash happened?

ties commented 2 years ago
$ sudo fort --tal /etc/pki/tals --log.level=debug --rsync.retry.count=1 --rrdp.retry.count=1 --rsync.retry.interval=1 --rrdp.retry.interval=1
...
Nov 21 17:41:24 DBG: Thread Validation.4: Claimed task '/etc/pki/tals/afrinic.tal'
Nov 21 17:41:24 DBG: Thread Validation.3: Claimed task '/etc/pki/tals/lacnic.tal'
Nov 21 17:41:32 DBG: Thread Validation.4: Task '/etc/pki/tals/afrinic.tal' ended
Nov 21 17:41:32 DBG: Thread Validation.4: Waiting for work...
Nov 21 17:41:52 DBG: Thread Validation.3: Task '/etc/pki/tals/lacnic.tal' ended
Nov 21 17:41:52 DBG: Thread Validation.3: Waiting for work...
Nov 21 17:42:24 ERR: https://ca.rg.net/rrdp/notify.xml: Error requesting URL: The requested URL returned error: 404. (HTTP code: 404)
Stack trace:
  fort(print_stack_trace+0x1f) [0x417e5f]
  fort(__pr_op_err+0x84) [0x418424]
  fort() [0x423537]
  fort() [0x42387d]
  fort(http_download_file+0x35) [0x423a65]
  fort(rrdp_parse_notification+0x13a) [0x42c15a]
  fort() [0x42a565]
  fort() [0x4252e3]
  fort(certificate_traverse+0x8ae) [0x426eae]
  fort() [0x428ca3]
  fort() [0x4296c9]
  fort() [0x437307]
  /lib64/libc.so.6(+0x8db17) [0x7f16341b7b17]
  /lib64/libc.so.6(+0x1126c0) [0x7f163423c6c0]
(End of stack trace)
$ fort --version
fort 1.5.3
$  lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch
Distributor ID: Fedora
Description:    Fedora release 35 (Thirty Five)
Release:    35
Codename:   ThirtyFive
$ sha256sum `which fort`
7f9919544b4643572815913b01ba8cee0f04a8ef1115aa3dd7df2d13ce217c6b  /usr/bin/fort
/tmp $ tar -Jcf 20211121-fort.tar.xz fort
...a 1 minute wait...

Local repository is in https://drive.google.com/file/d/13SVrMK4xDENcS6kqBI3wNtUo0yOD29z0/view?usp=sharing

ydahhrk commented 2 years ago

One thing I failed to mention in the release notes is that, since 1.5.3, Fort is now more trigger-happy with the stack traces. I thought this might help since bug reports have been on the rise. But it's too trigger happy, evidently.

You can tell simply by running the binary without arguments:

$ fort
Nov 22 09:34:16 ERR: The TAL(s) location (--tal) is mandatory.
Stack trace:
  src/fort(print_stack_trace+0x3b) [0x55e3959afccb]
  src/fort(__pr_op_err+0x9f) [0x55e3959b040f]
  src/fort(handle_flags_config+0x7a2) [0x55e3959ae102]
  src/fort(main+0x7c) [0x55e3959ab4fc]
  /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fb3e0eee0b3]
  src/fort(_start+0x2e) [0x55e3959ab64e]
(End of stack trace)
Nov 22 09:34:16 ERR: Try 'src/fort --usage' or 'src/fort --help' for more information.
Stack trace:
  src/fort(print_stack_trace+0x3b) [0x55e3959afccb]
  src/fort(__pr_op_err+0x9f) [0x55e3959b040f]
  src/fort(handle_flags_config+0x396) [0x55e3959adcf6]
  src/fort(main+0x7c) [0x55e3959ab4fc]
  /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fb3e0eee0b3]
  src/fort(_start+0x2e) [0x55e3959ab64e]
(End of stack trace)

I'm going to have this toned down by the next release.

Just to be clear: Bug #65 refers to this critical:

Nov 10 00:03:12 CRT: /etc/pki/tals/ripe.tal: Attempted to pop empty X509 stack
Stack trace:
  fort(print_stack_trace+0x1a) [0x41894a]
  fort(pr_crit+0x89) [0x41b829]
  fort() [0x4152e1]
  fort(deferstack_pop+0x2f) [0x4154ef]
  fort() [0x42a004]
  fort() [0x42a9b1]
  fort() [0x439131]
  /lib64/libc.so.6(+0x8db17) [0x7f226b144b17]
  /lib64/libc.so.6(+0x1126c0) [0x7f226b1c96c0]
(Stack size was 9.)
ydahhrk commented 10 months ago

As stated in the 1.6.0 release notes, I have found and patched several instances of undefined behavior during the reviews. There is no way to prove that these caused this particular crash (particularly considering that I never managed to reproduce it, and the OP already probably left), but the code has changed so much, at this point I expect the bug to manifest in a completely different way, if at all.

Please upgrade to the latest version. If it crashes again, please open a new bug.