AntelopeIO / spring

C++ implementation of the Antelope protocol with Savanna consensus

When starting firehose from a snapshot on kylin, nodeos terminates with `DMLOG FPRINTF_FAILURE_TERMINATED` #724

Closed matthewdarwin closed 4 days ago

matthewdarwin commented 2 months ago

When starting firehose from a snapshot on kylin, nodeos terminates with:

Sep  9 14:30:37 kylin-sfdm42 nodeos[2291398]: DMLOG FPRINTF_FAILED failed written=0 remaining=28 1 Bad file descriptor
Sep  9 14:30:37 kylin-sfdm42 nodeos[2291398]: DMLOG FPRINTF_FAILURE_TERMINATED

nodeos exits, and when restarted it continues where it left off, so after enough restarts it stops exiting and runs live against the latest blocks. (Note: there is currently a bug in firehose-core where firehose doesn't exit on EOF. This is already fixed and will be merged into a future version of fireantelope. Ref: https://github.com/streamingfast/firehose-core/pull/66)

This nodeos exit seems reproducible on spring 1.0-final and 1.0-rc3, but not on 1.0-rc2.

My crude understanding of how this is meant to work is that if the nodeos STDOUT buffer gets full, it is supposed to apply backpressure to nodeos so that it stops syncing blocks and waits until the STDOUT buffer empties enough to continue syncing. Maybe some recent refactoring of forked-block handling changed the behaviour?
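To make the backpressure idea concrete, here is a minimal sketch of how a pipe behaves when it fills up. It is not nodeos code and says nothing about how deep-mind output is actually written; the helper names `write_all` and `pipe_capacity` are made up for illustration. On a blocking descriptor the writer simply stalls inside `write()` until the reader drains the pipe, which is the backpressure described above. On a non-blocking descriptor the same situation surfaces as an error the writer has to handle.

```python
import os


def write_all(fd: int, data: bytes) -> int:
    """Write every byte of data to fd, continuing after short writes.

    Hypothetical helper: on a blocking descriptor os.write() stalls while
    the pipe is full, so the producer is paused until the reader catches up.
    """
    view = memoryview(data)
    written = 0
    while written < len(view):
        written += os.write(fd, view[written:])
    return written


def pipe_capacity() -> int:
    """Fill a non-blocking pipe and return how many bytes it accepted.

    Nobody reads from the pipe, so it eventually becomes full. In
    non-blocking mode the full pipe is reported as BlockingIOError
    instead of stalling the writer.
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)
    total = 0
    try:
        while True:
            total += os.write(write_fd, b"x" * 4096)
    except BlockingIOError:
        pass  # pipe is full; a blocking writer would wait here instead
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return total
```

A writer that treats a short or failed write as fatal, rather than waiting or retrying, would terminate instead of being paused. Whether that is what happens in nodeos here is exactly the open question in this issue.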

nodeos config.ini:

... (usual stuff)
deep-mind = true
contracts-console = true
api-accept-transactions = false

fireantelope config.yml:

start:
  args:
    - reader-node-stdin
  flags:
    log-verbosity: 0
    log-to-file: false

matthewdarwin commented 2 months ago

Working logging.json for firehose

logging.json

heifner commented 2 months ago

Running with: read-mode = head

spoonincode commented 1 month ago

Can you share whatever script or command you're using for launching nodeos + firehose together? My assumption is something as simple as nodeos | fireantelope, but I want to make sure nothing more complex is in play.

matthewdarwin commented 1 month ago

nodeos | fireantelope it is (with some command line arguments to point at the correct config file)
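For anyone trying to reproduce this outside a shell, the same producer | consumer wiring can be sketched with `subprocess`. This is only an illustration of the pipe topology. `run_pipeline` is a hypothetical helper, and the commands passed to it are placeholders, not the real nodeos or fireantelope command lines.

```python
import subprocess
import sys


def run_pipeline(producer_cmd, consumer_cmd):
    """Run `producer_cmd | consumer_cmd`.

    Returns (producer_returncode, consumer_returncode, consumer_stdout).
    """
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_cmd, stdin=producer.stdout, stdout=subprocess.PIPE
    )
    # Close the parent's copy of the read end so that only the consumer
    # holds it; if the consumer exits, the producer then sees a broken pipe.
    producer.stdout.close()
    out, _ = consumer.communicate()
    return producer.wait(), consumer.returncode, out


if __name__ == "__main__":
    # Placeholder commands standing in for the real producer and consumer.
    emit = [sys.executable, "-c", "print('DMLOG A'); print('DMLOG B')"]
    count = [sys.executable, "-c", "import sys; print(sum(1 for _ in sys.stdin))"]
    print(run_pipeline(emit, count))
```

Checking both exit codes separately helps tell apart a producer that terminated on its own from one that died because the consumer went away.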

fschoell commented 1 month ago

@heifner this still seems to be an issue on v1.0.1. It also happened recently on an EOS node that was running in live mode at 2 blocks/s.

bhazzard commented 1 month ago

@matthewdarwin and @fschoell are you able to reproduce this using the same version of firehose with Leap v5.0.0? I'd like to rule out that this is caused by a change in Firehose rather than in Spring.

spoonincode commented 1 week ago

Please try 1.0.3, there is a fix in it that could have resolved this.

matthewdarwin commented 1 week ago

We upgraded to 1.0.3 yesterday

spoonincode commented 4 days ago

Really quite confident this has been resolved in 1.0.3. Feel free to comment or reopen if you discover otherwise.