Constellation-Labs / snapshot-streaming

Apache License 2.0

Error "Neither last processed snapshot nor initial snapshot were found on disk" #53

Open flannoo opened 1 year ago

flannoo commented 1 year ago

Hi,

I've tried to set up snapshot streaming on the latest supported tessellation version (v2.0.0-alpha.12), but I keep getting the error "Neither last processed snapshot nor initial snapshot were found on disk!".

Probably I'm missing something in my configuration, but I can't seem to figure it out.

In the "application.conf" file located in "kubernetes/snapshot-streaming" folder, I needed to add the following missing fields:

lastIncrementalSnapshotPath = "lastIncrementalSnapshot.json"
collateral = 0

I also needed to add the currency indexes to the opensearch properties in that application.conf file; I copied these values from "src/main/resources/application.conf":

currency {
        snapshots = "currency-snapshots"
        blocks = "currency-blocks"
        transactions = "currency-transactions"
        balances = "currency-balances"
      }

This is the complete application.conf that I'm using:

snapshotStreaming {
  lastSnapshotPath = "lastSnapshot.json"
  lastIncrementalSnapshotPath = "lastIncrementalSnapshot.json"
  collateral = 0
  httpClient {
    timeout = "120s"
    idleTimeInPool = "30s"
  }
  node {
    l0Peers = [
    """{"id": "00b8a56a20fc2e2a0196b8b8f4593ea4f736555506950103eb6fbbe435c0eeb71b32abfe21ae63bb3de8b9afdfa604bfd5837ef61e261611b8a0e5efd92ef1ea", "ip": "l0-initial-validator", "port": "9000"}"""
    ]
    pullInterval = "5s"
    pullLimit = 9
    initialSnapshot = """{"hash": "24864f0fdf531dd9e86cd303e39decab1426ece898a45afeed4bc8f8b1ee9998", "ordinal": 0}"""
  }
  opensearch {
    host = "http://localstack"
    port = "4510"
    bulkSize = "10000"
    indexes {
      snapshots = "snapshots"
      blocks = "blocks"
      transactions = "transactions"
      balances = "balances"
      currency {
        snapshots = "currency-snapshots"
        blocks = "currency-blocks"
        transactions = "currency-transactions"
        balances = "currency-balances"
      }
    }
  }
  s3 {
    bucketRegion = "us-east-1"
    bucketName = "snapshots"
    bucketDir = "snapshot-streaming"
    api {
      endpoint = "http://localstack:4566"
      region = "us-east-1"
      pathStyleEnabled = true
    }
  }
}

After adding that and running skaffold dev --trigger=manual, the application starts but then fails with the error that no snapshots were found on disk.

This is what I see in the logs:

[snapshot-streaming-6d895d6bd9-6mkh7 snapshot-streaming] {"cluster_name":"opensearch","status":"green","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"discovered_master":true,"discovered_cluster_manager":true,"active_primary_shards":0,"active_shards":0,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":100.0}08:04:42.218 [io-compute-blocker-1] INFO  o.c.s.o.OpensearchDAO - Initiating es client.
[snapshot-streaming-6d895d6bd9-6mkh7 snapshot-streaming] 08:04:42.239 [io-compute-blocker-1] INFO  c.s.e.h.JavaClient$ - Creating HTTP client on http://localstack:4510
[snapshot-streaming-6d895d6bd9-6mkh7 snapshot-streaming] 08:04:47.915 [io-compute-0] ERROR o.c.s.App - Neither last processed snapshot nor initial snapshot were found on disk!
[snapshot-streaming-6d895d6bd9-6mkh7 snapshot-streaming] java.lang.Throwable: Neither last processed snapshot nor initial snapshot were found on disk!
[snapshot-streaming-6d895d6bd9-6mkh7 snapshot-streaming]        at org.constellation.snapshotstreaming.SnapshotProcessor$$anon$1.$anonfun$runtime$10(SnapshotProcessor.scala:208)

Any ideas on what I'm missing or doing wrong? It seems it's not connecting to my running tessellation cluster v2.0.0-alpha.12 (or it fails before even trying to connect to it). I also tried updating the l0Peers ip property to the local IP address instead of the name "l0-initial-validator", but that doesn't seem to change anything.

Any help is greatly appreciated :-)

Thanks!

flannoo commented 1 year ago

Dug a bit deeper into this, and it seems the functionality for loading the initial snapshot changed in commit 572cd0b3f36937c3f834ce9f1c836a39afdebbdb.

Do I need to include a json file in my container now that contains the initial snapshot? Or how does loading that initial snapshot work when running snapshot-streaming for the first time? I tried including that json using the old format:

{"hash": "24864f0fdf531dd9e86cd303e39decab1426ece898a45afeed4bc8f8b1ee9998", "ordinal": 0 }

But now I get this error: io.circe.DecodingFailure$DecodingFailureImpl: DecodingFailure at .signed: Missing required field

I tried several variations of the file but can't seem to make it work.

Would it be possible to provide the correct json for that initial snapshot?

marcinwadon commented 1 year ago

Hey @flannoo, please try fetching the snapshot from the network and set it as lastSnapshot.json (for a full snapshot) or lastIncrementalSnapshot.json (for an incremental snapshot).

curl -X GET http://<nodeip>:<nodeport>/global-snapshots/latest -H "Accept: application/json"

The snapshot must be a json-serialized signed snapshot (either a full snapshot for lastSnapshot.json or an incremental one for lastIncrementalSnapshot.json). Later on, snapshot streaming will overwrite this file as it processes subsequent snapshots.

flannoo commented 1 year ago

Hi @marcinwadon, thank you for looking into this. I was able to make it work by adding the following line to the Dockerfile before starting the app. It downloads the genesis full snapshot and saves it as "lastSnapshot.json" (I needed to wrap the response in a "signed" object and add empty "hash" and "proofsHash" fields):

curl -X GET "http://l0-initial-validator:9000/global-snapshots/0?full=true" -H "Accept: application/json" | jq '{signed: ., hash: "", proofsHash: ""}' > lastSnapshot.json

The only issue I have now is that the genesis snapshot (ordinal 0) is not imported into the opensearch cluster, as the app thinks it has already been uploaded (since we define it as the last snapshot that was imported).

Is there any way I can also upload the ordinal 0 snapshot? Or will I need to do this manually somehow (perhaps the snapshot-streaming app isn't designed to handle this scenario)? Ordinals can't be negative, so I can't change the ordinal in "lastSnapshot.json" to -1.

I want to run this locally as I want to test a change I did in the blockexplorer to add an API that retrieves all the wallet balances in a certain snapshot ordinal. Right now, it's not returning the correct balances as the genesis snapshot is not stored in opensearch (so I don't know if it's a bug in my code or if it's due to missing data).

Thanks!

TheMMaciek commented 1 year ago

Hi @flannoo. You can't force the current version of snapshot-streaming to process the genesis, because the current version works only with incremental snapshots and the genesis is a full snapshot. The way it came to this: previously the network operated on global (full) snapshots only, then we switched to incremental snapshots, which are lighter. snapshot-streaming got adjusted to work on incrementals with an initial start from a last full snapshot, and because all our envs already had full snapshots processed, we never hit the case of no full snapshot persisted on disk at the time.

Long story short, we will have to add processing of the genesis (full snapshot) to snapshot-streaming so that any new network can use it successfully. We are aware of the issue and have a ticket for adding that functionality. We'll let you know about the status of this ticket.

BTW are you saying that after mocking the hash and proofsHash in the lastSnapshot.json all the following incremental snapshots (1, 2, and so on) are getting processed?

flannoo commented 1 year ago

Thank you @TheMMaciek

BTW are you saying that after mocking the hash and proofsHash in the lastSnapshot.json all the following incremental snapshots (1, 2, and so on) are getting processed?

Yes, that is correct. I fetch the latest FULL snapshot using the curl statement below and wrap the result in a "signed" json object together with empty hash & proofsHash fields and then the application is able to process the next (incremental) snapshots and upload them to opensearch.

So I added the line below to the "start.sh" script used by the Dockerfile, before it starts the Java application (so "lastSnapshot.json" is present on the filesystem when the app starts):

curl -X GET "http://l0-initial-validator:9000/global-snapshots/0?full=true" -H "Accept: application/json" | jq '{signed: ., hash: "", proofsHash: ""}' > lastSnapshot.json

This results in this kind of file (some content removed to keep it short):

{
  "signed": {
    "value": {
      "ordinal": 0,
      "height": 0,
      "subHeight": 0,
      "lastSnapshotHash": "9f3ed34c012794ef8dbc5ee6efa82228424259146c0b389f55982fc21197b421",
      // etc. (removed for brevity)
    },
    "proofs": [
      {
        "id": "00b8a56a20fc2e2a0196b8b8f4593ea4f736555506950103eb6fbbe435c0eeb71b32abfe21ae63bb3de8b9afdfa604bfd5837ef61e261611b8a0e5efd92ef1ea",
        "signature": "30440220052f919c75d21ca4e2e7a19ffaaab160c56b91fc266a2a461e94754726a1e76b022055d51344856f6544ef540bdd7fd5cb3b9f3664e7c1b3875aee6e7a9a3c39c2e1"
      }
    ]
  },
  "hash": "",
  "proofsHash": ""
}
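For reference, the jq filter in the command above just wraps the fetched full snapshot in an envelope with empty hash fields. A minimal Python sketch of the same transformation (the name wrap_full_snapshot is hypothetical, not part of snapshot-streaming):

```python
import json

def wrap_full_snapshot(full_snapshot: dict) -> dict:
    # Equivalent of: jq '{signed: ., hash: "", proofsHash: ""}'
    # The fetched snapshot becomes the "signed" payload; hash and
    # proofsHash are left empty so snapshot-streaming can start up.
    return {"signed": full_snapshot, "hash": "", "proofsHash": ""}

# Trimmed sample of a fetched genesis full snapshot
snapshot = {"value": {"ordinal": 0, "height": 0}, "proofs": []}
print(json.dumps(wrap_full_snapshot(snapshot)))
# {"signed": {"value": {"ordinal": 0, "height": 0}, "proofs": []}, "hash": "", "proofsHash": ""}
```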

The application then successfully uploads the subsequent (incremental) snapshots:

INFO  o.c.s.S.$anon - Pulled following global snapshot: SnapshotReference{height=0,subHeight=1,ordinal=SnapshotOrdinal{value=1},lastSnapshotHash=24864f0fdf531dd9e86cd303e39decab1426ece898a45afeed4bc8f8b1ee9998,hash=463ff2768f4750b9f33962939247d606f9ff77248fc5312077619a3b96e50353,proofsHash=7d8c32b8c258d9778d635f3f44aeac2f628180e7cc129444b2d9ecfec56a1827}

INFO  o.c.s.s.S.$anon - Snapshot 1 (hash: 463ff276) uploaded to s3.

INFO  o.c.s.S.$anon - Snapshot 1 (hash: 463ff276) sent to opensearch.

INFO  o.c.s.S.$anon - Pulled following global snapshot: SnapshotReference{height=0,subHeight=2,ordinal=SnapshotOrdinal{value=2},lastSnapshotHash=463ff2768f4750b9f33962939247d606f9ff77248fc5312077619a3b96e50353,hash=a89e8b2b84e62d0cdc794c41e02e7e3d37c9459e90ca2ed808840e0107680d7f,proofsHash=b463fc5fc9bf39cd29530b511e7a7e83f58ee6e446e22f24000d69737640e6f9}

INFO  o.c.s.s.S.$anon - Snapshot 2 (hash: a89e8b2b) uploaded to s3.

INFO  o.c.s.S.$anon - Snapshot 2 (hash: a89e8b2b) sent to opensearch.

I'm using the tessellation repository and run the skaffold command to spin up the cluster on my local dev machine (so it spins up a new network every time I test; I don't have a permanent DEV cluster running).

flannoo commented 1 year ago

So perhaps a good way to solve this would be, instead of throwing the error that no snapshots were found, to have the application try to retrieve the genesis full snapshot (ordinal 0) from this URL:

http://<nodeip>:<nodeport>/global-snapshots/0?full=true
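A rough sketch of that fallback in Python (the helper names ensure_initial_snapshot and fetch_genesis are hypothetical; the actual fix would live in the Scala codebase, and as noted below it also needs a full-snapshot mapping for opensearch):

```python
import json
from pathlib import Path
from typing import Callable
from urllib.request import urlopen

def fetch_genesis(node_url: str) -> dict:
    # Pull the genesis full snapshot, e.g. node_url =
    # "http://l0-initial-validator:9000" as in the config above.
    with urlopen(f"{node_url}/global-snapshots/0?full=true") as resp:
        return json.load(resp)

def ensure_initial_snapshot(path: Path, fetch: Callable[[], dict]) -> None:
    # Instead of failing with "Neither last processed snapshot nor
    # initial snapshot were found on disk!", fall back to the genesis
    # and persist it in the same wrapped form as the workaround above.
    if path.exists():
        return
    envelope = {"signed": fetch(), "hash": "", "proofsHash": ""}
    path.write_text(json.dumps(envelope))
```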

TheMMaciek commented 1 year ago

Yeah, using this endpoint to pull the full snapshot is of course part of the solution. But the mapping of the full snapshot to the schema expected by opensearch also needs to be implemented. FYI, the incremental and full snapshot schemas are different, so you can't just pass a full snapshot where an incremental one is expected. Currently this mapping for full snapshots is not in the codebase and needs to be added; the good news is it shouldn't be too hard.