IntersectMBO / cardano-node

The core component that is used to participate in a Cardano decentralised blockchain.
https://cardano.org
Apache License 2.0

[FR] - Query utxo indexer service #4678

Open disassembler opened 1 year ago

disassembler commented 1 year ago

Internal/External: Internal (if an IOHK staff member).

Area: Other (any other topic: Delegation, Ranking, ...).

Describe the feature you'd like

Do not use ledger state for query utxo.

The utxo on-disk work will make querying utxo from ledger state really slow (it's already really slow and getting slower as utxo size increases). What we need is a separate service that talks to the node socket to keep track of all utxos (or maybe a subset of interesting ones that the user defines). What we want to do is provide a similar interface, e.g.

cardano-cli query utxo --testnet-magic 1 --address addr_test1vzpwq95z3xyum8vqndgdd9mdnmafh3djcxnc6jemlgdmswcve6tkw

But it should not use the ledger state queries to get this information. Essentially, this will need to do something like what foldBlocks or marconi does: index state transitions on the chain from genesis and keep that index separate from the node itself.
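As a rough illustration of the indexing idea (this is not the foldBlocks or marconi API), the sketch below folds over a sequence of blocks and maintains a UTxO set keyed by transaction id and output index. The names `TxOut`, `Tx`, `apply_block`, `fold_blocks` and `query_utxo`, as well as the simplified block layout, are assumptions made for this example only. Rollbacks, multi-asset values and datums are deliberately ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

TxIn = Tuple[str, int]  # (transaction id, output index)


@dataclass(frozen=True)
class TxOut:
    address: str
    lovelace: int


@dataclass
class Tx:
    tx_id: str
    inputs: List[TxIn] = field(default_factory=list)
    outputs: List[TxOut] = field(default_factory=list)


# Hypothetical simplification: a block is an ordered list of transactions.
Block = List[Tx]
Utxo = Dict[TxIn, TxOut]


def apply_block(utxo: Utxo, block: Block) -> Utxo:
    """Apply one block in place: drop spent outputs, then add new ones."""
    for tx in block:
        for tx_in in tx.inputs:
            utxo.pop(tx_in, None)
        for ix, out in enumerate(tx.outputs):
            utxo[(tx.tx_id, ix)] = out
    return utxo


def fold_blocks(blocks: Iterable[Block]) -> Utxo:
    """Fold over the chain from genesis, producing the current UTxO set."""
    utxo: Utxo = {}
    for block in blocks:
        apply_block(utxo, block)
    return utxo


def query_utxo(utxo: Utxo, address: str) -> Utxo:
    """Return only the unspent outputs held at the given address."""
    return {k: v for k, v in utxo.items() if v.address == address}
```

A real indexer would receive blocks from the node socket instead of a list, and would need to handle rollbacks by undoing the effect of the blocks that were rolled back.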

Describe alternatives you've considered

From @dcoutts

@abailly-iohk yes, the only way to make it fast is to use an index. The original feature was a quick convenient hack that got totally out of hand. (My bad.) And yes, sqlite would be perfectly good as an implementation of such an index, and the right place to do that is in an indexing client, and not within the node process itself.

One of our general architectural principles for the node and its external clients is that we make all chain information available to clients, but we do not store information in the node (and provide client access) that the node does not itself need. The node is part of the trusted base of the system. It is complex and has strong security and performance requirements. It is the wrong place to build database features. External clients are the perfect place to build database features.

The right solution is a client that maintains an index and provides query access. There are then multiple choices: provide a minimal one that just replicates what the CLI provided before (via the node), or use some existing more general purpose indexer, or some combo to satisfy different use cases.
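To make the "client that maintains an index" idea concrete, here is a minimal sketch of an SQLite-backed UTxO index using Python's standard `sqlite3` module. The table layout and the function names (`open_index`, `add_output`, `spend_output`, `utxo_at`) are assumptions for illustration and do not come from any existing indexer.

```python
import sqlite3

# Hypothetical schema: one row per unspent output, indexed by address.
SCHEMA = """
CREATE TABLE IF NOT EXISTS utxo (
    tx_id    TEXT    NOT NULL,
    tx_ix    INTEGER NOT NULL,
    address  TEXT    NOT NULL,
    lovelace INTEGER NOT NULL,
    PRIMARY KEY (tx_id, tx_ix)
);
CREATE INDEX IF NOT EXISTS utxo_by_address ON utxo (address);
"""


def open_index(path=":memory:"):
    """Open (or create) the index database and ensure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_output(conn, tx_id, tx_ix, address, lovelace):
    """Record a newly created transaction output."""
    conn.execute(
        "INSERT OR REPLACE INTO utxo VALUES (?, ?, ?, ?)",
        (tx_id, tx_ix, address, lovelace),
    )


def spend_output(conn, tx_id, tx_ix):
    """Remove an output that a later transaction has consumed."""
    conn.execute(
        "DELETE FROM utxo WHERE tx_id = ? AND tx_ix = ?", (tx_id, tx_ix)
    )


def utxo_at(conn, address):
    """Look up unspent outputs for an address via the address index."""
    rows = conn.execute(
        "SELECT tx_id, tx_ix, lovelace FROM utxo "
        "WHERE address = ? ORDER BY tx_id, tx_ix",
        (address,),
    )
    return rows.fetchall()
```

The point of the `utxo_by_address` index is that a per-address lookup no longer needs to scan the whole UTxO set, which is what makes the ledger state query slow as the set grows.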

abailly-iohk commented 1 year ago

There already exist quite a few indexers out there, and I can recommend one from a former colleague of mine: https://github.com/CardanoSolutions/kupo. The best route forward would probably be to define a common interface for this kind of query that both clients (cardano-cli) and servers (kupo, marconi, oura, blockfrost...) would agree to implement. Then the cardano-cli could be pointed at various services, possibly with fallbacks.
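One way to picture such a common interface with fallbacks is sketched below. The `UtxoBackend` protocol, the two toy backends and `query_with_fallback` are hypothetical names invented for this example; none of the services mentioned above is known to expose this exact interface.

```python
from typing import Dict, List, Protocol, Sequence, Tuple

UtxoEntry = Tuple[str, int, int]  # (tx id, output index, lovelace)


class UtxoBackend(Protocol):
    """Hypothetical common interface every indexer adapter would implement."""

    def utxo_at(self, address: str) -> List[UtxoEntry]: ...


class InMemoryBackend:
    """Toy backend standing in for an adapter to a real indexer."""

    def __init__(self, data: Dict[str, List[UtxoEntry]]):
        self._data = data

    def utxo_at(self, address: str) -> List[UtxoEntry]:
        return list(self._data.get(address, []))


class UnavailableBackend:
    """Toy backend that simulates a service that cannot be reached."""

    def utxo_at(self, address: str) -> List[UtxoEntry]:
        raise ConnectionError("backend unreachable")


def query_with_fallback(
    backends: Sequence[UtxoBackend], address: str
) -> List[UtxoEntry]:
    """Try each backend in order and return the first successful answer."""
    errors = []
    for backend in backends:
        try:
            return backend.utxo_at(address)
        except ConnectionError as exc:
            errors.append(exc)
    raise ConnectionError(f"all {len(backends)} backends failed: {errors}")
```

With this shape, the client only depends on `utxo_at`, so swapping one indexer for another is a matter of writing a new adapter.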

xoriole commented 1 year ago

cardano-ledger-index service for Cardano

cardano-ledger-index service provides helpful queries for ledger state. It maintains extra indexes for faster querying of ledger state. Built on Haskell. Uses SQLite for persistence.

Reference: Marconi https://github.com/input-output-hk/plutus-apps/tree/main/marconi

Support for queries:

Future queries support:

How is it different from dbsync

disassembler commented 1 year ago

@abailly-iohk is our head of architecture. He'll be providing some feedback on how he wants this to be architected.

Jimbo4350 commented 1 year ago

I'd like to petition to keep the UTxO query for the case of local testnets. We have tests in cardano-node that utilize the UTxO query, and in this instance performance is not an issue since the UTxO is small. The UTxO query is also useful for quick and dirty debugging, especially when integrating a new era. Not having to set up an additional indexer would save me and QA additional work.

abailly-iohk commented 1 year ago

The goal is to keep the UTxO query in the cardano-cli in order not to disrupt downstream users (too much), but to move the work needed to execute the query outside of the node/consensus. I think that for the simple (testnets) use case, the role of an indexer could be fulfilled by some stateless query tool able to read the node's Chain DB directly, much like what the existing db-analyzer command-line tool does.

abailly-iohk commented 1 year ago

@disassembler We would need some numbers on the current vs. forecasted performance of this (and possibly other) kind of queries, and a better understanding of what's the target/requirement from users' perspective. I think @dnadales or @Jasagredo have already run or will run some benchmarks for the utxo-hd case.

abailly-iohk commented 1 year ago

Also, we would really want to have some kind of impact study, which implies talking to various node users to understand what would be an acceptable solution.

abailly-iohk commented 1 year ago

@disassembler Shouldn't we reclassify this issue as Spike or Experiment based on DQ's work?

dQuadrantDev commented 1 year ago

While being stuck on ouroboros-network, we have explored the alternative approach of working on the rpc command integration to cardano-cli.

A non-production version of marconi-mamba with RPCs, together with an update of cardano-cli, seems doable within this week. By RPC we are referring to JSON-RPC, similar to what Bitcoin and Ethereum use.
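For readers unfamiliar with JSON-RPC, the sketch below builds a JSON-RPC 2.0 request envelope and unwraps a response. The method name `query_utxo` and its `address` parameter are placeholders chosen for illustration; they are not taken from marconi-mamba or from the demo.

```python
import json
from itertools import count

_request_ids = count(1)  # simple source of increasing request ids


def build_request(method, params):
    """Serialize a JSON-RPC 2.0 request for the given method and params."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": next(_request_ids),
            "method": method,
            "params": params,
        }
    )


def parse_response(raw):
    """Return the result of a JSON-RPC 2.0 response, or raise on error."""
    msg = json.loads(raw)
    if "error" in msg:
        err = msg["error"]
        raise RuntimeError(f"RPC error {err['code']}: {err['message']}")
    return msg["result"]
```

A client would send the string from `build_request("query_utxo", {"address": "..."})` over HTTP and pass the response body to `parse_response`.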

Meanwhile, we can also explore the ouroboros-network for the proxying part where a meeting with ouroboros-network team would greatly help.

Marconi

mesudip commented 1 year ago

We were looking at ways to intercept and interpret cli-to-node communication and return the query result using a different service.

But it seems that setting up an intercepting service was harder than we thought. cardano-api didn't have enough of an interface for this purpose, so we looked at the ouroboros-network repo.

Can somebody point out how to set up a node-to-client service with the ouroboros-network library, but with different handlers for the protocol messages?

Also, all the query APIs in cardano-api seem to close the connection to the node socket after the query. Is there a way to implement the query service with a persistent connection?

dQuadrantDev commented 1 year ago

@disassembler The non-production demo is available here. https://github.com/dQuadrant/plutus-apps/blob/feature/rpc/Readme.md

mesudip commented 1 year ago

I've created a PR. https://github.com/input-output-hk/cardano-node/pull/4810

gitmachtl commented 1 year ago

Can we have an output, that replicates the

cardano-cli query utxo --testnet-magic 1 --address addr_test1vzpwq95z3xyum8vqndgdd9mdnmafh3djcxnc6jemlgdmswcve6tkw

one? By that I mean a plaintext output like the original one. Many tools on the CLI rely on this kind of output because of the still-existing bug in jq when working with really large numbers. For that reason, koios, for example, also sends the amounts as strings. The current output from cardano-cli query utxo, if you send it to an out-file, is JSON, but tools like jq (the most common on the CLI) have a problem with those large numbers (lovelaces). So many tools just use the plaintext output that cardano-cli provides and read the values from there. It would be extremely helpful if there were a "cli compatible" output mode so that it can simply replace the old command.
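The large-number concern can be illustrated with a short sketch, assuming the consuming tool parses JSON numbers as IEEE 754 doubles: integers above 2**53 may not round-trip, whereas amounts encoded as strings do. The helper names below are made up for this example.

```python
import json

MAX_SAFE_INTEGER = 2**53 - 1  # largest integer a double represents exactly in sequence


def survives_double(n):
    """True if the integer n is unchanged after conversion to a double."""
    return int(float(n)) == n


def encode_utxo(tx_id, tx_ix, lovelace):
    """Encode an output as JSON with the amount as a string."""
    return json.dumps({"tx_id": tx_id, "tx_ix": tx_ix, "lovelace": str(lovelace)})


def decode_lovelace(raw):
    """Recover the exact integer amount from the string-encoded field."""
    return int(json.loads(raw)["lovelace"])
```

For instance, `survives_double(2**53 + 1)` is `False`, while `decode_lovelace(encode_utxo("tx0", 0, 2**53 + 1))` returns the original value unchanged.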

rdlrt commented 1 year ago

The addition from the PR seems less than ideal to my eyes: essentially, the proposal uses cardano-cli as a curl substitute just for user-facing convenience. Calling out the small deficiencies that I see:

mesudip commented 1 year ago

@gitmachtl @rdlrt Thank you for your feedback on the PR. I would like to clarify that this is not a final PR; it was a quick implementation based on what's already available, to show what can be done. I understand your concerns regarding compatibility, scalability and security.

Based on the above feedback, I have listed the following points to be considered for the final implementation

Given the considerations outlined, do you believe that this is the right way to go forward with the proposed changes?

Regards, - mesudip

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 120 days.