ckb-js / lumos

A full featured dapp framework for Nervos CKB
https://lumos-website.vercel.app/

Suggestions for new indexer APIs #4

Closed katat closed 4 years ago

katat commented 4 years ago

Based on my understanding, the Lumos indexer currently has two main APIs, for querying the live cells and all the transactions. As reported in the issue, when there is a huge volume of transactions under a lock script, it takes a very long time for Lumos to process and respond. In that case, it took more than 2 hours to return all the transactions under a lock script on my machine.

Because applications such as Neuron need to flexibly query transactions and cells, whether live or dead, they would still need to cache the data in some form while using Lumos as the basic indexing engine, which maintains the full index with a small disk space footprint and streams the data to the client side based on lock script.

Therefore, to fulfill the requirements of applications such as Neuron, some new APIs are needed to work around the performance issues.

Below are some suggestions for new Lumos APIs that I think would give applications more flexibility to efficiently process the transactions they need and to customize queries in a way that is easier to divide and conquer (a rough interface sketch of all four follows the list). Please feel free to comment or suggest other solutions.

  1. #listenByLockScript provides an event-sourcing API for applications to listen to new transactions under a lock script via WebSocket.

This would let applications process new transactions as soon as possible, without polling to check whether new transactions under the monitored lock scripts have been recorded on-chain. It simplifies the codebase in the application layer and avoids the unnecessary overhead of polling, which would have to check block numbers or compare transaction counts between the indexer and the local cache.

In addition to new-transaction events, it would also be helpful to emit a fork event when a chain fork occurs, so that the application layer is notified and can perform the necessary rollbacks in its local cache.

  2. #getTotalTransactionsCountByLockScript allows fetching the total number of transactions under a lock script.

This will be useful when applications want to compare against their local cache and check whether they need to sync new transactions into the cache after being disconnected from the network for a while.

  3. #getTransactionsByLockScript(offset, fetchSize) supports pagination when fetching transactions under a lock script. Combined with the #getTotalTransactionsCountByLockScript suggested above, applications can customize their fetching pattern, fetching batches in parallel or just a single page of a given size.

  4. #getTransactionsByLockScript(fromBlock, toBlock) allows querying the transactions under a lock script within a block number range.
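For concreteness, here is a rough TypeScript sketch of how the four suggested calls might be shaped. Everything in it is hypothetical: the interface name, the method signatures, and the stand-in Script/Transaction types are illustrative only, not an existing Lumos API.

```typescript
// Hypothetical shapes for the four suggestions above; none of these exist in
// Lumos today, they only illustrate the proposals.

// Minimal stand-in types; in practice these would come from @ckb-lumos/base.
type Script = { code_hash: string; hash_type: "data" | "type"; args: string };
type Transaction = Record<string, unknown>;

type LockScriptEvent =
  | { type: "transaction"; transaction: Transaction; blockNumber: string }
  | { type: "fork"; forkedAtBlockNumber: string };

interface SuggestedIndexerAPIs {
  // 1. Push-style subscription (e.g. over WebSocket) for new transactions and
  //    fork notifications under a lock script. Returns an unsubscribe function.
  listenByLockScript(
    lock: Script,
    onEvent: (event: LockScriptEvent) => void
  ): () => void;

  // 2. Total number of transactions recorded under a lock script.
  getTotalTransactionsCountByLockScript(lock: Script): Promise<number>;

  // 3. Offset-based pagination over transactions under a lock script.
  getTransactionsByLockScript(
    lock: Script,
    page: { offset: number; fetchSize: number }
  ): Promise<Transaction[]>;

  // 4. Block-range query (proposed as an overload of the same call above).
  getTransactionsByLockScriptInBlockRange(
    lock: Script,
    range: { fromBlock: string; toBlock: string }
  ): Promise<Transaction[]>;
}
```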

xxuejie commented 4 years ago
  1. I get that a new cell listener can be quite useful, but there are more concerns to this:

  2. Yes, this is an API we can add.

  3. I think there are 2 problems:

  4. Yes, this is something we can add as query parameters.
katat commented 4 years ago

Thanks @xxuejie for the quick response. Glad to know 2 and 4 will be supported.

Any subscription mechanism will have a problem: should it be reliable or not? If there is a downstream listener receiving the updates but it cuts off for a while, then when it reconnects, should we send it all the missing updates it didn't receive, or should we just continue from here?

I think this is a good point. In terms of a streaming cut-off, I think the Lumos indexer just needs to push whatever it newly receives from the CKB-indexer to the subscribers. The downstream clients can use #getTotalTransactionsCountByLockScript to check whether there is a gap of missing transactions to fill. If there is a mismatch between the count from the Lumos indexer and the one in the local cache, the client can call #getTransactionsByLockScript(fromBlock, toBlock) to re-sync the transactions within that block range.
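A minimal sketch of that gap-filling flow, assuming the suggested count and block-range APIs existed and assuming a made-up local cache on the application side; every name below is illustrative, not a real Lumos call.

```typescript
// Sketch only: all names are made up for illustration and mirror the
// hypothetical interface sketched earlier in this thread.
type Script = { code_hash: string; hash_type: "data" | "type"; args: string };
type Transaction = Record<string, unknown>;

interface GapFillIndexer {
  getTotalTransactionsCountByLockScript(lock: Script): Promise<number>;
  getTransactionsByLockScriptInBlockRange(
    lock: Script,
    range: { fromBlock: string; toBlock: string }
  ): Promise<Transaction[]>;
}

interface LocalCache {
  transactionCount(lock: Script): Promise<number>;
  lastSyncedBlock(lock: Script): Promise<string>;
  store(txs: Transaction[]): Promise<void>;
}

async function fillGapAfterReconnect(
  indexer: GapFillIndexer,
  cache: LocalCache,
  lock: Script,
  tipBlock: string
): Promise<void> {
  // Compare the indexer's total count with what the local cache holds.
  const remoteCount = await indexer.getTotalTransactionsCountByLockScript(lock);
  const localCount = await cache.transactionCount(lock);
  if (remoteCount === localCount) {
    return; // no gap: nothing was missed while disconnected
  }
  // Counts differ: re-sync the block range that may hold the missing
  // transactions and merge them into the local cache.
  const fromBlock = await cache.lastSyncedBlock(lock);
  const missing = await indexer.getTransactionsByLockScriptInBlockRange(lock, {
    fromBlock,
    toBlock: tipBlock,
  });
  await cache.store(missing);
}
```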

What we have here is a single entity; if the dapp wants to relay updates to the client side, it can decide to use WebSocket itself. The lumos framework won't need this level of detail.

But in that case, Neuron is the client of Lumos, isn't it? :)

The slowdown is an implementation quirk within lumos-indexer; we can fix this without affecting the current API.

Sounds good. Just FYI, on my machine it took more than 2 hours to fetch all the transactions for a certain lock script. Hopefully some optimizations on the Lumos side can dramatically improve the fetching efficiency so that the process completes within a timeframe that is acceptable and user friendly.

The question I would ask is: what could be achieved with a pagination solution that couldn't be achieved with an async iterator?

I think I'm starting to get what you mean. To do pagination on the generator, it could call #next without awaiting before calling the next #next, until a certain number of #next calls have been made, which simulates how traditional pagination works. But there are two critical problems, if I understand correctly:

Maybe I missed something here, so it would be very helpful if you could provide an example of doing pagination using TransactionsCollection#collect for a non-zero page without introducing unnecessary RPC calls.
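For reference, this is roughly what "simulating" offset-based pagination on top of an async iterator looks like. The collect callback is only a stand-in for something like TransactionsCollection#collect, not the actual Lumos signature; the concern is that every skipped item still has to be produced by the iterator before a non-zero page is reached.

```typescript
// Simulated pagination over an async iterator: reaching page N still forces
// the iterator to yield (and potentially fetch) every item of pages 0..N-1.
async function page<T>(
  collect: () => AsyncIterable<T>, // stand-in for TransactionsCollection#collect
  pageIndex: number,
  pageSize: number
): Promise<T[]> {
  const results: T[] = [];
  const skip = pageIndex * pageSize;
  let seen = 0;
  for await (const item of collect()) {
    if (seen >= skip) {
      results.push(item);
      if (results.length === pageSize) break; // page filled
    }
    seen += 1;
  }
  return results;
}
```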

xxuejie commented 4 years ago

I think the Lumos indexer just needs to push whatever it newly receives from the CKB-indexer to the subscribers.

If Neuron were the sole client of lumos, I would agree with you. But lumos is designed for many different scenarios and people might rely on it in unexpected ways, so I'm not sure this is a good idea unless we have a good enough solution for the problem.

But in that case, Neuron is the client of Lumos, isn't it? :)

We need to carefully define the word "client" here. Neuron is expected to use lumos-indexer as a library, and between two libraries in a single binary I doubt WebSocket is needed. Even between the client-side code and the underlying backend code of Neuron the Electron app, I doubt a WebSocket connection makes sense. It just complicates the logic for no obvious gain.

But there are two critical problems, if I understand correctly: (...)

I think we've talked about this, and I have mentioned it several times in offline chats. Please don't make decisions based on the current implementation of lumos. The only thing that matters, and what we should discuss, is whether the API design makes sense, not whether the current implementation of the API has quirks that might slow things down. Both issues you mentioned here, IMO, could perfectly well be solved without changing any of the async iterator API.

xxuejie commented 4 years ago

Also one more word regarding pagination: I think pagination is merely an abstract term used when you have 2 separate entities, such as the frontend running in the user's browser and the app running on a remote server, or a backend server and the SQL database. It's the fact that you have 2 separate entities that don't share anything that makes you need an abstract term like pagination. But what we are talking about here is lumos the framework and the upper-level dapp code; they coexist in the same process, where they can share data with each other. In this sense, I do feel an async iterator, or at its core a closure, is more than enough to achieve the use case, with better performance potential and less resource usage.
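As a small sketch of this in-process consumption: the dapp simply drives an async iterator (or hands the framework a closure), with no offset/limit protocol in between. The collect callback is again only a stand-in, not the actual Lumos API.

```typescript
// In-process consumption: the framework and the dapp share the same process,
// so the dapp iterates directly instead of paginating over a wire protocol.
async function processAll<T>(
  collect: () => AsyncIterable<T>, // stand-in for a Lumos collector's collect()
  handle: (item: T) => void
): Promise<number> {
  let count = 0;
  for await (const item of collect()) {
    handle(item); // data is shared in memory; no serialization or page bookkeeping
    count += 1;
  }
  return count;
}
```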

katat commented 4 years ago

Ok, cool.

I think for now 2 and 4 are useful enough for Neuron to build some basic processes to digest the data from the indexer, assuming they won't inherit the performance issue from the current implementation.

If possible, an estimate of when these two APIs will be available would be highly appreciated.

Thanks.

xxuejie commented 4 years ago

We can aim at the end of next week to provide 2 and 4 here.