Better handle increased scope of the order book api

vkgnosis commented 3 years ago

I am a little worried about the scope of the backend order book api. Imo there are are three broad categories of apis:

Api needed for the system to run. This is roughly what we have implemented right now. It allows users to submit orders, delete orders, get the minimum fee and solvers to get all solvable orders. It operates on low level data that matches how the smart contract works.
Api that makes the order book easier to use for people who call it directly (including the frontend). Things like getting all of the orders of a specific user including unsolvable orders (for example because filled), getting denormalized data like a more abstract order state, getting executed trades for an order.
Api that implements functionality specific only to our frontend. Things like top traded tokens or other statistics that we want to show in our frontend but that we don't expect other people to use directly. The border to 2. is fuzzy.

The reason I am making this distinction is because I feel that right now we are mixing all three categories together and before we implement further apis we should consider if we want to separate them in some way. The drawbacks I see:

Individual routes become more awkward as they have to handle more use cases and creates dependencies between unrelated things. For example get_orders from the perspective of the solvers does not need a user filter and OrderMetaData does not need to contain a executedSellAmountBeforeFees field. But of those are useful for the frontend.
Development is harder. When we change a route we have to consider how this affects all categories instead of just one.
The deployment is less resilient. If we have a bug or a panic or in for example complicated web socket code for the frontend then that shouldn't bring down the category 1 api.

What I would like to do is at least separate category 1 from the rest on a route and code level. We could have a paths like (ignoring bad naming) /api/minimal/v1, /api/user-friendly/v1 and implement them in different rust modules. This gets rid of the api level entanglement. It will make it easier to move to separate crates / binaries in the future which has more organizational overhead so we might not want to that right now. I hope that from the perspective of the frontend this would increase our development velocity and potentially allow frontend team members to implement some apis directly in rust themself without probably waiting (probably frustratingly long) for us to discuss and implement the suggestions.

vkgnosis commented 3 years ago

Looking for feedback on this @gnosis/gp-services .

bh2smith commented 3 years ago

Very well said. I can easily get behind this as I have experienced DB concurrency issues in the past for similar reasons. Of course the separate crates is also a very good idea.

fleupold commented 3 years ago

Thanks for raising this topic. I agree with the concerns and that long term we want to split the "core" functionality (what we need to get highly available and distributed consensus on in order to impose fairness) from the more convenient functions (a bit like ETH RPC interface vs more convenient The Graph or Dune APIs). In the short run, those will practically still run in the same context (just hosted by us on the same nodes), but it would be nice to factor the code with future separation in mind (e.g. separate crates, maybe even separate processes already).

One concern about separating the logic is that it will require some duplication (e.g. we will have largely overlapping Order models for the type 1 and type 2 API) and that it might not yet be clear which endpoints belong into what category (e.g. is a price estimation endpoint part of the critical API). It will also be an overhead for developers trying to integrate with the project to flip flop between different APIs (maybe we should keep a single top level documentation for now). However this is probably a relatively small price worth paying compared to having to untangle everything later (if we get to that point).

Would also be nice to get @gnosis/gp-integrations input.

I can easily get behind this as I have experience DB concurrency issues in the past for similar reasons.

@bh2smith Can you explain a bit more what you mean by this?

alfetopito commented 3 years ago

I like the idea and I believe is sort of the direction @anxolin is guiding the API suggestions on the Explorer spec docs, having some endpoints return minimal only data with optional fields. This issue makes the separation of concerns more clear.

I also like the point where frontend can progress in parallel with its requirements without impacting mission critical endpoints. And also having more liberty to implement ourselves and not disrupt backend schedule/resources.

bh2smith commented 3 years ago

@bh2smith Can you explain a bit more what you mean by this?

I was thinking mostly about the fact that we could run into issues with having several different types groups making insertions to the DB at the same instant. Say, for example, there are multiple front ends for order placement and people are placing orders at the same time. As I recall, postgres can fail to include certain entries when multiple services are inserting to the same table. Of course, most of the groups in this discussion are just read from the DB, but I mention this because of the separation of "routes" brought up here.

bh2smith commented 3 years ago

Given that we are already on this topic and I was thinking to start working on get_trades, I wanted to discuss my planned approach based on the current architecture.

First note that trade data is stored minimally in our DB (i.e. without order information) so that returning appropriate information will require a join on the orderbook via trade.orderUid = order.Uid to actually return relevant information to the explorer.

As for the current architecture, my plan for this implementation is to take the boiler plate code used in get_orders which essentially consists of Order struct, OrderFilter, Query struct, along with request and response methods. Using this boiler plate, I would introduce a new file get_trades that implements these things, but for trades.

Does this sound about right (@fleupold, @e00E) or do we expect that this issue will impact the way these are implemented?

fleupold commented 3 years ago

As I recall, postgres can fail to include certain entries when multiple services are inserting to the same table.

Do you have a reference to this? Maybe I'm misunderstanding your concern but as far as I know postgres allows concurrent writes (transactions just might have to wait for another tx to be comitted before it can execute).

As for the get_trades route (let's discuss in a separate issue), I believe this is partially why @e00E raised this point. It should be in a separate module (maybe API can have core and extended submodules).

To me an open question is still where does the separation end. Do we only put the routes into separate modules? Do we also put the logic code (currently e.g. orders.rs, events.rs, in the future trades.rs) into different modules? Should the extended API even operate on the raw database connection or should it ask the core API for the raw information and aggregate internally (this would be closes to how RPC wrapping APIs work today)?

vkgnosis commented 3 years ago

One concern about separating the logic is that it will require some duplication (e.g. we will have largely overlapping Order models for the type 1 and type 2 API)

This is true but also not a big deal imo. In the future the models might diverge more.

and that it might not yet be clear which endpoints belong into what category (e.g. is a price estimation endpoint part of the critical API).

The way I was thinking about it is that price estimation if not category 1 but yes it is not always clear.

It will also be an overhead for developers trying to integrate with the project to flip flop between different APIs (maybe we should keep a single top level documentation for now).

It's fine to have the single top level docs for now. Also the routes might naturally get a category 2 equivalent like how I see having a cat 1 get_solvable_orders and a cat 2 more extensive get_orders.

To me an open question is still where does the separation end.

This is an open question to me too. That's why I said I want to at least separate it in on the api level so it's easier to do further steps later. I feel that is a practical compromise and we can untangle more on a code level once the apis have settled down more.

Should the extended API even operate on the raw database connection

Having the extended api operate on the db directly feels more practical. In some way wrapping is cleaner but then the cat 1 apis get more complex to facilitate that like with get_orders.

anxolin commented 3 years ago

@vkgnosis good call raising this concern, I think is a very important topic and in general I agree with your concern and your assessment of separating critical services from other ones.

There was a lot of points and information, I'll give it a try to answer to some of them :)

To your 3rd point:

Api that implements functionality specific only to our frontend. Things like top traded tokens or other statistics that we want to show in our frontend but that we don't expect other people to use directly. The border to 2. is fuzzy.

I agree with your general message, although not agree that "top traded tokens" is only for our front end. I tried to suggest the APIs not for the explorer only, but in general something that anyone would be expecting from a exchange API. A couple of examples:

A bot can read the 10 highest volumes in the last 24 and then look for opportunities
This top traded tokens can be used by coingecko and other websites that would give us visibility which is super important and it's a weak point we had in the past

Suggestion for drawing the line: I think one natural way to solve this is to do a use case diagram

We list all the use cases we expect for the future, we can look for inspiration in other APIs
i.e. "Get orderbook", "Get mid market price", "Get last trade price", "Get trades for a market/token/user/appId", "Get batches", "Get settlement details", "Subscribe to events on my order", etc..
We don't need to detail each use case, but it would allow us to zoom out a bit and think ahead of time
With the use cases, we can start to draw the lines and discuss which main responsibilities we see. Are we doing a functional division? are we doing a structural division? Are we doing a "criticallity" division?

Regarding @fleupold

(maybe we should keep a single top level documentation for now)

I agree 100%, I believe the one I was documenting should be the main public one

Additionally, If something as critical as posting an order is meant to be used by user, it should be part of this main top level one

Only if we have an endpoint that is too specific or narrow I would move it out of this top level API. I would consider this, an internal API we don't advertise (it can have it's swagger and URL too, but not just in the main public API)

Regarding duplication of code, and how to separate things, I believe there's more than one alternative:

Independent APIs using shared libraries
It can be the same codebase, with some parametrization that would start only a group of endpoints --> this could be easier for starting, and could evolve into separate projects naturally when they get big and clearly differentiated. This approach allows for example to launch 10 instances of the critical service and have a balancer in front, and only 3 instances of a non critical one.

Regarding @bh2smith

As I recall, postgres can fail to include certain entries when multiple services are inserting to the same table.

I would be surprised, DB are designed for working with concurrency. Make sure you take the isolation level and propagation of transactions into account. Not sure what u mean with fail, but there's strategies to allow this "failures" on concurrency, but it's by design and is a programming choice (they are optimistic approaches used for performance). Happy to learn more about this.

Last thoughts:

I think our internal structure shouldn't exposed to our "public consumers"
Tools like https://aws.amazon.com/es/cloudfront/, nginx, HAproxy, let us hide these implementation details and provide a good experience to our happy customers

fleupold commented 3 years ago

For the sake of being explicit with regards to the separation, @anxolin do you agree that the "use cases" you described e.g. Get mid market price can be considered as part of the "non critical service"? The critical service would only be:

get all currently settlable orders
get min_fee
post new order

anxolin commented 3 years ago

I believe the most critical ones are the ones related to trading, so:

get min_fee
post new order

Not sure what is get all currently settlable orders

But I think that independently on what is critical, I think that both Get mid market price type of endpoint, and post new order should be under the same publicly exposed and coherent API

In my view, what is critical, u might want to just launch in independent processes, with redundancy. How you organize the code to do that, I would say is an implementation detail. And my point is, those details shouldn't alter how the API is consumed, so same endpoint, same swagger docs imo

bh2smith commented 3 years ago

To clarify my previous comment on concurrency issues:

After speaking with a former colleague (@cix-code) who is very familiar with large postgres databases, it seems that, in very unlikely scenarios for us, we could experience a table being locked for either of UPDATE or READ when performing mass insertion (such as uploading CSV files) or with select statements without an ID in the where clause.

When making mass insertions or updates, this can lock select statements.
```
Example:
process1: INSERT INTO table1 ...
process2: SELECT * FROM table1 WHERE name = 'John Doe'
process2 will wait until the insert finishes.
```
preferred is: you know that ID of John Doe is 13, so you run SELECT * FROM table1 WHERE id=13. Since insert doesn't affect this record the two queries can run in parallel

In any case, it doesn't seem like our use cases fit this profile, expect possibly in the case that some feature requests the ability to place multiple orders simultaneously. I suppose we should be careful with SELECT statements that do not have a where clause (for example querying all trades).

anxolin commented 3 years ago

I suppose we should be careful with SELECT statements that do not have a where clause (for example querying all trades).

Ben, I also have been also dealing with decent size DB and big concurrency, let me know if you want to share an entity-relationship-diagram or UML with me about the overall DB, I can gladly give you some input.

Regarding, the performance issues you mention, I think we shouldn't get too ahead of ourselves in this sense, we'll see the bottlenecks when they show up. Also I say this because your comment is not strictly true. That sequence doesn't need to necessarily block us. There's ways to prevent these blockings to happen, and each problem can have potential different solution, so it depends on the use case. This is why I wouldn't worry about these things too much for now, and just keep an eye in our Prometheus metrics to detect these bottlenecks when/if they show up.

My suggestions for having a nice DB:

Do a diagram of how it would look in the future, to plan it a bit better
Monitor in prometheus queries, endpoint calls, or both

gnosis / gp-v2-services

Better handle increased scope of the order book api #297