Database smaller details

anxolin commented 3 years ago

While reviewing #313, since it was the first PR I review of the database structure, I notice a few things that I wanted to mention. But they were unrelated to the PR, so I breing my comment https://github.com/gnosis/gp-v2-services/pull/313#pullrequestreview-590742597 here.

there's a few indexes missing: like sell token
The FKs are missing, like from trade to its order
Ideally we would save already the denormalized short id
I would consider adding a FK in the trades to the settlement, I would make sure there is a tx hash (that would be important for us in the FE), and I would consider to move those "block_number". Maybe you want to even use this FK+log_index as the PK.

I wouldn't do this one, until we see we have performance issues, but, one thought I had is, If we expect issues with having many orders, you can set a flag for the orders that are final:

expired
canceled
fully executed For these order you know u'll never need them again (for the ones considered by the solver). If we have too many orders, an index in this field would basically allow us to quickly filter out most of the orders. As I said, this one, better to wait for the real need

anxolin commented 3 years ago

Bringing @fleupold comment https://github.com/gnosis/gp-v2-services/pull/313#issuecomment-779763693 too:

Regarding @anxolin feedback (not related to this PR so maybe better to have on an Issue?)

The FKs are missing, like from trade to its order

I believe the reason for no imposing a FK is that we could potentially see a trade for which we didn't record the order data (in case of data loss, as trades come from on-chain). Right now we still want to store those in the database.

Ideally we would save already the denormalized short id

Can you elaborate on that? Shouldn't the database be usually be normalized if possible?

and I would consider to move those "block_number".

I don't understand what you mean by this. Where do you want to move block_numbers to?

anxolin commented 3 years ago

The FKs are missing, like from trade to its order

I believe the reason for no imposing a FK is that we could potentially see a trade for which we didn't record the order data (in case of data loss, as trades come from on-chain). Right now we still want to store those in the database.

But ideally we would have all the trades in the database, right? So we can check the history. Or you mean we plan to do this is phases and for now we don't index the old events.

In that case, I wouldn't even add a trade without order. If that happens, I would log an error and that it. If in the future we have a service that populates this info, the order and it's trades will be added to the db. But I would be ok, to have only trades for which we have the order.

Ideally we would save already the denormalized short id

Can you elaborate on that? Shouldn't the database be usually be normalized if possible?

Yes, I'm just not sure how you plan to implement the "shortId" support. The explorer will use short ids for it's URLs. So we would need to find the order using this 8 chars.

Although, I thinking we are probably fine without denormalizing, depending on the index you use for ORDER_UID, many database can use the index for LIKE patterns. if they look like "my-short-id%", if it looks like "%my-short-id%" it won't work I believe. You can test it anyways. Just flagging we would like to query them.

and I would consider to move those "block_number".

I don't understand what you mean by this. Where do you want to move block_numbers to?

I'm refering to create a new table called batch or settlement. And add a FK from the trade (batch_id --> batch). Then in the batch table we can have, among others:

batch_id
tx_hash
block_number
settlement_time
solver_address

fleupold commented 3 years ago

Regarding the shortId, I was under the impression the frontend would abbreviate it for display purposes but the ID would still be a long identifier. What's the purpose of having a short trade ID if e.g. the oder ID is still a long 56 byte identifier?

anxolin commented 3 years ago

Yes, the official ID would be 56 bytes, but the same as Github or Git allows you to access a commit using the short version (for UX), we would like to offer short URLs too. I think they are convenient

You can check what I mean in the 3rd slide here https://docs.google.com/presentation/d/10t4Hs3EAA3M99G_hPQxrtfZeWo6vJGsI/edit#slide=id.gbc9d9d1fd7_0_13

Also, in the spec we mention it in several places, one is in the API spec proposal: https://docs.google.com/document/d/179i8zuUrHpfvpU988HMqLB_isTJB383aOS9rGaJygh4/edit?ts=6022bf99#heading=h.c3em8as8bi57

nlordell commented 3 years ago

What's the purpose of having a short trade ID

I think the BE should allow searching by short ID, kind of how Git allows searching for refs with shortened commit hashes. I also think, that like Git, it should allow dynamically sized short IDs. As @fedgiac mentioned in Slack using something like only 8 characters is not collision resistant enough (Birthday paradox). For example, the git repo for Linux needs ~11~ 13 (if I'm not mistaken) characters to uniquely identify a commit hash, while smaller repos typically need much less (I think by default Git uses 7 characters in these cases).

In that sense, I don't think normalizing short IDs is entirely possible since what may be a valid short ID when it was queried, may no longer be valid 10 minutes later after new orders were added.

anxolin commented 3 years ago

, I don't think normalizing short IDs is entirely possible since what may be a valid short ID when it was queried, may no longer be valid 10 minutes later after new orders were added.

Good point! You are right. Anyways, I think postgresql is powerful to let us do this fairly easy. Think left-anchor LIKE patterns would be able to be used with INDEXes, and for more advance things there's sth like pgtrgm which probably is not needed.

gnosis / gp-v2-services

Database smaller details #319