dimitri-xyz / haskell-coinbase-pro

A Haskell client for the Coinbase Pro API
MIT License
11 stars 6 forks source link

Implement and test all channels of WebSocket API #10

Closed ericpashman closed 4 years ago

ericpashman commented 4 years ago

Sorry for the delay, but here's my implementation of all data channels provided by the WebSocket API, issue #9. I consider this mergeable as-is, but there are some outstanding issues:

  1. I haven't yet implemented any of the private fields on messages pertaining to your own orders that show up only when authenticated. These were also unsupported in the limited existing implementation of the "full" channel.
  2. I haven't implemented "activate" messages, which relate to triggered stop orders. The documentation on these looks odd and I'll need to play around with some stop orders on the sandbox server and see what happens. Meanwhile, this won't result in any runtime errors so long as you don't place stop orders.
  3. My Aeson instances and QuickCheck instances need some work. They're all correct, so far as my testing can tell, but the Aeson instances (including the ones that already existed) should be re-written where possible to use the more efficient toEncoding method, and my QuickCheck Arbitrary instances are pure boilerplate that probably should be replaced with generic instances using the functionality in the generic-random package or similar.

I didn't use generic-random mainly because I felt bad about already having introduced a couple of new dependencies in the test target; the new QuickCheck and text dependencies are not really new, because they were already there as sub-dependencies of other packages, but the quickcheck-instances and aeson-qq dependencies are new. The latter one can go away in the future, though, as newer versions of QuickCheck provide all the functionality I'm using from that package. (We just need to update to a recent Stack package set.) And the only instances we need from quickcheck-instances are for the Text and UUID types, I believe, so if you prefer we can just steal those and eliminate that dependency too.

This commit includes a bunch of new QuickCheck tests of the Aeson instances; they're fairly limited in that they only test that we correctly decode the JSON we encode our types into, which doesn't tell us whether that JSON matches Coinbase's, but I've paired them with some unit tests that decode the example messages given in the API documentation. The idea is that we should collect some more messages ourselves from the live server, store them, and test them in the same way; then we're not dependent on getting lots of messages over the network every time for basic regression testing. I plan to implement similar tests for the REST API code in the future.

That said, pulling and testing the example messages from the docs has shown me that the docs leave a lot to be desired. Several message types are undocumented, and many of the examples are not actually what the API produces, and not even valid JSON. I've made a bunch of comments in the code about these issues and others, annotated with NOTE, TODO, or FIXME. If you search for those keywords, you should be able quickly to review most of the gotchas and questionable bits in this PR that we may need to discuss before merging.

And of course I've uncovered some other issues that I'll open in the days ahead.

dimitri-xyz commented 4 years ago

I cannot build this pull request. I get a few errors like:

haskell-coinbase-pro/src/Coinbase/Exchange/Types/Socket.hs:279:33: error: Not in scope: type constructor or class ‘BestPrice’

I cannot find the BestPrice type. Did you forget something to include something? Or did I miss something?

Sorry about the delay on this. This is good work.

ericpashman commented 4 years ago

Yikes, looks like I broke this while tidying up my Git history just prior to submitting the PR. Will fix and re-submit.

ericpashman commented 4 years ago

OK, this should work now. Sorry for the history-mangling. Some further notes:

  1. Four tests currently fail. These are all tests of REST API functionality that I haven't touched.
  2. Aside from adding a bunch of property tests of the WebSocket API's Aeson instances, I also added several more channels to the pre-existing tests of the WebSocket API that hit the server. This ends up pulling down a lot more data and typically speeds up the tests considerably.
  3. I included a stack.yaml.lock file in this commit. I hadn't meant to, but it's supposed to be under version control, so there it is.

Sorry again for the mess, and for the size of this PR.

ericpashman commented 4 years ago

Oh, I forgot to mention that this also reverts the parsing of the client_oid field of “received” message types to the code you first proposed in your fix https://github.com/dimitri-xyz/haskell-coinbase-pro/pull/4. My very helpful suggestion of a “more idiomatic” way to write it does not in fact work, as the left-side parse results in an empty value rather than a Nothing, and the (.:!) operator only handles the latter. As far as I can tell, using the applicative operator as you suggested is the only way to get the result we want.

dimitri-xyz commented 4 years ago

Could we rebase this PR on the latest head (70352fd)? It makes it much easier to review!

ericpashman commented 4 years ago

Sure, I don't think this PR touches any files with newer merged changes, so there shouldn't be any conflicts.

If you haven't done it before then, I'll rebase tomorrow when I'm at a computer with a local copy of the repo.

ericpashman commented 4 years ago

Oh, you might also want to search for TODO, FIXME and NOTE comments and review them for gotchas that we might want to deal with before committing. That said, though it's been too long for me to recall the details, I considered this PR ready to be merged when I submitted it.

ericpashman commented 4 years ago

Well, it’s not a completely opaque field. The docs say it must be a UUID, and the empty string is not a valid UUID, so that seems to me to be an implementation problem on Coinbase’s part. If the field were meant to be completely opaque, we would just parse it into Text and there would be no problem.

Anyway, I think we’re on the same page here. I’m working on making all the other discussed changes and will update the PR, maybe tonight, probably tomorrow.

On May 13, 2020, at 20:26, Dimitri DeFigueiredo notifications@github.com wrote:

 @dimitri-xyz commented on this pull request.

In src/Coinbase/Exchange/Types/Socket.hs:

  • | Error
  • { msgMessage :: Text
  • } I think punching "too big a hole" while parsing client_oid is different from this. It is an opaque field provided by customers themselves and I cannot point to any implementation problem from Coinbase. Having said that, I'll leave the decisions of what to do for you.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

ericpashman commented 4 years ago

OK, agreed that we leave things as they are. I’ve made all the other changes discussed in this review, except that I still need to re-implement the boilerplate Quickcheck instances using generic-random. I’ll do that tomorrow and update the PR.

Sorry for dropping all of that on you. :P These lengthy comments are partly notes to self as I reacquaint myself with the codebase after months away, and with Aeson, etc. I’ve experienced some déjà-vu working on this, coming to realizations I’ve had before that, (a) I originally tried to do this generically in the first place but ran into these same roadblocks, and (b) that in my opinion, the right way to implement third-party APIs is almost always to mirror what the API produces as closely as possible in the Haskell types, leaving “improvements” to higher-level consumers of the API.

—— Please don't read the rest of this unless you have noting else to do this weekend. ——

Apology behind me, let me do a bit of a brain dump while my thoughts are fresh. Please don’t feel the need to respond to this unless you think you can correct some basic misunderstanding on my part; I’ll open specific issues later if I feel this still merits it.

There are several reasons why I don’t think it makes sense to “improve” the API in our types by trying to express invariants that the API itself doesn’t explicitly express. The API does not explicitly express these invariants. If an invariant isn't expressed in the JSON and isn’t expressed in the documentation, why are we inferring it? Diverging from the API by “improving” it makes the library harder to use by making it different. Even if you know the Coinbase Pro API well, you won’t know what the msgMarketBounds field is, because it does not exist in the Coinbase Pro API. Not even as a concept. This library just made it up. So now you have two APIs to learn. And as much as I’ve been shitting on Coinbase's documentation, it’s a lot better than this library’s. Writing, maintaining, and testing types and instances that mirror the API’s JSON is clean and easy, while writing, maintaining, and testing types that differ from the JSON rapidly adds lines of code and conceptual complexity. You don’t have to debug generic Aeson instances for types parallel to the JSON, while debugging hand-coded instances is as tedious and time-consuming as debugging in Haskell gets. I’m not convinced that the msgMarketBounds record field correctly expresses an API invariant on the permutations of “funds” and “size” fields. You seem to think otherwise, so I am prepared to realized I’m wrong, but I don’t see such an invariant expressed in the documentation, and as you noted it can’t be expressed in JSON. Further, as I wrote in the footnote to my previous comment, the invariant expressed by the msgMarketBounds is not what I think it obviously should be. I may well be wrong about that, but it seems to me that there should be a one-to-one correspondence between a market order’s “side” and its “bounds” (buy orders should have a “funds” field, sell orders should have a “size” field). Why should a single market order ever have both a “funds” (aka “cost”) and a “size” field? Note that one would not expect a parsing failure if the invariant expressed by msgMarketBounds is wrong, because in the case of what I think should happen, the Maybe Size inside a Right value of msgMarketBounds should always be Nothing. (That’s what I mean by “wrong”: the invariant is improperly expressed, if the preceding turns out to be correct, by allowing for a permutation that cannot happen.) One exception to (2)(1) that I see is if markets allow for negative prices. In that case, the invariant expressed in msgMarketBounds would still be wrong, as both size and funds would always be required in the case of a market sell; the case of a market buy would be unchanged. I don’t otherwise understand your point about a market order being limited by both “size” and “funds”. If I want to market sell 1 BTC in the BTC-USD market, how is my dollar balance relevant unless negative prices are allowed? And even if negative prices are allowed, how is my BTC balance relevant if I want to market buy $1000 worth of BTC? Another exception to (2)(1) that I see is if Coinbase calls “market orders” something which is not usually called that. It’s possible to implement an order type that is limited by both size and funds, it’s just not what’s usually called a market order, and it’s not a sensible order type that anybody would want to use. An order to buy $1000 worth of BTC at the market price, but not to buy more than one million BTC for those thousand dollars, even if it’s on offer, say, does nothing for you except to keep you from buying BTC at even cheaper prices than the prices you already bought at in the same order even though you have more cash left to buy with. An opposite example, in which the price is binding, is called a limit order. The only other exception to (2)(1) that I see is related to stop orders (or similar synthetic order types), for which a limitation by both size and funds does make sense. I haven’t looked at Coinbase’s implementation of stop orders, and I don’t know whether they show up with an “order_type” field of “market”. I would think they have a different order type, but Coinbase has surprised me before. I understand the distinction between Cost and Size, and I recognize that the units of the two quantities differ in the example at issue. That difference in units is why I wrote what I did in the footnote, that the asset of denomination should be embedded in the type. I don’t understand why the type was called Cost instead of Funds, diverging from the language used in the API and its documentation. In your example, “funds” seems precisely the right term, while “cost” seems like it could possibly mean “price”. In my opinion, the way Coinbase should have implemented this is by specifying a quantity (“size”) and denomination (“product”), i.e., how much of the asset on the other side of the market you’re willing to transact at the best available prices. “Make invalid states unrepresentable”, “parse, don’t validate.” Yes, and I’ve read that blog post and mainly agree with it. But I question its relevance here. We’re implementing bindings for a third-party API, not designing a program around one. Valid data here are the data the API returns, period. Yes, it’s good practice to enforce invariants at the edges of your program, but this is not your program. The types this API provides are not the types you should be using internally in your program; they’re just a Haskell representation of the data the API actually provides. Whatever invariants your program needs to enforce that Coinbase’s API doesn’t actually enforce should be enforced at the boundaries between this library and your program. “Parse, don’t validate” seems to me primarily to be summed up in the motto, “Write functions on the data you wish you had, not on the data you have”. This library deals only with data that you actually have, without reaching the level of what data you wish you had. Functors, applicatives, and monads exist primarily to deal with the problem of “data you wish you had”, so while I like the sentiment of “parse, don’t validate”, it’s not like we don’t have tools to deal with this problem down the line. Much of the above makes it seem like I don’t see value in expressing expressible invariants. My qualm is actually in expressing invariants that are not stated, and that therefore might not really exist or that we might express improperly. Do we really want to be in the business of inferring invariants empirically from the JSON messages we see coming over the wire? Aside from the fact that trying to do this makes the codebase a lot bigger and a lot more complicated, I feel like this sort of guessing game is not worth playing. Hope you didn’t read all of this. Have a good weekend!

On May 16, 2020, at 00:24, Dimitri DeFigueiredo <notifications@github.com mailto:notifications@github.com> wrote:

@dimitri-xyz commented on this pull request.

In src/Coinbase/Exchange/Types/Socket.hs https://github.com/dimitri-xyz/haskell-coinbase-pro/pull/10#discussion_r426120406:

  • toJSON L2SnapshotMsg{..} = object
  • [ "type" .= ("snapshot" :: Text)
  • , "product_id" .= msgProductId
  • , "asks" .= msgAsks
  • , "bids" .= msgBids
  • ]
  • toJSON L2UpdateMsg{..} = object
  • [ "type" .= ("l2update" :: Text)
  • , "time" .= msgTime
  • , "product_id" .= msgProductId
  • , "changes" .= msgChanges
  • ]
  • toJSON OpenMsg{..} = object
  • [ "type" .= ("open" :: Text)
  • , "sequence" .= msgSequence
  • , "time" .= msgTime
  • , "product_id" .= msgProductId
  • , "order_id" .= msgOrderId
  • , "side" .= msgSide
  • , "remaining_size" .= msgRemainingSize
  • , "price" .= msgPrice
  • ] Whoa! There's a lot to unpack on your comments. Here are my thoughts.

First and foremost, I am not as concerned about deriving the ToJSON instances as I am about deriving the Arbitrary instances. This is why I said this is maybe future work. I think it is worth improving, but I don't think it (the technical debt) has to be solved now, for these ToJSON instances.

You did a good job at pointing out and documenting the difficulties with deriving them automatically because of the multiple constructors (i.e. ReceivedLimitMsg and ReceivedMarketMsg).

I strongly disagree that there is no value in capturing the API invariants expressed in the type of the field msgMarketBounds :: Either Size (Maybe Size, Cost). This invariant does exist (it just can't be expressed in JSON), This is evidenced by the fact that I don't ever recall a parsing failure because of these (let me know if you see one). Furthermore, once this invariant is established, we don't have to keep checking this value again and again as would be the case with multiple Maybes. I think this is one of those "Parse, don't validate" https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/ and "make invalid states unrepresentable" situations. Does this type make it more complicated to parse? For sure! But precisely because it enforces more strict invariants. (I have written other code that depends on this library and it was quite useful to be able to count on these invariants. The alternative is having to do this "type checking" again at a later stage. The invariant does show up either way.)

Although it may seem initially counter intuitive, allowing market orders to be bound by size, funds or both, is not a bug. This is exactly how market orders should be implemented in a "continuous limit orderbook" market. It is implicit that when limited by both, we should use the minimum (most restrictive) bound. This is one situation where Coinbase got it right and almost everybody else got it wrong.

The type Cost was created to distinguish from Price. The Cost of 2 (BTC) at a price of 10K (USD/BTC) is 20K (USD). Note the units. Funds seems to also match this idea, but seems a little bit less precise. I agree that costs could have the units encoded at the type level, for example, as a phantom type. As a requirement, I think that any implementation of this should also take care of the above calculation and changing units from Price to Cost.

Considering that after you looked at it carefully, it looks like a significant amount of work for an uncertain amount of gain. I suggest we leave it as is for now.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/dimitri-xyz/haskell-coinbase-pro/pull/10#discussion_r426120406, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAYGTGNESDLR7F6CMUNJ2PTRRYPSPANCNFSM4JNEZMYA.

dimitri-xyz commented 4 years ago

Just a couple thoughts.

The invariant in msgMarketBounds is documented in the API Spec for order placement. (Look for "market order parameters" there).

Here's an attempt to make the meaning of those parameters clearer.

All buy-side orders (limit or market) have a "funds" (i.e. cost) bound. Consider a market bid (buy order) in the BTC-USD market to buy 3 BTC. Although it is not explicitly specified, the matching engine must ensure the customer has enough funds to pay for the 3 bitcoins. How can this be done? Coinbase puts a hold on all the customers available USD funds and then "estimates" whether those funds will be sufficient to buy the 3 BTC. If the "risk management engine" (that makes the estimate) thinks those funds are insufficient, the order is immediately rejected ("insufficient funds"). If it thinks there's enough money to pay for the 3 BTC, the order goes to the matching engine. Exactly the same process happens for limit orders, but in that case it is easy to calculate how much USD to place on hold. Buying 3 BTC at $10K each will require at most $30K. This is the amount placed on hold. So, market bids will always have two bounds: one specified in "how many BTC to buy" (size) and another in "how much money to place on hold (funds)". Coinbase's API just allows customers to express those explicitly.

ericpashman commented 4 years ago

All changes made as discussed. Merging.

dimitri-xyz commented 4 years ago

Nice! :-)