MarcusRainbow / QuantMath

Financial maths library for risk-neutral pricing and risk
MIT License

Interested in collaborating #48

Open jonathanstrong opened 5 years ago

jonathanstrong commented 5 years ago

Hello,

Earlier this evening I randomly came across your post on users.rust-lang.org from several months ago looking for feedback on this library. I wasn't previously aware of the project and was impressed by its breadth and the amount of work that's already gone into it. I run a trading operation on a rust codebase so this is definitely of major interest to me! I'd be interested in collaborating and wanted to send some initial thoughts from a couple hours with the code.

You mentioned in the post you were new to rust, so I included various recommendations from my own experience. Perhaps some of it will be obvious or things you already know - but I thought it couldn't hurt. Like you, I see rust as an incredible tool for this application, owing to the great performance, powerful type system, modern packaging system, etc. But there are definitely still bumps along the way during the learning curve.

In any event, here are various thoughts I jotted down while diving into the project for the first time:

Hope these initial comments are helpful to you. I'm looking forward to delving further into this. I'd be interested to hear what the current status on the project is, and if you have any ideas for areas where collaboration would be especially useful.

Jonathan

MarcusRainbow commented 5 years ago

Hi Jonathan, Wow! This depth of feedback is what I was hoping for when I put the project on GitHub. Thanks very much. Here are a few responses to your issues:

Examples: Ideally what I wanted for examples was real world examples. Unfortunately, I do not have easy access to real market data. If you do, that would really help the project. Until I have some decent examples, the place I'd recommend looking at first for most users is the very top level tests -- facade/c_interface.rs. At present, the facade is all defined in terms of JSON inputs. Moving forward, I think there should be lower-level ways of getting data into QuantMath, but this does at least work and allow you to do pricing. As you are interested in interfacing from Rust, I'd start at the level below the facade -- pricers/selfpricer.rs and pricers/montecarlo.rs.

Numeric types: I have always used f64 or its equivalent in other languages for financial maths. I can see that this gives inaccurate results, but in practice the input data is not that great anyway. For some applications such as accrual-valued equity swaps and settlement, I can see that you may need greater accuracy. My gut feel would be to write templated code for functions I knew might need decimal accuracy, but to avoid it for most of the library. I don't think anybody is going to want risk calculated more accurately than f64.
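The "templated code" idea could look something like the following sketch, which is purely illustrative (the `Numeric` bound and `accrued` function are hypothetical, not part of QuantMath): most of the library instantiates with f64, while settlement-style code could plug in a decimal type that satisfies the same bound.

```rust
use std::ops::{Add, Mul};

/// Minimal numeric bound covering just the arithmetic this function needs.
/// (A real library would likely use something like num_traits instead.)
trait Numeric: Copy + Add<Output = Self> + Mul<Output = Self> {}
impl<T: Copy + Add<Output = T> + Mul<Output = T>> Numeric for T {}

/// Hypothetical accrual calculation, generic over the numeric type, so
/// settlement code could substitute a decimal type where f64 rounding
/// matters, while risk code keeps plain f64.
fn accrued<T: Numeric>(notional: T, rate: T, day_fraction: T) -> T {
    notional * rate * day_fraction
}

fn main() {
    // Most of the library would simply instantiate with f64.
    println!("{}", accrued(100.0_f64, 0.25, 0.5));
}
```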

Date/time: Please feel free to disagree, but my experience is that finance either cares about nanosecond accuracy, for high-frequency trading, or else it is good enough to hit the right date. Date-time brings a host of issues: what the settlement period is for different times of day, whether you need to handle intraday discounting, and how you handle timezones and changes to daylight-saving time. For every purpose I've experienced in finance, it's good enough to have a date plus some indication of how far you are through the trading day, such as an enum (open/close) or a day fraction -- volatility time, which may not be a linear function of real clock time.
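The "date plus how far through the trading day" idea could be sketched roughly as below; all names (`PricingTime`, `TimeOfDay`, etc.) are hypothetical and not QuantMath's actual types:

```rust
/// Hypothetical calendar date: days since some epoch. Holiday and
/// day-count conventions would live elsewhere.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Date {
    serial: u32,
}

/// Where we are within the trading day: a coarse enum for the common
/// cases, plus a fraction for "volatility time", which need not be a
/// linear function of wall-clock time.
#[derive(Clone, Copy, Debug, PartialEq)]
enum TimeOfDay {
    Open,
    Close,
    Fraction(f64),
}

/// A pricing time is just a date and a position within that day -- no
/// timezones, no daylight-saving transitions, no settlement ambiguity.
#[derive(Clone, Copy, Debug, PartialEq)]
struct PricingTime {
    date: Date,
    time: TimeOfDay,
}

impl PricingTime {
    /// Treats Open as 0.0 and Close as 1.0 of the trading day.
    fn day_fraction(&self) -> f64 {
        match self.time {
            TimeOfDay::Open => 0.0,
            TimeOfDay::Close => 1.0,
            TimeOfDay::Fraction(f) => f,
        }
    }
}

fn main() {
    let t = PricingTime { date: Date { serial: 43831 }, time: TimeOfDay::Fraction(0.25) };
    println!("{}", t.day_fraction());
}
```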

Errors as strings: I've always liked strings as errors in finance for the reason that you seldom want to recover from an error -- you just want to report as much detail as possible to the user and then abort that calculation. (I once worked on a library that tried all sorts of ways of getting a vol, and if all else failed, it used 30%! -- I don't want that sort of philosophy in QuantMath.) However, I take what you are saying about the cost and awkwardness of strings. Maybe a way forward would be to use lightweight error types for the low-level modules, and keep string errors for the higher levels, where we need to be able to report a lot of detail.
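The layered approach suggested here -- lightweight enums low down, strings with full context at the top -- fits Rust's `?` operator naturally via a `From` conversion. A sketch under assumed names (`CurveError`, `PricingError`, `discount`, and `price` are all hypothetical, not QuantMath's API):

```rust
use std::fmt;

/// Hypothetical low-level error: cheap to construct and match on,
/// no allocation on the happy path.
#[derive(Debug)]
enum CurveError {
    OutOfRange { t: f64 },
    Empty,
}

impl fmt::Display for CurveError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CurveError::OutOfRange { t } => write!(f, "time {} outside curve range", t),
            CurveError::Empty => write!(f, "curve has no pillars"),
        }
    }
}

/// Hypothetical high-level error: a string carrying as much detail as
/// possible for the user, since we abort the calculation rather than
/// recover.
#[derive(Debug)]
struct PricingError(String);

impl From<CurveError> for PricingError {
    fn from(e: CurveError) -> Self {
        PricingError(format!("discounting failed: {}", e))
    }
}

/// Low-level function returns the cheap enum (flat 2% curve, purely
/// illustrative).
fn discount(t: f64) -> Result<f64, CurveError> {
    if t < 0.0 {
        return Err(CurveError::OutOfRange { t });
    }
    Ok((-0.02 * t).exp())
}

/// High-level function: `?` converts CurveError into PricingError
/// automatically via the From impl above.
fn price(t: f64) -> Result<f64, PricingError> {
    let df = discount(t)?;
    Ok(100.0 * df)
}

fn main() {
    println!("{:?}", price(-1.0));
}
```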

statrs error: Your analysis is exactly correct here! Feel free to make the change you suggest.

The failing test: I only have a couple of machines here that I can test the code on -- it does not surprise me that different architectures would give slightly different results. The difference we are seeing here is a relative error of about 1e-8, even though I was asking Brent for an accuracy of 1e-10. Maybe it is just differences between the trig functions on your machine and mine (see http://notabs.org/fpuaccuracy/).
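One way to make such tests robust to platform differences in libm would be to compare with a relative tolerance looser than the solver's requested accuracy. A minimal sketch (the helper name is my own, not from the library):

```rust
/// Relative comparison: true if a and b agree to within rel_tol,
/// scaled by the larger magnitude. The 1e-300 floor guards against
/// division by zero when both values are tiny.
fn approx_eq_rel(a: f64, b: f64, rel_tol: f64) -> bool {
    let scale = a.abs().max(b.abs()).max(1e-300);
    ((a - b) / scale).abs() <= rel_tol
}

fn main() {
    // Values differing at the ~1e-8 relative level, as in the failing
    // test, pass under a 1e-7 tolerance:
    println!("{}", approx_eq_rel(0.123456789, 0.123456790, 1e-7));
}
```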

Contributions: Pretty much anything would be welcome, from code tidyup through major chunks of new functionality. What sort of trading are you doing, and what are your requirements from a pricing library?

Marcus

jonathanstrong commented 5 years ago

I'm glad you found it helpful! To give you a bit more background about myself, I trade on the crypto markets, at relatively "high frequency" (the infrastructure is very young, it's milliseconds that matter still, not nanoseconds). Although I studied programming, I worked as a political reporter after school; my interest in trading came from an experience of using machine learning to forecast the outcomes of congressional votes, which we hoped to sell to hedge funds. (The forecasts were perfect, the sales less so -- it's still hard to trade an informational edge like that, we found).

Anyway, that's all to say it's less that I know exactly what I want to use here than that I am very excited by the wealth of knowledge the code contains and all that I can learn from it. For instance, I had never encountered the term "bump" before yesterday, which I gather means to update calculations incrementally on receiving new market data. I also mention this to point out that my reference point of latency-sensitive code that runs 24/7 (no "trading day") and must never crash is pretty different from running a daily analysis -- so my performance concerns may be overzealous.

I collect lots and lots of market data, but I doubt it's the kind needed here. However, I found the functions in risk::marketdata::tests that generate dummy data, which should be enough to craft some examples from.

One major thing I noticed from my further review of the code was the extensive use of single-purpose reference-counted wrapper types (RcRateCurve, RcDividendStream, etc.). Also, from the use of several serde alternatives, I gather there was some friction with std::rc::Rc and serializing/deserializing the data.

On one previous occasion, I remember facing a similar problem that I solved by enabling the "rc" feature in serde (it's listed in the Cargo.toml). I believe it allows Rc<T> to be serialized as T, although it's been a while.
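For reference, opting in looks something like this in Cargo.toml (a sketch, not QuantMath's actual manifest):

```toml
# serde's "rc" feature implements Serialize/Deserialize for Rc<T> and
# Arc<T> by serializing the pointed-to value. Caveat: shared structure
# is duplicated on serialize and not re-established on deserialize.
[dependencies]
serde = { version = "1", features = ["derive", "rc"] }
```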

In general, it's far preferable to use a generic Rc type that wraps the underlying types, e.g. Rc<RateCurve> instead of RcRateCurve, as a RateCurve type really has no business mucking about with reference counting, and the one-off types create a lot of noise in trying to read and understand the code.
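Concretely, if a shorthand is still wanted, a type alias over plain Rc keeps the domain type free of reference-counting concerns. A sketch, with `RateCurve` stubbed out rather than QuantMath's real type:

```rust
use std::rc::Rc;

/// Stub curve type for illustration: it knows nothing about how it is
/// shared or reference-counted.
#[derive(Debug)]
struct RateCurve {
    pillars: Vec<(f64, f64)>, // (time, rate)
}

/// A type alias gives the brevity of RcRateCurve without a wrapper
/// struct or any extra API surface.
type RcCurve = Rc<RateCurve>;

fn main() {
    let curve: RcCurve = Rc::new(RateCurve { pillars: vec![(1.0, 0.02)] });
    let shared = Rc::clone(&curve); // cheap pointer copy, no deep clone
    println!("{}", Rc::strong_count(&shared)); // 2
}
```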

Can you explain what problems you faced that led you to these design choices? I have in mind several occasions where I ended up doing similar things simply because it seemed to be the only way to accomplish what I wanted the code to do. I gather that the intended use is to be able to load the underlying data from file, so serialization is important. Was the use of reference counting types primarily for convenience, or necessity (cyclic data structures, etc.)?

On a related front, many of the reference-counted types use Arc, the thread-safe variant that uses atomics under the hood. In my experience, Arc is surprisingly expensive -- the wrapped data has to be fairly large before cloning the Arc is actually cheaper than copying the data itself. Also, I didn't find any multi-threaded code in the library itself. But the scope of the data here also matters. Broadly speaking, in a serious use of this library, how many "things" are we talking about? Thousands? Single-digit millions? At hundreds of millions the tab really starts to rack up from Arc, String, etc., but maybe that's way beyond what's plausible (in my high-frequency world the data piles up at a staggering pace).
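The trade-off in a nutshell: both are cheap pointer copies on clone, but Arc's count is an atomic (contended across cores), and Arc is only required once values cross thread boundaries. A self-contained sketch:

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

fn main() {
    // Single-threaded sharing: Rc clone is a plain (non-atomic)
    // counter increment.
    let local = Rc::new(vec![0.0_f64; 4]);
    let _l2 = Rc::clone(&local);

    // Cross-thread sharing: Arc clone is an atomic increment, and the
    // clone can move into another thread because Arc<Vec<f64>> is Send.
    let shared = Arc::new(vec![0.0_f64; 4]);
    let s2 = Arc::clone(&shared);
    let handle = thread::spawn(move || s2.len());
    println!("{}", handle.join().unwrap()); // 4

    // Rc would not compile in place of Arc above: it is !Send, which
    // is exactly the safety the atomic overhead buys.
}
```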

I've forked the library and begun some work around the edges; I plan on submitting various pull requests here and there going forward. The first significant one will be to construct an example based on the Monte Carlo tests. Thanks for making this code public! The scope of the project is quite amazing and I'm very pleased I found it.