gadicc / node-yahoo-finance2

Unofficial API for Yahoo Finance
https://www.npmjs.com/package/yahoo-finance2
MIT License
353 stars 58 forks source link

[Discussion] Changing the validation flow #770

Open gadicc opened 2 months ago

gadicc commented 2 months ago

TL;DR; We are considering changing the tools and flow we use for validation. This won't have any effect on how to use the library, but will affect how to develop for and debug the library.

This discussion arose from a great exchange I had with @eddie-atkinson in https://github.com/gadicc/node-yahoo-finance2/issues/719. Readers wanting more background on the issue may want to start there, however, I've summarized the most important points above and am quoting his most recent post below for us to continue the conversation from there. Massive thanks again to Eddie both for raising the issue and for his contributions and research to the discussion.

I stress again that this change will have no visible impact on documented use of yahoo-finance2 but will impact developers and debuggers. I expect after a full implementation the impact will be a more positive dev experience. However, end-users who have written any custom flows (that were never documented or proposed) against the existing schema would need to rewrite their code -- I'm currently not aware of anyone who has done this.

gadicc commented 2 months ago

Quoting @eddie-atkinson from https://github.com/gadicc/node-yahoo-finance2/issues/719#issuecomment-2076253522 (replying to me):

The flow is actually: a) typescript interfaces for all the APIs, b) typescript interfaces -> json schema, c) ajv to validate input using the json schema.

Given that this is the current workflow it feels like cutting out JSON Schema could make sense as a way of simplifying development as the source of truth is Typescript.

Lastly, AJV was actually selected (at the time?) due to being significantly faster (by a large margin) than all other solutions, which is handy considering that we have 1000's of tests

That's a fair call out that I didn't consider. Looking at the benchmarking done here it seems like zod has pretty poor performance characteristics compared to ajv. Interestingly the top 3 in the rankings all use Typescript types as their source of truth, though admittedly not "pure" Typescript (they have annotations you need to add). I did a little survey of the tools that ranked higher than ajv on that benchmark and I'm interested in your view. From the perspective of bundle size, maturity, and weekly downloads @sinclair/typebox seems like a promising option. Especially given that its perf in AOT and JIT mode exceed ajv. It also appears to support CF workers.

image (source)

Name Stars Weekly Downloads Date of first commit Minified Bundle Size (as reported by bundlephobia)
ts-runtime-checks 228 146 2022-07-22 58 kB
@sinclair/typebox 4200 24,239,331 2017-04-07 50 kB
typia 4045 24,148 2022-04-19 65.4 kB
spectypes 91 31 2022-04-29 1.7 kB
suretype 490 17,553 2023-02-26 139 kB
ts-json-validator 341 29,848 2019-10-31 118.8 kB
ajv 13,378 81,611,055 2015-05-31 119.6 kB

But this would be a huuuge job, and I also think, probably worthy of a communal debate with enough time to elicit responses.

Is there a particular venue where you'd like to have that discussion? Here in this thread? GitHub discussions? I appreciate that this thread is, at the moment at least, a bit too hidden a way to have a discussion of this magnitude.

I would probably start off by looking for something that converts typescript interfaces to zod code, as a starting point... as there are a loooot of them

Yes and no, as shown by my highly scientific survey here:

image

The real questions for me would be less the leg work of converting all the interfaces and more so how we review the changes.

Keen to hear your thoughts and happy to put together a micro benchmark of some of the more gnarly interfaces to test feasibility and performance on any libraries of interest.

gadicc commented 2 months ago

Firstly, @eddie-atkinson, thanks so much both for raising this issue but most especially for your preliminary research and helpful discussion.

Wow, I'd never heard of @sinclair/typebox before. It looks super interesting. I was actually a little excited thinking about zod since it's become so ubiquitous but I did suspect that performance would be an issue, since it's designed for much more complex cases than what we need (predominantly, just making sure runtime values match the expected typescript interfaces, which seems to be @sinclair/typebox's exact use-case). Excited to see how well it could work for us. If anyone else has other suggestions, let us know, otherwise I think this is a good place to start (and thanks again to Eddie for bringing it to our attention).

We will need coercion for some cases and we'll need to think at what level to best handle that. More details in that file but in short:

YahooFinanceNumber: 123123 | { raw: 123123, formatted: "123,13" } | null?
YahooFinanceDate: Date | { raw: <unixEpoch>, /* other fields? */ } | StrDate | null?
DateInMs
TwoNumberRange: { low: 1, high: 10 }

The real questions for me would be less the leg work of converting all the interfaces and more so how we review the changes.

Very true. It's a pity most of our tests are just "validation passes", but we do have a bunch that test for expected failures too. Having said that, the main thing is just that types match... so probably enough to just check that the old typescript interface and the newly inferred one match (which should be pretty easy).

Thanks again for your efforts here and yeah sure, if you feel like playing around a bit and reporting back that would be absolutely amazing! :pray: A first step might be making sure we'll be able to do the coersion, but I feel like we should be able to handle that fairly easily at the library level as long as we can expect the field type.

gadicc commented 2 months ago

cc:

eddie-atkinson commented 2 months ago

if you feel like playing around a bit and reporting back that would be absolutely amazing!

I'll have a poke around this week and see if I can make coercion work 👍

gadicc commented 2 months ago

Amazing! You're the best :clap: Thanks again :grin:

eddie-atkinson commented 1 month ago

Hey @gadicc

I've had a bit of a play and put up a draft PR: https://github.com/gadicc/node-yahoo-finance2/pull/772

It seems like typebox can support the coercion logic that we currently have and is reasonably flexible. Nowhere in the league of Zod, but as the maintainer notes on several occasions this is why Zod's performance is not as good.

More details in the PR, but the TL;DR is I have created utility types in Typebox for YahooFinanceNumber, YahooFinanceDate, DateInMs and TwoNumberRange and added unit tests following the mold of the existing validateAndCoerceTypes.spec.ts.

Still need to add my own version of validateAndParse and test it for sensible error formatting. I'd also like to independently benchmark ajv vs typebox for a couple of choice interfaces to make sure there hasn't been a huge performance regression.

gadicc commented 1 month ago

You, Sir, are a legend! :ok_hand:

My comments upcoming on the PR... but in short, I really believe this will be a massive improvement. I've had something like this in the back of my mind for quite some time, but also no time to actually look at it seriously. So thanks once again for raising this (ironically as an ancillary issue to getting this running outside of node), for taking it on, conducting research, and getting coding. Open source projects can become quite draining, and then things like this happen, which is honestly just... wonderful. So thank you :)

eddie-atkinson commented 1 month ago

You, Sir, are a legend! 👌

Straight back at you, not sure I would have the stamina to run an open source project of this size for as long as you have, so kudos to you.

So thanks once again for raising this (ironically as an ancillary issue to getting this running outside of node)

You have no idea how much of a yak shave this will be for me. I started out just wanting to build a stock tracker for pricing the stock options I get for work properly 🙃. Great opportunity to give back to the community though so I'm more than happy to do it

Open source projects can become quite draining, and then things like this happen, which is honestly just... wonderful. So thank you :)

❤️

I've added some more commits to the PR whenever you get a spare minute to have a gander. I'll probably hack a bit more this week / weekend

gadicc commented 1 month ago

not sure I would have the stamina to run an open source project of this size for as long as you have, so kudos to you.

thanks :pray: :heart: the struggle is real. i'd say I unfortunately don't manage to get to a large chunk of issues, but I've always tried to prioritise PRs, issues that could lead to PRs, schema fixes and anytime changes on Yahoo end broke functionality. So while I haven't managed to maintain things at the level I'd hoped, the project continues to chug along :sweat_smile: :sweat_smile:

You have no idea how much of a yak shave this will be for me.

Haha, been down that rabbit hole way too many times. But yeah, it's great when entire communities benefit from it. So thanks so much again! As for a stock tracker, not sure if you already saw Ghostfolio, probably the biggest single usage of this project - could save you some time if you're willing to go a servered (non-serverless, non-edge) approach. I have my own small personal project - not open source - that's serverless NextJS though, so do understand stack preferences :blush: