Open gadicc opened 2 months ago
Quoting @eddie-atkinson from https://github.com/gadicc/node-yahoo-finance2/issues/719#issuecomment-2076253522 (replying to me):
The flow is actually: a) typescript interfaces for all the APIs, b) typescript interfaces -> json schema, c) ajv to validate input using the json schema.
Given that this is the current workflow it feels like cutting out JSON Schema could make sense as a way of simplifying development as the source of truth is Typescript.
Lastly, AJV was actually selected (at the time?) due to being significantly faster (by a large margin) than all other solutions, which is handy considering that we have 1000's of tests
That's a fair call out that I didn't consider. Looking at the benchmarking done here it seems like
zod
has pretty poor performance characteristics compared toajv
. Interestingly the top 3 in the rankings all use Typescript types as their source of truth, though admittedly not "pure" Typescript (they have annotations you need to add). I did a little survey of the tools that ranked higher thanajv
on that benchmark and I'm interested in your view. From the perspective of bundle size, maturity, and weekly downloads@sinclair/typebox
seems like a promising option. Especially given that its perf in AOT and JIT mode exceedajv
. It also appears to support CF workers.
(source)
Name | Stars | Weekly Downloads | Date of first commit | Minified Bundle Size (as reported by bundlephobia) |
---|---|---|---|---|
ts-runtime-checks | 228 | 146 | 2022-07-22 | 58 kB |
@sinclair/typebox | 4200 | 24,239,331 | 2017-04-07 | 50 kB |
typia | 4045 | 24,148 | 2022-04-19 | 65.4 kB |
spectypes | 91 | 31 | 2022-04-29 | 1.7 kB |
suretype | 490 | 17,553 | 2023-02-26 | 139 kB |
ts-json-validator | 341 | 29,848 | 2019-10-31 | 118.8 kB |
ajv | 13,378 | 81,611,055 | 2015-05-31 | 119.6 kB |
But this would be a huuuge job, and I also think, probably worthy of a communal debate with enough time to elicit responses.
Is there a particular venue where you'd like to have that discussion? Here in this thread? GitHub discussions? I appreciate that this thread is, at the moment at least, a bit too hidden a way to have a discussion of this magnitude.
I would probably start off by looking for something that converts typescript interfaces to zod code, as a starting point... as there are a loooot of them
Yes and no, as shown by my highly scientific survey here:
The real questions for me would be less the leg work of converting all the interfaces and more so how we review the changes.
Keen to hear your thoughts and happy to put together a micro benchmark of some of the more gnarly interfaces to test feasibility and performance on any libraries of interest.
Firstly, @eddie-atkinson, thanks so much both for raising this issue but most especially for your preliminary research and helpful discussion.
Wow, I'd never heard of @sinclair/typebox
before. It looks super interesting. I was actually a little excited thinking about zod
since it's become so ubiquitous but I did suspect that performance would be an issue, since it's designed for much more complex cases than what we need (predominantly, just making sure runtime values match the expected typescript interfaces, which seems to be @sinclair/typebox
's exact use-case). Excited to see how well it could work for us. If anyone else has other suggestions, let us know, otherwise I think this is a good place to start (and thanks again to Eddie for bringing it to our attention).
We will need coercion for some cases and we'll need to think at what level to best handle that. More details in that file but in short:
YahooFinanceNumber: 123123 | { raw: 123123, formatted: "123,13" } | null?
YahooFinanceDate: Date | { raw: <unixEpoch>, /* other fields? */ } | StrDate | null?
DateInMs
TwoNumberRange: { low: 1, high: 10 }
The real questions for me would be less the leg work of converting all the interfaces and more so how we review the changes.
Very true. It's a pity most of our tests are just "validation passes", but we do have a bunch that test for expected failures too. Having said that, the main thing is just that types match... so probably enough to just check that the old typescript interface and the newly inferred one match (which should be pretty easy).
Thanks again for your efforts here and yeah sure, if you feel like playing around a bit and reporting back that would be absolutely amazing! :pray: A first step might be making sure we'll be able to do the coersion, but I feel like we should be able to handle that fairly easily at the library level as long as we can expect the field type.
cc:
if you feel like playing around a bit and reporting back that would be absolutely amazing!
I'll have a poke around this week and see if I can make coercion work 👍
Amazing! You're the best :clap: Thanks again :grin:
Hey @gadicc
I've had a bit of a play and put up a draft PR: https://github.com/gadicc/node-yahoo-finance2/pull/772
It seems like typebox can support the coercion logic that we currently have and is reasonably flexible. Nowhere in the league of Zod, but as the maintainer notes on several occasions this is why Zod's performance is not as good.
More details in the PR, but the TL;DR is I have created utility types in Typebox for YahooFinanceNumber
, YahooFinanceDate
, DateInMs
and TwoNumberRange
and added unit tests following the mold of the existing validateAndCoerceTypes.spec.ts
.
Still need to add my own version of validateAndParse
and test it for sensible error formatting. I'd also like to independently benchmark ajv
vs typebox
for a couple of choice interfaces to make sure there hasn't been a huge performance regression.
You, Sir, are a legend! :ok_hand:
My comments upcoming on the PR... but in short, I really believe this will be a massive improvement. I've had something like this in the back of my mind for quite some time, but also no time to actually look at it seriously. So thanks once again for raising this (ironically as an ancillary issue to getting this running outside of node), for taking it on, conducting research, and getting coding. Open source projects can become quite draining, and then things like this happen, which is honestly just... wonderful. So thank you :)
You, Sir, are a legend! 👌
Straight back at you, not sure I would have the stamina to run an open source project of this size for as long as you have, so kudos to you.
So thanks once again for raising this (ironically as an ancillary issue to getting this running outside of node)
You have no idea how much of a yak shave this will be for me. I started out just wanting to build a stock tracker for pricing the stock options I get for work properly 🙃. Great opportunity to give back to the community though so I'm more than happy to do it
Open source projects can become quite draining, and then things like this happen, which is honestly just... wonderful. So thank you :)
❤️
I've added some more commits to the PR whenever you get a spare minute to have a gander. I'll probably hack a bit more this week / weekend
not sure I would have the stamina to run an open source project of this size for as long as you have, so kudos to you.
thanks :pray: :heart: the struggle is real. i'd say I unfortunately don't manage to get to a large chunk of issues, but I've always tried to prioritise PRs, issues that could lead to PRs, schema fixes and anytime changes on Yahoo end broke functionality. So while I haven't managed to maintain things at the level I'd hoped, the project continues to chug along :sweat_smile: :sweat_smile:
You have no idea how much of a yak shave this will be for me.
Haha, been down that rabbit hole way too many times. But yeah, it's great when entire communities benefit from it. So thanks so much again! As for a stock tracker, not sure if you already saw Ghostfolio, probably the biggest single usage of this project - could save you some time if you're willing to go a servered (non-serverless, non-edge) approach. I have my own small personal project - not open source - that's serverless NextJS though, so do understand stack preferences :blush:
TL;DR; We are considering changing the tools and flow we use for validation. This won't have any effect on how to use the library, but will affect how to develop for and debug the library.
Current flow: 1) create interfaces in typescript, 2) convert to json schema, 3) use ajv for runtime validation. Pros: (+) it's all setup and works, (+) it's fast. Cons: (-) need to constantly regenerate the schema on changes, (-) ajv library is hard to debug, doesn't work in non-node environments.
Proposed: 1) create schema using relevant library, 2) infer to typescript, 3) use same lib. Pros: (+) single source of truth, no need to constantly regenerate schema, (+) work in non-node envs. Cons: (-) a lot of work! Mixed: (+/-) Test speed could get better/worse based on library choice and implementation.
This discussion arose from a great exchange I had with @eddie-atkinson in https://github.com/gadicc/node-yahoo-finance2/issues/719. Readers wanting more background on the issue may want to start there, however, I've summarized the most important points above and am quoting his most recent post below for us to continue the conversation from there. Massive thanks again to Eddie both for raising the issue and for his contributions and research to the discussion.
I stress again that this change will have no visible impact on documented use of yahoo-finance2 but will impact developers and debuggers. I expect after a full implementation the impact will be a more positive dev experience. However, end-users who have written any custom flows (that were never documented or proposed) against the existing schema would need to rewrite their code -- I'm currently not aware of anyone who has done this.