why this instead of dts-gen?

David-Kunz commented 2 years ago

I haven't looked into it in much detail but dts-gen doesn't seem to be usable for multiple invocations. You need a proper merging algorithm to combine different runtime objects to a single type definition, consider this example:

myFun({a: 1})
myFun({a: 2})
myFun({a: 3, b: 5})

It would be inefficient to derive

{a: number} | {a: number, b: number}

.

It should be merged to

{a: number, b?: number}

If I'm not mistaken, dts-gen can't do that.

StoneCypher commented 2 years ago

It seems like you might be trying to construct sum types from calls. I don't understand how you could tell the difference between a supertype and an alternation that way.

Please consider the following code:

type element = 'fire' | 'water' | 'earth' | 'air' | 'metal';

type dragon = { species: 'dragon', breath_weapon: element };
type wizard = { species: 'human', spell: element };

type enemy = dragon | wizard;

function attack(target: enemy) { ... };

If I call that manually twice, it seems like I'd end up with a call signature that'd take an object containing both a breath_weapon and a spell, which is never correct

You need a proper merging algorithm

What typescript does here is not to merge. Using a merging algorithm will yield different results than what typescript's type system would do.

David-Kunz commented 2 years ago

That is correct @StoneCypher and a known limitation.

I also thought about using a clustering algorithm to detect those combinations, but I'll leave that as a future implementation.

In the end it's a statistical problem. If your function is only called twice with

{ species: string, breath_weapon: string)

and

{ species: string, spell: string)

then one cannot know if the original type is

{ species: string, breath_weapon: string } | { species: string, spell: string }

or

{ species: string, breath_weapon?: string, spell?: string }

Keep in mind that "aggressive union'ing" would drastically blow up the generated types and at some point the ~~TypeScript compiler~~ LSP server couldn't provide code completion anymore.

Hence I chose the second variant.

StoneCypher commented 2 years ago

also thought about using a clustering algorithm to detect those combinations

sorry, what? this isn't in any way related to clustering

In the end it's a statistical problem

No, it's not. There's a well defined right and wrong here.

then one cannot know if the original type is

One can, in fact.

Keep in mind that "aggressive union'ing" would drastically blow up the generated types

In addition, and possibly more importantly, it produces radically incorrect answers.

I feel like you may have lost track of that the purpose of a type system is to govern what right and wrong is, and like you're attempting to use guesswork to support a correctness system

at some point the TypeScript compiler couldn't provide code completion anymore.

Under no circumstances does the typescript compiler ever provide completion.

LSP is ready for Windows, and can happily support billions of concurrent types.

David-Kunz commented 2 years ago

One can, in fact.

No, one cannot.

Consider this example:

type x = { a: string, b?: string } // (1)

This information is not available in JavaScript. The invocations would be

myFun({ a: 'x' })
myFun({ a: 'y', b: 'z' })

Now solely based on this information, you won't be able to derive the type (1). It could be

{ a: string, b?: string}

or

{ a: string } | { a: string, b: string }

or even

{ a: string, b?: string, z?: string}

You cannot possibly know.

In addition, and possibly more importantly, it produces radically incorrect answers.

Yes, the answers are 'incorrect' but good enough to provide reasonable type hints.

I feel like you may have lost track of that the purpose of a type system is to govern what right and wrong is, and like you're attempting to use guesswork to support a correctness system

Yes, it's guess work. The same guess work as debugging the tests one by one and writing it down by hand. It's mathematically impossible to derive the correct type, see my previous example.

LSP is ready for Windows, and can happily support billions of concurrent types.

I tried the 'union approach' first, but the LSP would just fall back to any. I can look into this again, maybe it was based on a different error.

Note: I meant the LSP server before, I adjusted my comment.

StoneCypher commented 2 years ago

It's mathematically impossible to derive the correct type

If you're math oriented, please look into horn clauses, hindley-milner type inference, and robinson unification, all of which are available as tools here already.

At any rate, you appear to have your mind made up. I'll move along.

David-Kunz commented 2 years ago

If you're math oriented, please look into horn clauses, hindley-milner type inference, and robinson unification, all of which are available as tools here already.

Thanks for the pointers, but as I said, this problem here is of statistical nature: Function invocations will almost never be able to paint the complete picture, it's only a subset of all combinations.

If you call

myFun({ a: 'x', d: 9 })
myFun({ a: 'y', c: 'z' })

and assume that these are all possible calls, then yes, the true type is

{ x: string, d: number} | { a: string, c: string }

.

But if your API should also allow

myFun({ a: 'x', d: 9, c: 'z' })

(it just happens that you didn't have a test for that), then the true type is

{ a: string, c?: string, d?: number }

.

Now the question is: Which type is more likely? That's hard to answer, even in that simple example.

The problem even gets worse if you have unit tests which provide an incomplete set of parameters, you know, just enough to test the relevant parts.

I hope I could explain the problem a bit more and hope you can agree that even the most sophisticated type inference system won't help in this case.

At any rate, you appear to have your mind made up. I'll move along.

Please don't get this the wrong way, I value your feedback and welcome the discussion!

StoneCypher commented 2 years ago

I'm sorry, but no, this is an attempt to guess when guessing is not necessary, while pretending that guess has a mathematical or statistical basis when it's just a sum type.

Sum types are hard wrong in context, but are being represented as a best effort estimate.

There's no need to keep explaining. This is well understood. I'm trying to exit politely because I tried to tell you that better options already exist, and I wasn't heard.

David-Kunz commented 2 years ago

You were heard and your suggestion was actually my first solution to this problem, so I didn't reject it out of ignorance but because I saw that in practice it doesn't work very well in this particular use case.

If you have an object with many optional parameters, and that happens fairly often in real projects, you're not able to test all combinations.

If you represent that as a union type, the LSP won't give you good code completion. You will only see some parameters when you restrict others (e.g. using if statements).

Let's say the property a only exists if b is of type string. Then invoking code completion on that object will only give you a when b is set to a string. That is hardly what you want. The type is too restricted.

It's similar to the problem of over fitting in machine learning. Sure, for that exact data set, it would be your best mathematical estimate. But for real data, over fitting reduces the quality of your predictions a lot.

It's the same here. Your tests are not complete in general. Missing combinations of optional parameters reduce the type quality in the same way over fitting reduces the quality of predictions.

It is a statistical problem. And the uncertainty comes from incomplete tests.

I chose the approach to derive the most simple possible types, no unions with hundreds of variants. I understand that I loose the information about conditional relations (if b then a), but that's the lesser evil.

Anyways, thanks for your time and interest in this topic. You are not unheard and I understand your proposal.

David-Kunz / derive-type

why this instead of dts-gen? #2