JSMonk / hegel

An advanced static type checker
https://hegel.js.org
MIT License
2.09k stars 59 forks source link

refine type of array access to non-undefined given proper for-loop header #236

Open trusktr opened 4 years ago

trusktr commented 4 years ago

Maybe Hegel could check loop headers, and if the iteration variables are written properly, then it can narrow the types of array access to non-undefined.

For example, in the code

let nums: Array<number> = [0,1,2,3]

for (let i=0; i<nums.length; i++) {
  const n = nums[i] // must be a number
  useNum(n)
}

it will not go out of bounds, so there is no check needed like n != undefined && useNum(n).

Sorry for opening so many issues! There are lots of ideas. :)

JSMonk commented 4 years ago

No problem. On the contrary, thank you for so many issues. Actually, I described it in detail in docs why do we need this behavior. Thank you for the issues ^_^.

peter-leonov commented 4 years ago

Say, in my project which is soundly type checked by Hegel I'd create a user space type MyVector and allow only the operations not leading to a sparse array. What is not currently possible (AFAIK) in the user space is expressing the connection between i and the nums.length. On the other hand, the simple use of the for of loop might help avoiding the mess with i 🙃

Raynos commented 4 years ago
const numbers: Array<number> = [1, 2, 3];
let intSequence = "";
numbers.length = 5;
for (let i = 0; i < numbers.length; i++) {
  // TypeError: Cannot read property 'toFixed' of undefined
  intSequence += numbers[i].toFixed();
}

This example actually has the wrong behavior.

Assignment to numbers.length should change the type of numbers to Array<number | undefined>

Changing the length field is turning a normal array into a "holey array"

A for loop should "just work" TM without an if check in the body, if the array type is "refined" into the larger type of Array<T | undefined> upon assignment of length aka "continous array" => "Holey array" then you can still make for loops work.

Raynos commented 4 years ago
someArray.length = 0

You can make an exception that assigning the NumberLiteral 0 to .length does not convert it into a "holey array" since it truncates to an empty array of type Array<number>

raveclassic commented 4 years ago

@Raynos you never know where in the code this assignment may happen, it may happen not only in current scope. That's why we need that check

Raynos commented 4 years ago

This is correct. An array can be mutated in the body of a for loop which invalidates safe access.

I guess bounds checking is hard :( ( https://go101.org/article/bounds-check-elimination.html ).

vkurchatkin commented 4 years ago

There is no way to avoid this check, unless we go into the realm of dependent types

Raynos commented 4 years ago

Would the following be ok ?

for (let i=0; i<nums.length; i++) {
  const n = nums[i] // must be a number
  // Arbitary statements that do not reference the `nums` identifier.
}

Aka a loop from 0 to len and the only reference to nums in body of the loop is nums[i]

Any other method call on nums would invalidate type nums[i] is known to not be undefined.

I guess that does require lookahead in the type checker and a two pass check over the body of the loop

It's a real shame javascript does not have for (const value, index of nums) syntax -.-

Sometimes you need access to the index of the loop and doing for (const [index, value] of nums.entries()) allocates a temporary array and has destructuring overhead.

I guess the same behavior

peter-leonov commented 4 years ago

The bounds check is one issue, but there is another deeper one: the “holey” arrays. For a JS developer a value of type ‘Array’ can always have holes in it. I always expect hidden ‘undefined’s to be there while not ever checking this case in the my everyday programming life 🤦‍♂️😅

Raynos commented 4 years ago

Would it be useful to have two inbuilt types Array and SparseArray to represent this ?

peter-leonov commented 4 years ago

Would it be useful to have two inbuilt types Array and SparseArray to represent this ?

I also like the idea of having two distinct types here. Following the rule of the least surprise I'd say that the Array type should stay sparse by default and a more refined NonSparseArray (or Vector) can be a nice addition to the standard type library.

JSMonk commented 4 years ago

Sounds good. I think we can implement a safe Vector type which will give you the ability to use the Array as a fixed-length collection (like $Immutable<Array<unknown>>, but give you an option to set value in defined positions. What do you think?

Raynos commented 4 years ago

A fixed length array is rarely used in javascript

The amount of usage of const arr = new Array(MY_SIZE) vs const arr = [] shows that 99% of the time people use [] and let the JS engine autogrow the array upon insertion.

JSMonk commented 4 years ago

No. I talk about the next thing.

const vector: Vector<number, 10> = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
vector.pop(); // Error
vector.length = 4 // Error
vector[0] // number
vector[0] = 4; 
vector[11] // Error
vector[11] = 4; // Error
Raynos commented 4 years ago

I guess fixed size arrays are nice for constants but that only solves the loop issue for fixed size arrays.

How do you for loop over a dynamic sized array ? aka HttpAgent#freeSockets

Raynos commented 4 years ago

Also Vector reminds me people of std::Vector in C++ ( http://www.cplusplus.com/reference/vector/vector/ ). Might be confusing name for a fixed length array.

Vectors are sequence containers representing arrays that can change in size.

Probably not the best name. The only good example of a fixed size array in JS is ArrayBuffer ( https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer ).

trusktr commented 4 years ago

Following the rule of the least surprise I'd say that the Array type should stay sparse by default

How many people actually use sparse arrays? I've never used one, and have never encountered it, and have never had a runtime error from it (maybe I've gotten lucky).

Personally I'd prefer to have non-sparse Array and make array.length = anythingOtherThanZero a type error. We don't have to hold JS user hands, they can learn the difference. After all, we're not changing JS behavior, only restricting how we use it; that's what Hegel's goal is so far.

Vector type which will give you the ability to use the Array as a fixed-length collection

I'm sure some people would want to use it, those into immutable libs for example. It just needs a different name than Vector to not confuse with std::vector which is explicitly meant to be a variable-length list.

It would certainly make static analysis of the loop easier in those cases.

I think most people would like a solution that works with variable-length arrays.

It is in how we modify the length that things can go wrong. If we use array.push(n) where then n is type checked, there's no problem if it increases the length... outside of a loop.

The problem is what about modification of the array length (f.e. splicing out items) inside a loop? Fixed-length arrays would make it easy to know this doesn't happen.

For variable-length arrays, maybe the type checker can detect whether or not length modification happens in the loop, then A) if confidently detected that not, require no refinement, otherwise B) require refinement?

Example:

const a = [1,2,3]

a.push(4) // ok

a.length = 50 // would be a type error

let n2: number = 0

for (const n of a) {
  n2 = n // ok
  console.log(n2)
}

for (const n of a) {
  // reduces the confidence of a.length not being modified to less than 100%
  someFunction(a) 

  n2 = n // type error, number | undefined is not assignable to number
  console.log(n2)
}

for (const n of a) {
  // explicitly taints confidence of a.length not being modified
  a.splice(...)

  n2 = n // type error, number | undefined is not assignable to number
  console.log(n2)
}
Raynos commented 4 years ago

const n is a const variable. It’s type is literally constant due to assignment being an error.

It’s only array index accessor that can have an out of bounds access returning undefined or return a hole

trusktr commented 4 years ago

In for-of loops, const n is recreated on each iteration, and it can contain a hole, therefore the type of const n in for (const n of a) is number | undefined. So because n2 is type number, we can not assign n to it in the second and third for-of examples due to certainty of a not containing holes being lost (number | undefined can not be assigned to number in that case).

In the first for-of loop, if we cut out the other stuff,

const a = [1,2,3]
a.push(4) // ok
let n2: number = 0

for (const n of a) {
  n2 = n // ok
  console.log(n2)
}

we know with certainty that there are no holes in the array, so in that case const n is number.

const n is the result of what is effectively the same thing as an index access. You can see this logs undefined in your console:

let a = [1,,3,4]
a.length = 5
for (const n of a) console.log(n) // logs: 1, undefined, 3, 4, undefined
trusktr commented 4 years ago

What I'm saying is, it is possible to statically analyze the following:

const a = [1,2,3]
a.push(4)
let n2: number = 0

for (const n of a) {
  n2 = n // ok, no error
}
const a = [1,2,3]
a.length = 50
let n2: number = 0

for (const n of a) {
  n2 = n // type error, number | undefined is not assignable to number
}
const a = [1,2,3]
a.length = 2
let n2: number = 0

for (const n of a) {
  n2 = n // ok, no error
}
const a = getAnArrayFromSomewhere()
a.length = 2
let n2: number = 0

for (const n of a) {
  n2 = n // type error, number | undefined is not assignable to number
}
const a = [1,2,3]
let n2: number = 0

for (const n of a) {
  n2 = n // ok, no error
}
const a = [1,2,3]
let n2: number = 0

for (const n of a) {
  someFunction(a)
  n2 = n // type error, number | undefined is not assignable to number
}

In some cases, if we statically analyze someFunction or getAnArrayFromSomewhere, then we may know that the array must not have holes. For example:

function someFunction(a: Array<number>) {
  a.push(42)
}

const a = [1,2,3]
let n2: number = 0

for (const n of a) {
  someFunction(a)
  n2 = n // ok, no type error
}

vs

function someFunction(a: Array<number>) {
  a[Math.floor(Math.random() * 100)] = 42
}

const a = [1,2,3]
let n2: number = 0

for (const n of a) {
  someFunction(a)
  n2 = n // type error, number | undefined is not assignable to number
}

or

function getAnArrayFromSomewhere() { return [1,2,3] }
const a = getAnArrayFromSomewhere()
a.length = 2
let n2: number = 0

for (const n of a) {
  n2 = n // ok, no type error
}

or

function getAnArrayFromSomewhere() { return [1,2,3] }
const a = getAnArrayFromSomewhere()
a.length = 50
let n2: number = 0

for (const n of a) {
  n2 = n // type error, number | undefined is not assignable to number
}

We can see that in all those cases, we know with certainty whether an iteration will get holes or not.

Raynos commented 4 years ago

I didn’t know a for of loop includes undefined for a holey array.

That means both for and for of loops both have x or undefined behavior.

Good catch

trusktr commented 4 years ago

I had no idea forEach skipped holes. :laughing: That's neat! After all these years, it never mattered (or I wasn't aware that it mattered).