Sub-Types (Type Safety v1.1)

TheDan64 commented 6 years ago

WARNING: Brain dump ahead!

Today, build_int_add looks approximately like build_int_add(&self, left: &IntValue, right: &IntValue) -> IntValue. This is great because it stops you from trying to add a FloatValue and an IntValue, for instance, which would not work in LLVM. But I wonder if we can take type checking a step further! What happens when left is a u8 and right is a u16? This needs to be verified, but I believe it is also an LLVM error, since it doesn't make much sense: how would adding two values of different sizes work without a cast?

Therefore, I wonder if we can add sub-type annotations using generics (hereafter called sub-types). For example, IntType<u32> and IntValue<u32>, so that you could express something like build_int_add<T>(&self, left: &IntValue<T>, right: &IntValue<T>) -> IntValue<T>, which would ensure that only values of the same sub-type are added together. So an IntValue<u8> could be added to another IntValue<u8>, but adding an IntValue<u8> to an IntValue<u16> would be rejected and require a cast first. And if you do want different sub-types to be valid input, you would just specify separate type variables: do_foo<L, R>(&self, left: &IntType<L>, right: &IntType<R>)

In terms of implementation details, the sub-type should basically just be a marker that takes up no additional space (using PhantomData if needed). Type parameters are cool!
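A minimal sketch of the marker idea, assuming a simplified stand-in for IntValue (the `raw: u64` field, `new`, and `raw()` are hypothetical placeholders for the real LLVM value handle, not inkwell's API):

```rust
use std::marker::PhantomData;

// Hypothetical stand-in for IntValue: `raw` takes the place of the
// LLVM value handle that the real type would wrap.
pub struct IntValue<T> {
    raw: u64,
    // Zero-sized marker carrying the sub-type; it adds no space.
    _sub_type: PhantomData<T>,
}

impl<T> IntValue<T> {
    pub fn new(raw: u64) -> Self {
        IntValue { raw, _sub_type: PhantomData }
    }

    pub fn raw(&self) -> u64 {
        self.raw
    }
}

// Hypothetical stand-in for the builder.
pub struct Builder;

impl Builder {
    // Both operands and the result share the same sub-type T, so
    // mixing IntValue<u8> with IntValue<u16> does not type-check.
    pub fn build_int_add<T>(&self, left: &IntValue<T>, right: &IntValue<T>) -> IntValue<T> {
        // Placeholder for emitting an LLVM add instruction.
        IntValue::new(left.raw.wrapping_add(right.raw))
    }
}
```

With this shape, a call such as builder.build_int_add(&a_u8, &b_u16) should be rejected at compile time with a mismatched-types error, while the marker leaves the size of IntValue<T> equal to that of the wrapped handle.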

Outstanding questions:

How would custom & 80-bit width types be handled, given that they don't map to Rust types (yet?)? It may be permissible to disallow custom & 80-bit types since they are far less common, or just find some workaround. i128 is stable; I think we can use custom types for f128, f80, etc.
~~usize should probably be a valid sub-type for ints, but I wonder if there's any issue with its variable-width nature? LLVM should allow us to map it to the pointer size at runtime~~ LLVM has a pointer-width type
Signatures will quickly get verbose. Just imagine a simple String type: StructType<(PointerType<IntType<u8>>, IntType<usize>, IntType<usize>)>. Will the compiler be able to hide most of this from the user so that they don't have to specify annotations themselves?
~~Could the num crate's traits help with float and int sub-types?~~
Should signed types be explicit (i.e. IntType<i8>) or just an irrelevant fact about an IntType<u8>? I'm guessing LLVM doesn't let you mix signed and unsigned addition without casting first, so I'm leaning towards keeping them explicit.

TheDan64 commented 6 years ago

The num crate doesn't look like it could help, but typenum defines types for a lot of int sizes. So we could probably use those for custom-width types: IntValue<U9>, IntType<I30>, along with the builtins: IntType<u32>, etc. Also, I don't think LLVM supports custom-width floats (at least not in 3.7), so those would only use builtins, though there's no f16 or f128 yet.
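One way builtin and custom-width markers could sit side by side is a shared trait. The sketch below is only an illustration: the IntSubType trait, its constants, and the describe helper are hypothetical, and U9 and I30 are hand-written stand-ins rather than typenum's own types.

```rust
// Hypothetical trait describing an integer sub-type marker.
pub trait IntSubType {
    const BITS: u32;
    const SIGNED: bool;
}

// Builtin Rust integers used directly as markers.
impl IntSubType for u8 {
    const BITS: u32 = 8;
    const SIGNED: bool = false;
}

impl IntSubType for u32 {
    const BITS: u32 = 32;
    const SIGNED: bool = false;
}

impl IntSubType for i64 {
    const BITS: u32 = 64;
    const SIGNED: bool = true;
}

// Hand-written stand-ins for custom-width markers (assumption:
// a crate such as typenum could supply equivalents instead).
pub struct U9;
pub struct I30;

impl IntSubType for U9 {
    const BITS: u32 = 9;
    const SIGNED: bool = false;
}

impl IntSubType for I30 {
    const BITS: u32 = 30;
    const SIGNED: bool = true;
}

// Render a marker as a short name, e.g. "u9" or "i30".
pub fn describe<T: IntSubType>() -> String {
    let prefix = if T::SIGNED { "i" } else { "u" };
    format!("{}{}", prefix, T::BITS)
}
```

A constructor for IntType<T> could then read T::BITS to request an integer type of that width, keeping signedness as a purely compile-time fact.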

TheDan64 commented 6 years ago

Also, it seems like signed types should be explicit, though LLVM doesn't make this distinction.

TheDan64 commented 6 years ago

It's worth noting StructTypes (and probably StructValues) have two interesting properties:

Can be opaque, that is, have no body set yet, which is useful for recursive types. So we need something like StructType<Opaque>. This should allow us to implement set_body only for StructType<Opaque>.
StructTypes need a variable number of types. Not sure what kind of support Rust has for variable-sized tuples, but knowing variable-sized arrays, I don't expect it to be great. We need to be able to express something like StructType<(A, B, C, D, ..., Z)>.
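Both properties can be sketched with a typestate marker plus a per-arity trait. Everything here is hypothetical (the Opaque marker, the FieldList trait, and the field bookkeeping are placeholders, not inkwell's API). Rust has no variadic generics, so the tuple impls are written out per arity, which a macro would normally generate.

```rust
use std::marker::PhantomData;

// Marker for a struct type whose body has not been set yet.
pub struct Opaque;

// Hypothetical trait implemented for each supported tuple arity.
pub trait FieldList {
    const LEN: usize;
}

impl<A> FieldList for (A,) {
    const LEN: usize = 1;
}

impl<A, B> FieldList for (A, B) {
    const LEN: usize = 2;
}

impl<A, B, C> FieldList for (A, B, C) {
    const LEN: usize = 3;
}

// Hypothetical stand-in for StructType, tagged with its field list.
pub struct StructType<Fields> {
    name: String,
    field_count: Option<usize>,
    _fields: PhantomData<Fields>,
}

impl StructType<Opaque> {
    pub fn opaque(name: &str) -> Self {
        StructType { name: name.to_string(), field_count: None, _fields: PhantomData }
    }

    // Only available on StructType<Opaque>: consumes the opaque type
    // and returns one tagged with the chosen field tuple.
    pub fn set_body<F: FieldList>(self) -> StructType<F> {
        StructType { name: self.name, field_count: Some(F::LEN), _fields: PhantomData }
    }
}

impl<Fields> StructType<Fields> {
    pub fn name(&self) -> &str {
        &self.name
    }

    pub fn is_opaque(&self) -> bool {
        self.field_count.is_none()
    }

    pub fn count_fields(&self) -> Option<usize> {
        self.field_count
    }
}
```

Because set_body takes self by value, the opaque handle cannot be reused afterwards, and calling set_body on an already-defined StructType<(A, B)> does not compile.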

TheDan64 commented 6 years ago

After looking at #32, it seems that build_global_string and the related build_global_string_ptr method segfault when called without an active function being built, for some reason (even though the string is global 👀). One possible solution would be to have a subtype for a builder in the context of a function, i.e. Builder<Global> & Builder<Function>, similar to how librustc_trans has two methods for creating builders (global and function scoped): https://github.com/rust-lang/rust/blob/master/src/librustc_trans/builder.rs#L54-L77

Those two methods could then be implemented only for the latter subtype, Builder<Function>.
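A rough sketch of scoped builders, under the assumption that the scope is a zero-sized marker. The constructor names, enter_function, and the String return value are hypothetical placeholders, not inkwell's real signatures.

```rust
use std::marker::PhantomData;

// Scope markers for the builder.
pub struct Global;
pub struct Function;

// Hypothetical stand-in for Builder, tagged with its scope.
pub struct Builder<Scope> {
    _scope: PhantomData<Scope>,
}

impl Builder<Global> {
    pub fn global() -> Self {
        Builder { _scope: PhantomData }
    }

    // Moving into a function consumes the global-scoped builder.
    pub fn enter_function(self, _function_name: &str) -> Builder<Function> {
        Builder { _scope: PhantomData }
    }
}

impl Builder<Function> {
    // Exists only on Builder<Function>, so it cannot be called
    // without an active function. The returned String is a
    // placeholder for the real global value.
    pub fn build_global_string(&self, value: &str, name: &str) -> String {
        format!("@{} = {:?}", name, value)
    }
}
```

Calling build_global_string on a Builder<Global> would then be a compile error (no such method) instead of a segfault at runtime.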

Michael-F-Bryan commented 6 years ago

I know gtk-rs encountered a similar problem when emulating inheritance in Rust. They came up with an IsA trait which lets you use a child class as its parent.

So you might write a function that takes anything which IsA<IntType>, allowing you to accept a u32, i64, etc.
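Roughly, and only as an illustration (this simplified IsA trait, its as_parent method, and the bit_width field are hypothetical and much smaller than what gtk-rs actually defines), that could look like:

```rust
// Hypothetical stand-in for IntType; a real one would be tied
// to an LLVM context.
pub struct IntType {
    pub bit_width: u32,
}

// Simplified illustration of an IsA-style trait: a Rust type
// declares which "parent" type it can stand in for.
pub trait IsA<Parent> {
    fn as_parent() -> Parent;
}

impl IsA<IntType> for u32 {
    fn as_parent() -> IntType {
        // Nothing here supplies a context, so a real implementation
        // would have to pick one implicitly.
        IntType { bit_width: 32 }
    }
}

impl IsA<IntType> for i64 {
    fn as_parent() -> IntType {
        IntType { bit_width: 64 }
    }
}

// Accepts any Rust type that IsA<IntType>, e.g. u32 or i64.
pub fn bit_width_of<T: IsA<IntType>>() -> u32 {
    T::as_parent().bit_width
}
```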

TheDan64 commented 6 years ago

Inkwell used to have something like that. However, the main issue was that the trait required you to assume the global context, which may not always be desired and wasn't very intuitive (why does u64 suddenly have the global context even though my other types have been working with my non-global context?).

TheDan64 commented 3 years ago

We're going to want Builder subtypes, as an unpositioned builder can cause segfaults in many scenarios.

TheDan64 / inkwell

Sub-Types (Type Safety v1.1) #8