0x7CFE / llst

LLVM powered Little Smalltalk.
Other
93 stars 10 forks source link

Type inference prototype based on static analysis #92

Open 0x7CFE opened 8 years ago

0x7CFE commented 8 years ago

This PR is my first rather independent (or naïve?) attempt to realize how type inference may be used to aid JIT VM. Underlying concept is not based on any well-known theory like Hindley-Milner type system or Martin Löf's Intuitionistic type theory. It's the result of a pure meditation on Smalltalk bytecodes.

Surprisingly enough, Smalltalk being fully dynamic in it's nature is still very regular in terms of it's memory access and control flow. This really helps when we try to perform static analysis.

Current implementation concentrates on the method temporaries and gives up completely when it faces object fields. However, I believe that in local context it is still possible to infer the object's fields to fully unlock further analysis and optimizations, like stack allocation, GC root elimination, and of course TBAA.

Current implementation is powerful enough to infer self assigning temporaries within a loop and even in complex closure contexts.

For example, the following method is inferred completely:

testInference |sum|
    sum <- 0.
    1 to: 100 do:
        [ :x | sum <- sum + x ].
    ^sum

Here analyzer proves that sum variable will have SmallInt type in it's scope which spans across the follwing methods: Undefined>>testInference, Number>>to:do:, Block>>value: and finally, the block Undefined>>testInference@8. Integer overflow is currently undefined.

This allows IR generator to encode operations on sum directly as i32 without any worries about GC or dynamic dispatch.

Current inference scheme highly uses method monomorphisation and specialization, which helps to perform calculations at compile time.

For example, this code is reduced to a single literal value (the result) at compile time:

fibonacci: n
    n < 3 ifTrue: [ ^1 ].
    ^ (self fibonacci: n - 2) + (self fibonacci: n - 1)

Type inference works even in case of recurring contexts and correctly solves chicken or egg problem. Consider the listing of the Collection>>sort:

sort: criteria | left right mediane |
    (self size < 2) ifTrue: [^self].

    mediane <- self popFirst.

    left  <- List new.
    right <- List new.
    self do: [ :x |
        (criteria value: x value: mediane)
            ifTrue:  [ left  add: x ]
            ifFalse: [ right add: x ] ].

    left  <- left  sort: criteria.
    right <- right sort: criteria.

    right add: mediane.
    ^ left appendList: right

This is suboptimal implementation of the Quicksort algorithm used here only for testing purposes. It performs two recursive calls to Collection>>sort: when sorting left and right sublists. The main difficulty here is to infer the return value of the Collection>>sort:. For example, analysis of the left refers to the outer context which at that point is not inferred completely, hence no return type available.

However, current implementation correctly solves the problem using a fact, that every recursion has it's base which should be evaluated prior to the next recursive call. By propagating the base return type from the outer context to the inner one we succeed in the whole process. See the log for an example of such inference. You may also see the resulting call graph as rendered svg or graphviz source.

Static analysis along with runtime statistics and polymorphic method cacheing provides enough information for effective dispatch of the Smalltalk code.

See also: #56, #58.

The implementation is not complete yet, there are a lot of things to do:

coveralls commented 8 years ago

Coverage Status

Coverage decreased (-12.7%) to 48.582% when pulling 34c1048600b974b902fe21e6079baa329e1c1d5d on feature/17/type_inference into ff4d76d30584adaa20a309ca2487122cb51479c6 on develop.