Hack for implementing type inference

LouisJenkinsCS commented 5 years ago

I believe you can implement type inference by using compilerWarning to report each variable's type. For example, imagine you have something like this:

var x : int;
var y = x;
var z = (x + y) : real(32);

How would you get this type information? Like this.

var x : int;
compilerWarning("__TYPE__INFERENCE__ x:", x.type : string);
var y = x;
compilerWarning("__TYPE__INFERENCE__ y:", y.type : string);
var z = (x + y) : real(32);
compilerWarning("__TYPE__INFERENCE__ z:", z.type : string);


I suggest that the language server could instrument the original program by adding such a call after every variable declaration to query its type; the resulting compiler warnings can then be parsed.
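A minimal sketch of that instrumentation step, written in Python purely for illustration. The helper names (`instrument`, `parse_warnings`) and the regular expressions are assumptions, not part of any existing tool. It only handles simple one-line `var` declarations, and it assumes the warning text reaches the compiler output as the marker followed by `name:type`.

```python
import re

MARKER = "__TYPE__INFERENCE__"

# Matches simple one-per-line declarations such as `var x : int;` or `var y = x;`.
# Deliberate simplification: `const`, multi-variable declarations, and
# declarations spanning several lines are not handled.
DECL_RE = re.compile(r"^(\s*)var\s+([A-Za-z_]\w*)\b[^;]*;\s*$")

# Assumed shape of the warning text: "<MARKER> <name>:<type>".
WARNING_RE = re.compile(re.escape(MARKER) + r"\s+(\w+):\s*(.+?)\s*$")


def instrument(source: str) -> str:
    """Insert a compilerWarning call after every simple `var` declaration."""
    out = []
    for line in source.splitlines():
        out.append(line)
        m = DECL_RE.match(line)
        if m:
            indent, name = m.groups()
            out.append(
                f'{indent}compilerWarning("{MARKER} {name}:", {name}.type : string);'
            )
    return "\n".join(out)


def parse_warnings(compiler_output: str) -> dict:
    """Collect {variable name: type string} from the marked warning lines."""
    types = {}
    for line in compiler_output.splitlines():
        m = WARNING_RE.search(line)
        if m:
            types[m.group(1)] = m.group(2)
    return types
```

Because the inserted calls shift line numbers, a real implementation would also need to map positions in the instrumented file back to the file the user is editing.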

@lydia-duncan Any opinions on this approach? Should be a nice and decent short-term solution.

AnubhavUjjawal commented 5 years ago

@LouisJenkinsCS @lydia-duncan Wouldn't that require compiling the code every time a variable is declared? In my experience, the compiler takes a long time to compile code. Also, it won't work for incomplete code like this:

var x : int;
var y = x;
var z = (x + y) : real(32);
for i in

Think of the case where the user leaves the for i in at the bottom of the code incomplete like this and goes one line up to declare a variable they forgot. The compiler will give a syntax error.

Still, if you think I should try it, I will.

LouisJenkinsCS commented 5 years ago

My suggestion would be to throttle requests to re-compile to once every 5 seconds.

Also, compilation is slow, but you can stop it early with something like --stop-after-pass resolve


That is, you could cache these results and regenerate them only every so often.
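A rough sketch of the throttling and caching idea, in Python for illustration. `ThrottledCompiler` and `compile_fn` are hypothetical names. `compile_fn` stands in for whatever runs the compiler (for example with `--stop-after-pass resolve`) and returns the extracted type information.

```python
import time


class ThrottledCompiler:
    """Re-run an expensive compile step at most once per `min_interval` seconds."""

    def __init__(self, compile_fn, min_interval=5.0, clock=time.monotonic):
        self._compile_fn = compile_fn    # hypothetical: source text -> type info
        self._min_interval = min_interval
        self._clock = clock              # injectable so the logic can be tested
        self._last_run = None
        self._cached = None

    def get_types(self, source):
        now = self._clock()
        due = self._last_run is None or now - self._last_run >= self._min_interval
        if due:
            # Enough time has passed: recompile and refresh the cache.
            self._cached = self._compile_fn(source)
            self._last_run = now
        # Otherwise serve the cached result from the last compilation.
        return self._cached
```

A hover request that arrives inside the 5-second window gets the cached answer, so the editor stays responsive even though compilation is slow.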

LouisJenkinsCS commented 5 years ago

As for incomplete code: if the compiler fails before reaching the "resolve" pass, don't discard the previous results.
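One way to picture this, as a small Python sketch. `TypeCache` and its `stale` flag are assumptions made for illustration. The flag lets the editor signal that the types shown may be out of date.

```python
class TypeCache:
    """Keep the last successfully resolved types when a later compile fails."""

    def __init__(self):
        self.types = {}
        self.stale = False

    def update(self, reached_resolve, types=None):
        if reached_resolve:
            # Compilation got through resolve: replace the cached results.
            self.types = dict(types or {})
            self.stale = False
        else:
            # Compilation failed earlier: keep the old results, mark them stale.
            self.stale = True
        return self.types
```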

lydia-duncan commented 5 years ago

Seen, getting myself up to speed on the implementation.

lydia-duncan commented 5 years ago

I don't think compilerWarning would gain you anything in this situation beyond just running the compiler - it gets evaluated at the same time as resolution runs, so it would be equivalent to:

var y: x.type = x;

I'm not completely clear on the approach. Is implementing type inference yourself the alternative to relying on the Chapel compiler for type information?

As for keeping a failure on one line from hiding information about the others, you could drop the lines that cause early failures (i.e. if a line gives a syntax error, drop that line and recompile to see if you get further in compilation, with resolution being the furthest I would go when other errors are present).
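The drop-and-recompile loop could look roughly like the Python sketch below. `try_compile` is a hypothetical callback that returns `(ok, failing_line_number)`. How the failing line is extracted from the compiler's error output is left out, since the exact message format is not given in this thread.

```python
def compile_dropping_bad_lines(source, try_compile, max_attempts=10):
    """Blank out lines that cause early failures, then retry compilation.

    Returns (ok, cleaned_source, dropped_line_numbers).
    """
    lines = source.splitlines()
    dropped = []
    for _ in range(max_attempts):
        ok, bad_line = try_compile("\n".join(lines))
        if ok or bad_line is None:
            return ok, "\n".join(lines), dropped
        # Stop if the reported line is out of range or was already blanked.
        if not 1 <= bad_line <= len(lines) or lines[bad_line - 1] == "":
            break
        # Blank the line instead of deleting it, so the remaining line
        # numbers still match what the user sees in the editor.
        dropped.append(bad_line)
        lines[bad_line - 1] = ""
    return False, "\n".join(lines), dropped
```

The `max_attempts` bound is an added safeguard so that a file with many broken lines cannot trigger an unbounded number of recompilations.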

LouisJenkinsCS commented 5 years ago

I guess I misworded my question. I meant inquiring the inferred type from the compiler. Is there a way to get the compiler to dump all type information in a readable and predictable format?

LouisJenkinsCS commented 5 years ago

The goal would be to allow the type of a variable to be shown on mouse-hover.

[Screenshot from 2019-03-25 12-02-36]

lydia-duncan commented 5 years ago

Ah, probably the --log compilation flag is what you're looking for (though it won't generate output for the particular pass that failed iirc). You can ask --log to only handle specific passes (see compiler/main/runpasses.cpp for quick ways to check specific passes) or can have it output for every pass. The file stored will have a lot of details about the AST, but you will likely be able to get any of the information you need from it, so long as it has been computed by that particular pass. I'm happy to give a run-down of what any of it means in a video call

LouisJenkinsCS commented 5 years ago

Looking through the --log output for the 'resolve' pass, I finally see type information that can be parsed:

unknown x[185464]:int(64)[10] "insert auto destroy"
unknown y[185506]:int(64)[10] "insert auto destroy"
unknown call_tmp[546537]:int(64)[10] "expr temp" "maybe param" "temp"
unknown call_tmp[546542]:real(32)[110] "maybe param" "temp"
unknown z[185557]:real(32)[110] "insert auto destroy"

I'm wondering if this can be used to query the types of the variables instead of instrumenting the program explicitly.

lydia-duncan commented 5 years ago

All type information should be known at the end of resolve (assuming it succeeds). The [10] ids can also be used to link user defined types to their definitions (left as an exercise to the reader ;) ).

AnubhavUjjawal commented 5 years ago

@lydia-duncan for the following code,

var s:real(32);
s = 40;
proc fname() {
    var i: int;
    var k = i;
    writeln("Hello");
}

and the compilation command chpl test.chpl --log-pass r --stop-after-pass resolve, the test_13resolve.ast file shows

AST dump for 3 after pass resolve.

{
  function chpl__init_3[300152]() : void[4] "insert line file info" "module init" "resolved"
  {
    unknown call_tmp[522818]:real(32)[109] "expr temp" "maybe param" "temp" "type variable"
    (388139 'move' s[178236](818193 call _defaultOf[818196]))
    unknown coerce_tmp[818397]:real(32)[109] "coerce temp" "insert auto destroy" "temp"
    (818403 'move' coerce_tmp[818397](818400 call _cast[818405] 40))
    (178241 call =[315755] s[178236] coerce_tmp[818397])
    (372643 return _void[43])
  }
  unknown s[178236]:real(32)[109]
  function main[561783]() : void[4] "resolved"
  {
    (561786 return _void[43])
  }
  function chpl_gen_main[561789](arg _arg[561788]:chpl_main_argument[159399]) : int(64)[10] "compiler generated" "export" "generated main" "local args" "resolved"
  {
    val global_temp[912890]:domain(1,int(64),false)[714289] "temp"
    val global_temp[909925]:string[29959] "temp"
    val ret[561824]:int(64)[10] "RVV" "temp"
    val _main_ret[561793]:int(64)[10] "temp"
    unknown _endCount[561794]:unmanaged _EndCount(AtomicT(int(64)),int(64))[643598] "temp"
    (561800 'move' _endCount[561794](561797 call _endCountAlloc[631389]))
    (561802 'set dynamic end count' _endCount[561794])
    (561804 call chpl_rt_preUserCodeHook[159570])
    (561806 call chpl__init_3[300152])
    (561808 call main[561783])
    (561810 'move' _main_ret[561793] 0)
    (561813 call chpl_rt_postUserCodeHook[159576])
    unknown coerce_tmp[818975]:_EndCount(AtomicT(int(64)),int(64))[642983] "coerce temp" "insert auto destroy" "temp"
    (818980 'move' coerce_tmp[818975](818978 'cast' _EndCount(AtomicT(int(64)),int(64))[642983] _endCount[561794]))
    (561815 call _waitEndCount[818509] coerce_tmp[818975])
    (561818 call chpl_deinitModules[159613])
    (561829 'move' ret[561824] _main_ret[561793])
    (561826 return ret[561824])
  }
}

Apart from the type of s, I can't find anything useful in it :disappointed:. Could you help me work out how to find the types of the variables declared inside proc fname()?

AnubhavUjjawal commented 5 years ago

@lydia-duncan Is there any specific way you would recommend for parsing these .ast files?

LouisJenkinsCS commented 5 years ago

Grep for strings that match "%VARIABLE_NAME\[\d+\]:%VARIABLE_TYPE\[\d+\]" (the square brackets are literal, so they need escaping), stick the matches in a symbol table, and use that.
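A sketch of that grep-and-symbol-table approach in Python, based only on the line shapes visible in the dumps quoted in this thread (`unknown` or `val`, then `name[id]:type[id]`, then optional quoted flags). The function name and the choice to skip symbols flagged "temp" are assumptions.

```python
import re

# name[id]:type[id] followed by zero or more quoted flags, e.g.
#   unknown z[185557]:real(32)[110] "insert auto destroy"
VAR_RE = re.compile(
    r'^\s*(?:unknown|val)\s+(\w+)\[(\d+)\]:(.+?)\[(\d+)\]((?:\s+"[^"]*")*)\s*$'
)


def parse_ast_dump(text, include_temps=False):
    """Build {name: {"type", "type_id", "symbol_id"}} from an AST dump."""
    table = {}
    for line in text.splitlines():
        m = VAR_RE.match(line)
        if not m:
            continue
        name, symbol_id, type_name, type_id, raw_flags = m.groups()
        flags = re.findall(r'"([^"]*)"', raw_flags)
        # Compiler temporaries (call_tmp, coerce_tmp, ...) carry a "temp" flag.
        if "temp" in flags and not include_temps:
            continue
        # Keyed by name only: a simplification that lets same-named
        # variables in different scopes overwrite each other.
        table[name] = {
            "type": type_name,
            "type_id": int(type_id),
            "symbol_id": int(symbol_id),
        }
    return table
```

Keying the table by the numeric symbol id instead of the name would avoid the scope collision, at the cost of needing a separate step to map a hover position to an id.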

LouisJenkinsCS commented 5 years ago

Oh, I misread your question; it's about finding variables inside functions. I'll see if I can find a solution real quick.

LouisJenkinsCS commented 5 years ago

Found the issue: the function gets eliminated because it's never used (and likely never needs to be resolved). @lydia-duncan --no-dead-code-elimination doesn't seem to help either; the function only gets resolved if I invoke fname() somewhere.

@AnubhavUjjawal I'd recommend that you move past this issue for now and obtain type information for the variables you can find.

lydia-duncan commented 5 years ago

Yup, if the function isn't called, it doesn't get resolved. Some of the compiler passes go through every symbol of a particular type (all classes, all functions, all modules, etc), while others only follow control flow through the program. Resolution only starts at the main function and follows that path, and then cleans up symbols that were not used. This allows us to avoid things like stamping out a copy of a generic function for every possible type we know about, but we've definitely had latent bugs in our code due to this.

I would recommend indicating to the user in some way if their code hasn't been resolved due to not being called. It is beyond the scope of this project to change the compiler so that those functions would get resolved as well, but we can detect by its absence that it hasn't been used and so could add a warning ("there may be errors here we can't see yet, please call the function" or something).
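Detecting unresolved functions by their absence could be sketched as follows, in Python for illustration. It assumes resolved functions appear in the dump as `function name[id]`, as in the output quoted earlier in this thread, and it uses a deliberately naive pattern for `proc` declarations that does not handle methods or nested procedures.

```python
import re

# Naive: matches `proc name` at the start of a line (after indentation).
PROC_RE = re.compile(r"^\s*proc\s+([A-Za-z_]\w*)", re.MULTILINE)
# Dump entries look like: function main[561783]() : void[4] "resolved"
RESOLVED_RE = re.compile(r"^\s*function\s+([^\s\[]+)\[\d+\]", re.MULTILINE)


def unresolved_procs(source, ast_dump):
    """Return declared proc names that never show up in the resolved dump."""
    declared = PROC_RE.findall(source)
    resolved = set(RESOLVED_RE.findall(ast_dump))
    return [name for name in declared if name not in resolved]
```

Each returned name is a candidate for an editor warning along the lines suggested above, telling the user the function has not been checked because nothing calls it.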

AnubhavUjjawal commented 5 years ago

@LouisJenkinsCS @lydia-duncan I was wondering whether I could find the type directly: when the user hovers over a variable, instead of compiling the code, just use regular expressions to find the declaration?

lydia-duncan commented 5 years ago

My worry is that you would be reinventing the wheel, when we already have a solution. Any implementation you make that tries to recreate something the compiler already does without relying on the compiler itself has the potential for code drift and a maintenance burden.

For instance, when determining the type, what would you do if the user program had two type declarations with the same name? To solve that, you'd be reinventing scope resolution and reinventing the code to follow our use statements. What if our strategy for either of those changes (as it has recently due to rethinking our point of instantiation rules)? Your output will be incorrect, or you'll get a test failure and have to figure out how to adjust it in a similar fashion.

LouisJenkinsCS commented 5 years ago

You could 'cheat it' a bit by appending function calls to every single function at the very end of the file. Then use --ignore-errors to handle cases where compilerError gets explicitly invoked. Worth a shot.

lydia-duncan commented 5 years ago

That'll work for functions with no arguments, but not for functions with arguments. What would you insert for the call when the argument is generic? Or when the argument is a complicated type? I think having a warning so the user can insert appropriate calls is the best strategy.

AnubhavUjjawal commented 5 years ago

@lydia-duncan @LouisJenkinsCS Using the compiler output, I am able to get this. There is still a lot to implement, which I would do during the coding period. For now I am writing some hacky code as a proof of concept.

[Screenshot: Mar-26_-2019-10_57-PM]

AnubhavUjjawal commented 5 years ago

I am going to start writing a proposal on this, and will ask some conceptual questions in this same thread about whether things are implementable or not. I believe, as @lydia-duncan said,

> The file stored will have a lot of details about the AST, but you will likely be able to get any of the information you need from it, so long as it has been computed by that particular pass.

I should be able to implement most of the functionality.

@LouisJenkinsCS @lydia-duncan If you would like me to implement anything else as a proof of concept, please suggest some ideas.

LouisJenkinsCS commented 5 years ago

@AnubhavUjjawal Can you discard everything but the type? That'd be more than sufficient.

AnubhavUjjawal commented 5 years ago

> @AnubhavUjjawal Can you discard everything but the type? That'd be more than sufficient.

Yeah ok, did it. As I said, I didn't worry about the details shown in the hover but just made an implementation to make the type inference work.

AnubhavUjjawal / Chapel-LS-Playground

Hack for implementing type inference #2