dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License

Interactive Design Meeting 3/27/15 & 4/3/15 #1892

Open kuhlenh opened 9 years ago

kuhlenh commented 9 years ago

Interactive Design Meeting 3/27/15 & 4/3/15

Definitions

The following individuals are members of the dream team: Anthony D. Green, Chuck Stoner, Dustin Campbell, Kasey Uhlenhuth, Kevin Halverson, Kevin Pilch-Bisson, Lincoln Atkinson, Manish Jayaswal, Matt Warren, Tomas Matousek


Agenda: Scripting Scenarios

The purpose of these meetings was to figure out what scripting scenarios we want to support and propose solutions. Once we have this artifact, we can then have concrete discussions with scriptcs for possible future collaboration.

Proposal for #r and #load

We use the #load directive for sources (i.e., "include a script source") and the #r directive for referencing assemblies (i.e., "reference the assembly and run no scripts"). A developer can use either #load or #r on a NuGet package, but each directive leads to different behavior. We recommend that developers default to #load for NuGet packages, which will run any scripts present; these scripts can themselves #r assembly references. If a developer uses #load on a NuGet package that contains no scripts, we will #r the package. Using #r on a NuGet package will reference all assemblies inside, and no script will be run.

NuGets are used everywhere in .NET now. We want to use this packaging paradigm in our scripts in order to seed them with execution context and reference assemblies. NuGet packages have the following directory structure: tools, lib, content, and build. The tools directory contains scripts and we would like to allow NuGet authors to put a default .csx script in there (e.g., "init.csx").

Examples

Example 1: If I want to reference an assembly--for example, Generic Lists--in my script, I would use:

#r "System.Collections.dll"
using System.Collections.Generic;

Example 2: If I wrote a script that sets up execution context for my current script, I would use:

#load "myContextScript.csx"

Example 3: If the NuGet package I want to load has an initializer script init.csx that I want to run to help me set up my scripting environment, I would use:

#load "NuGetWithScript.nupkg"
//this will run NuGetWithScript's init.csx script

Example 4: If the NuGet package I want to load has an initializer script init.csx that I do not want to run, I would use:

#r "NuGetWithScript.nupkg"
//NuGetWithScript's init.csx file will not be run

Seeding a script with other scripts for execution context

A problem we foresee with this scenario is that the user would have to manually #r every assembly from the seed script in order to use it in the current script. To remedy this, we propose a new "global directive", #u. The #u directive can only be used in scripts that initialize an environment or in a top-level script, in order to prevent polluting the global namespace.
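
As a sketch of how the proposed #u directive might read (note that #u does not exist today; the file names and the Newtonsoft.Json usage here are purely illustrative):

```csharp
// seedContext.csx -- a script that initializes an environment
#r "Newtonsoft.Json.dll"
#u Newtonsoft.Json   // proposed: promote this using to any script that #loads us

// myScript.csx -- the current script
#load "seedContext.csx"
// Newtonsoft.Json would be in scope here without repeating the #r or a using,
// so JsonConvert could be called directly:
var json = JsonConvert.SerializeObject(new { x = 1 });
```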

To be continued...

We are still unclear on our solution for the scenario of running the initializer script init.csx for packages that have dependencies on other packages. We were only able to briefly touch on this, but our general proposal is: if you #load a NuGet N1 with a dependency on another NuGet N2, we will only run N2's init.csx if N1's init.csx used #load for N2. In other words, using #load on N1 does not mean you will run all of N1's dependent packages' init scripts.
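
A sketch of the proposed semantics, with hypothetical package names:

```csharp
// N1's init.csx
#load "N2.nupkg"     // explicit: N2's init.csx will run

// consumer.csx
#load "N1.nupkg"     // runs N1's init.csx; N2's init.csx runs only
                     // because N1 itself #loaded N2
```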

paulomorgado commented 9 years ago

Why #load and not #l like #r and #u?

What role does PowerShell play here?

Is this just for coding environments like Visual Studio or for other package uses like OneGet?

glennblock commented 9 years ago

Hi @kuhlenh thanks for sharing this. I have a few concerns/questions on the proposal.

First, on #load for packages. The question here is: should package loading be a separate concern from the code? If you look at node as an example, package.json lists the node modules. This enables tooling to easily come along and either search that file when installing modules, or update that file if you do a --save, i.e. npm install express --save will update my package.json with the module after I install it. This is also very common in the scriptcs experience: by default, when you do say scriptcs -install automapper, we will automatically update the packages.config, or create a new one if it does not exist. Even if you could get auto-installation to work on execute (which not everyone is a fan of), that doesn't solve the tooling issue I just mentioned.

One option I can think of is establishing a convention called Packages.csx, which is where all #loads for packages go. This has a few advantages: you don't have to write JSON/XML and you can still use CSX syntax, BUT it is more tool-able.

The idea of the entry level script when a package is loaded is interesting and I can see the init script being useful. On the other hand, it makes package installation more brittle, ie if an exception occurs in executing that code, then that has to be handled somehow. npm has install scripts, but many people also want to get rid of them for a similar reason.

On the decision to only run the deepest script if the parent script loads it: that seems a little strange and violates separation of concerns. Why should the parent script have to have internal knowledge of the package it depends on? Can you give a bit more detail on this decision?

One big thing I am not seeing in the proposal is an ability to introduce shared functions via scripts in a package. With Script Libraries in scriptcs we allow this. That is really useful in that it allows scripts to be easily shared via a package and reused. I talk more about the why and what of this here: https://github.com/scriptcs/scriptcs/wiki/Script-Libraries

glennblock commented 9 years ago

Forgot to mention the other question, #u. Can you describe more on what that is / why it is needed / how it will work?

tmat commented 9 years ago

@glennblock Re #load of nuget package. It's up to the host to support this. If your host prefers to list nuget packages separately in package.json and then consult that file before compiling the script, that's fine. However, the advantage of having nuget references directly in the script (in the #load directive) is that the script file is all I need. Everything is specified there. I don't need to carry around another file.

Re tooling - tooling can scan the script and find #load directives in the file.

Re init script - it doesn't make installation brittle. The script is not run at install time. The script is included in the compilation of the script that contains #load, and the code is thus executed when the containing script executes. All packages have to be installed before that.

Re shared functions - the init script is the place to define those functions. Since the init script is included in the compilation that #loads the package, the functions will be available to the including script.

tmat commented 9 years ago

@glennblock Re #u: the init script uses #u to define imports available globally, that is to the script including the package.

glennblock commented 9 years ago

@tmat Yes it could scan, but it also means individuals have to scan through all files to find the packages that are used. Having a separate place for packages is a well-defined pattern in node, Ruby, Java (pom.xml), etc. Developers are accustomed to this pattern. They know to go to the xxxx file to find whether packages are present. Also there's the question related to installation. If I want to support an experience where the installer knows where to store the package name when I load from the CLI, it is easy to do with a separate file. Not so if it is embedded in the code, i.e. if I have multiple files, it has to determine what is the top-level file. And what if I have multiple top-level files? How does it know where to inject the #loads?

As to the init script, ok, I didn't realize this ran at execution time. OK, that does solve my install concern.

In terms of shared functions, the problem you introduce then is ambiguity, as two init scripts can have functions (or other members) with the same name which belong to completely different packages. We solve this in Script Libraries by forcing a wrapper class around each package init script so that they don't overlap with one another.

How do you see avoiding the ambiguity/conflict if other public members like functions are present?

stirno commented 9 years ago

My simple-minded question regarding #load NuGet.pkg is: where would the package contents (~/packages//*) be downloaded to? Relative to the script being run seems bad. This is 'fixed' by the convention-based package.json/package.csx/whatever by locating packages relative to it.

tmat commented 9 years ago

@glennblock I get there are scenarios where a separate packages.json is advantageous. And we should support them. However, we should imo also support nuget #load in the script, so that one can create standalone scripts (simple samples, shell scripts, etc.).

@glennblock Obviously, some convention to prevent pollution would be a good idea. We were thinking about a similar convention to Python. When Python imports a module my_module a variable my_module is introduced and the functions defined in the module are encapsulated in it, I believe. If an init script of a package wishes to expose some functions it would define a top-level symbol of the same name as the package that exposes this functionality.

That symbol can be a class, for example, or it could be a property or a global import. The init script could look like:

#r "MyLibrary"
#u MyLibrary.SomeNamespace

public static class MyPackage 
{
    public static void MyHelper() { ... }
}

// some initialization code

Would it work?

Pilchie commented 9 years ago

@stirno In the world of dnx, we would restore packages to your %UserProfile%\.dnx\packages\ directory anyway, instead of having a packages folder in your solution. (We're also hoping to do this for non ASP.NET projects :smile:)

glennblock commented 9 years ago

@tmat no objection to supporting inline package syntax for simplicity / REPL.

glennblock commented 9 years ago

@Pilchie as long as I don't end up with a mini-gac hell I am ok with that. Based on what we saw with our own global package woes, I worry about that.

glennblock commented 9 years ago

@Pilchie the nice thing about a local packages folder is everything is completely isolated. You don't accidentally get dependencies you didn't bet on, but which were satisfied based on a heuristic.

glennblock commented 9 years ago

@tmat yes having a wrapper class can work; that is basically what we do in scriptcs, only we create it for you and name it based on a convention that uses the init filename. In our case the init name is StartXXXX.csx and the XXXX is used for the wrapper class name.

This was in order to simplify the code and not require you to embed your own class. It gives it a more lightweight feel. You can use static methods on that class, but we allow instance methods as well.

If we did what you are suggesting, it would be good to enforce that no top-level methods are exposed in the init script.
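
To illustrate the scriptcs convention described above (the Calculator name and Add method are made-up examples):

```csharp
// StartCalculator.csx -- shipped in a package; per the convention, scriptcs
// generates a wrapper class named after the file, roughly:
//
//   public class Calculator
//   {
//       public int Add(int a, int b) { return a + b; }
//   }
//
// so the script body itself can stay lightweight:
public int Add(int a, int b) { return a + b; }
```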

tmat commented 9 years ago

@glennblock I think we could start with the explicit class.

In case a class defined in the assembly included in the package already contains the helpers you want to expose you can do

#u MyPackage = MyNamespace.Helpers;

If it is too verbose we could consider "into clause" :

#load "Package.nuget" into Foo

which would wrap the top-level members into a Foo singleton class accessible through a global Foo property. This might actually be better than the other solutions, since the script that is including the other script is in control. I can start with a #load directive without the clause, and if I get an ambiguity I need to resolve, I can add the into clause.
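
A sketch of how the proposed into clause could let the consumer resolve an ambiguity (the package names and the Log helper are hypothetical):

```csharp
// Both packages define a top-level Log() in their init scripts, so a
// bare #load of both would be ambiguous. The into clause disambiguates:
#load "Logging.nupkg" into Logger
#load "Tracing.nupkg" into Tracer

Logger.Log("hello");   // each package's members live on their own wrapper
Tracer.Log("world");
```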

glennblock commented 9 years ago

Oooh I really like that. into sounds like a great idea.

glennblock commented 9 years ago

This would def solve the problem with the added benefit that it would allow the consumer to resolve conflicts.

:+1:

stirno commented 9 years ago

Really like into for this as well. :+1:

ManishJayaswal commented 9 years ago

One of the design principles we talked about is the desire to have the scripts be sharable across various hosts. So we should "try" not to make the scripts dependent on some global state that the host/environment sets up. Otherwise the scripts will fail to run or behave differently in different hosts/environments. I wanted to mention this as it was explicitly stated in the notes.

tmat commented 9 years ago

@ManishJayaswal I'd say we need to word it a bit more precisely. Of course, any extension points we provide via hosting API potentially make scripts host specific. I think we need to distinguish classes of scripts, each class designed for a specific kind of host. A different instance of the same host should compile the same script with the same semantics.

For example, let's say that I design an app that is scriptable. I expose a host object MyApp that has certain API (object model) that the script can access. Then indeed scripts using that API will be host specific and won't run in other hosts that don't expose such API. But these scripts should compile in all instances of MyApp correctly.

Similarly with extensibility of #r and #load -- since the strings are host-interpreted not all hosts will implement them the same. And that's fine.

Of course, even if we limit ourselves to a particular class of scripts there might be failures due to machine state accessed from the script thru IO APIs etc. I'd call them run-time failures, as opposed to compile-time failures.

tmat commented 9 years ago

Having shebang syntax would be beneficial. It describes the app the script is supposed to be executed by. For example,

#!scriptcs

adamralph commented 9 years ago

OK, I've read through the lot :smile:

My feedback:

Definitions

IMHO we should drop the assumption that scripting is only for 'simple and small programs'. The considerable ecosystem of apps running on node, ruby, python, etc. shows that this isn't the case. It's certainly lightweight, one of its strengths, but what people will go on to build with it, no one knows.

Proposal for #r and #load

I like this behaviour since I do believe that nuget packages should become the units of reference rather than individual assemblies or scripts. I think it's also important that the search path is controllable. E.g. for scriptcs we would like this to be scriptcs_packages/.

Seeding a script with other scripts for execution context

I'm not sure I understand the issue here or why #u is required. In scriptcs, #load simply makes the callee script execute as if it were part of the calling script so the callee script will inherit all references and using statements. Is this not the designed behaviour for #load? Or are you referring to the reverse requirement, when the caller script needs to reference and/or add using statements for the types which are publicly exposed from the callee script? It's not very clear from this paragraph which way round the consideration is.

Nested init.csx

I think this is a no brainer. If a script calls #load on N1, which deliberately does not #load N2 we should never take the liberty of running init.csx from N2. The only thing which should invoke the running of init.csx in a given package is an explicit #load.

Into

I really like this but I'm a little concerned about the idea of a singleton accessible via a global property. I'm guessing this would be a property on the host object? My concern here is portability across hosts. E.g. what if the host also provides a method or property named Foo? I think I would prefer an instance to be returned to keep things isolated. E.g.

#load "Package.nuget" into foo
foo.Bar();

I haven't thought too much about the implications of that though. Just thinking out loud.

Nested dependencies

This is outside the scope of the notes you posted, but whilst we're on the subject of referencing and loading NuGet packages, I'd like to plant the seed regarding nested dependencies, if it hasn't already been planted. This would work similarly to node, i.e. each package gets the version of the package which it defines as a dependency, in its own nested packages folder. This would remove dependency (I'll resist using 'DLL') hell for packages which only contain scripts, but challenges would remain for packages which expose assemblies. Just some food for thought.

glennblock commented 9 years ago

I assumed into was creating a new type, not an instance. So you would do 'var foo = new Foo()' after an 'into Foo'.


adamralph commented 9 years ago

@glennblock that could also work

glennblock commented 9 years ago

I am still on the fence about the described behavior of a nested init.csx; I don't agree that not loading it is a no-brainer.

As a package author who includes an init.csx, I do it with the intent that it should always be run.

Similarly, in node, if I include code in index.js in my module, I expect that to always be run.

adamralph commented 9 years ago

@glennblock whether or not there should even be a facility to consume a package without running init.csx is another matter and you may have a good point there. @kuhlenh / @tmat can you provide the reasoning behind that proposal?

I only meant that it is a no-brainer in the case where this facility exists, and when a caller script #loads N1, but N1 does not #load N2, that we should not take the liberty of running init.csx from N2, since it is an indirect dependency.

glennblock commented 9 years ago

Gotcha.


ManishJayaswal commented 9 years ago

@glennblock @adamralph The recommended way to consume packages would be to use #load, which will run the init.csx. The #r option is to give flexibility to the consumer when they do not want to run the init script. Say the init.csx sets some global usings with #u which the consumer does not want because they introduce ambiguity in their code, but they still want to reference the libraries. This option gives them a way to do so.

glennblock commented 9 years ago

So what you are saying is #u namespaces are only in scope within init.csx and any child scripts it loads, but not in the consumer?


tmat commented 9 years ago

@glennblock No. #u is a compilation-level using (similar to a VB project-level import). Manish meant that if the consumer doesn't want the convenient usings and helpers that init.csx defines for them, they have the option to use the assemblies packaged in the nuget directly via #r.

tmat commented 9 years ago

Essentially, #r is equivalent to adding a reference to a .csproj in VS.

glennblock commented 9 years ago

As to the #r is this flexibility for security reasons? Or something else?

I can see a few reasons one would write code in init.csx.

One: To do some first time setup the first time the library is used. I do something like this in ScriptCs.Edge as I need to copy native edge assemblies into the right place the first time you use it. Currently it lives within the Edge script pack: https://github.com/scriptcs-contrib/scriptcs-edge/blob/master/src/ScriptCs.Edge/EdgeScriptPack.cs#L23 when it is initialized, but I could imagine it being in init.csx.

Two: Expose some helper functions to the client; this is similar to what we designed Script Libraries for. With Script Libraries, because we wrap all your code in an outer class, we ensure no code will accidentally execute without your permission. It will compile, yes, but it won't execute until you invoke the methods on that class.

Are there other scenarios you are thinking of?


glennblock commented 9 years ago

OK let me up-level and see if I am getting this.

You are envisioning a way to have standard NuGet packages expose a scripty interface as well, through init.csx. Today in scriptcs land there are generally two packages: the original one, and the one that adds the scripty interface; for example, scriptcs.nancy adds a scripty interface to Nancy.

In the new world, there could be one package which includes both the non-scripty and scripty interfaces. If one doesn't want the scripty interface loaded / the default usings injected, they just use #r on the package and load the assemblies directly.

Is that kind of it?

glennblock commented 9 years ago

Made some updates to my previous comment

tmat commented 9 years ago

@glennblock Yup.

glennblock commented 9 years ago

@tmat OK, thanks for clarifying, that helps. The changes make sense in this context.

adamralph commented 9 years ago

So #load .. into ... and #u would effectively be like a script pack in scriptcs, and #r would be just like bypassing the script pack and going straight to the underlying package. That should work. The latter usage sounds like an edge case but I guess it does no harm to provide the facility.

With regard to init.csx itself, do you envisage this supporting arbitrary script or will it have to conform to a certain shape? E.g.

// init.csx
var x = 2;
public class Bar { }
public int Baz() { return 0; }
x * x

As an arbitrarily invoked script, I'd expect this to provide a Bar class, a Baz method and a return value of 4. Do you see some way to wrap all this in a class created by #load Foo.nupkg into Foo?

I guess Bar would be a nested class and Baz would be a method on Foo. Would the loose script go in the constructor? The only question then is do we care about the return value? Perhaps return values simply make no sense in the case of #load in any context, whether a NuGet package or an independent script.
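
One possible shape for the generated wrapper, assuming `#load "Foo.nupkg" into Foo` and the init.csx sketched above (this is speculative; the handling of the trailing expression is exactly the open question):

```csharp
public class Foo
{
    public int x = 2;                   // loose variable becomes a field
    public class Bar { }                // nested class
    public int Baz() { return 0; }      // method on Foo
    public Foo()
    {
        // loose statements from init.csx could run here; the trailing
        // expression `x * x` has no obvious home and might simply be
        // discarded in the #load case
    }
}
```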

adamralph commented 9 years ago

I assume that #load ... into ... would work the same for general scripts, not just nuget packages? I.e. #load foo.csx into Foo?

The more I think about it, the more I favour @glennblock's suggestion of #load ... into ... defining a class which is then newed up. After all, these are compiler directives so I don't think that they should create an instance, nor should they fiddle with the host by adding a new global property. If we go down that route then all we are really doing is defining a new way to define a class, using script defined in a file. We could even go a step further and embrace that:

#class Foo Foo.nupkg
#class Foo foo.csx

or to be more language agnostic:

#type Foo Foo.nupkg
#type Foo foo.csx