Make Ceylon scriptable - Githubissues

CeylonMigrationBot commented 12 years ago

[@FroMage] Needs some thinking

[Migrated from ceylon/ceylon-spec#200]

CeylonMigrationBot commented 10 years ago

[@lucaswerkmeister] @RossTate Well, if we ditch the REPL idea, then the script runner could just compile the program completely and then launch it in a new JVM / node.js instance, right? Which means that the main work would then be defining and implementing the new syntax instead of working classloader magic.

CeylonMigrationBot commented 10 years ago

[@gavinking] @lucaswerkmeister Right. There's no real problem with defining new types in a script. A REPL is a different beast, however, and the JVM was simply not designed with that use case in mind.

CeylonMigrationBot commented 10 years ago

[@gavinking]

If it's really a need I'd be happy to write the code for that.

I think you're underestimating just how hard this would be. There's no easy way to redefine a class in the JVM. You would need to do some pretty tricky messing about with classloaders to make it work.

OTOH, arguing against myself: a REPL based on node.js would probably not be hard to implement. This raises the question of what VM are we primarily targeting here:

the JVM?
Node?
both?

CeylonMigrationBot commented 10 years ago

[@djadmin] Well, I'm quite comfortable with Node. So I'd suggest Node first and then we can proceed to target the JVM. What do you say?

CeylonMigrationBot commented 10 years ago

[@gavinking] Well, thinking it through, I realize that a script for Node is not especially useful since stuff like interacting with the filesystem is much more difficult than with the JVM.

CeylonMigrationBot commented 10 years ago

[@lucaswerkmeister] I’m a bit confused as to what exactly “Make Ceylon scriptable” involves. What’s the identifying feature of a Ceylon script? What, besides convenient invocation, separates it from ceylon compile --src . file.ceylon; ceylon run default? With the void { ... } syntax, we would already be far away from other scripting languages like Python, Bash, etc., all of which allow top-level statements.

CeylonMigrationBot commented 10 years ago

[@lucaswerkmeister]

interacting with the filesystem is much more difficult than with the JVM.

Maybe “node” and “browser” should be considered different backends? ceylon.file makes no sense for browsers, but you could implement it for node.js.

CeylonMigrationBot commented 10 years ago

[@lucaswerkmeister] And by the way, another thing that’s IMO even more necessary than file I/O is the ability to call processes – in fact, we might even want to introduce some syntax sugar for that (possibly even for something like Bash’s $(command), obtaining a command’s output as a string), and import ceylon.process automatically if it’s required.

CeylonMigrationBot commented 10 years ago

[@gavinking]

I’m a bit confused as to what exactly “Make Ceylon scriptable” involves. What’s the identifying feature of a Ceylon script? What, besides convenient invocation, separates it from ceylon compile --src . file.ceylon; ceylon run default?

Well, first and foremost, it means I can just run a source file from the Unix command line, without needing to compile it first. We need to make #!/usr/bin/ceylon script work.

It also means letting me specify module dependencies within the same source file as my executable code.

And I guess it also means letting me have some runnable code that doesn't have an explicit function name.
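
A minimal sketch of what a launcher behind a hashbang line might do before handing the file to the compiler, assuming the compile-then-run approach discussed above. The function names are hypothetical, and the two command lines are simply the ones quoted earlier in this thread, not a confirmed interface of any eventual script command.

```python
from pathlib import Path


def strip_shebang(source: str) -> str:
    """Drop the text of a leading '#!' line but keep its newline,
    so line numbers in later compiler messages still match the file."""
    if not source.startswith("#!"):
        return source
    first_newline = source.find("\n")
    if first_newline == -1:
        return ""  # the file contained only the hashbang line
    return source[first_newline:]


def build_commands(script: Path) -> list[list[str]]:
    """Return the compile and run steps as argv lists (nothing is executed here)."""
    return [
        ["ceylon", "compile", "--src", str(script.parent), script.name],
        ["ceylon", "run", "default"],
    ]
```

The kernel passes the script's path as an argument to the interpreter named on the hashbang line, so the launcher itself has to tolerate that first line, which is not valid Ceylon.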

CeylonMigrationBot commented 10 years ago

[@lucaswerkmeister] For the second and third point, are we still on module { ... }, void { ... }, or did we decide on having top-level statements and module imports?

CeylonMigrationBot commented 10 years ago

[@quintesse]

We need to make #!/usr/bin/ceylon script work. It also means letting me specify module dependencies within the same source file as my executable code.

We need to think this through though because there are a couple of things that I'm not sure about:

1. How would we handle multiple scripts? Do they all go in the same default.car?
2. If each script can define dependencies, would that module have the union of all dependencies? (might become costly)
3. Where are the compiled modules stored/found? Normally this is relative to the folder where you execute ceylon compile and ceylon run, but for scripts, especially the ones with hashbangs, this is not defined. (Python and its ilk store the compiled .pyc next to the .py source file, but I don't think that's a desirable, or even possible, option in our case)

CeylonMigrationBot commented 10 years ago

[@lucaswerkmeister] I always thought the scripts would be independent of each other, and the compiled module would be deleted immediately after execution finishes. Therefore:

1. They should go into a $scriptFileName.default.car
2. Independent, so no (and how would that work out if they import different versions?)
3. Next to the source file?

CeylonMigrationBot commented 10 years ago

[@quintesse]

and the compiled module would be deleted immediately after execution finishes

Well but that would make them horribly inefficient, I don't see anyone using them that way.

Independent, so no (and how would that work out if they import different versions?)

That might be, but if they are regular .ceylon files somebody might still type ceylon compile default and if so what would the result be? (Which was one of the reasons I suggested giving Ceylon script files their own, different, file extension. They would never be compiled together and always result in separate modules)

Next to the source file?

Well unless you delete them afterwards as you suggest that could become terribly messy with .car files intermingled with source files.

The advantage of having them there next to the source file is that you're bound to throw away the car if you remove the script, while if we would have some global cache (perhaps based on the actual file location of the script) you'll end up with a trash heap of old modules that never gets cleaned. (although perhaps the current cache has some of the same problems)

CeylonMigrationBot commented 10 years ago

[@quintesse] Also consider how you will be developing them. You still want to use Eclipse to write them and maybe they are part of a larger project where you have a couple of scripts that give easy access to some functionality.

Now where do I put them?

If I put them in the source folder so Eclipse treats them correctly they will be compiled together into the default.car, not what I want and possibly even results in unnecessary errors.

I'd have to put each of them in their own project just to be able to edit them.

I think that if we go for a separate extension, for argument's sake let's say .cys, both Eclipse and the command line compiler can treat it differently. In Eclipse you'd be able to edit it even outside official source folders and it would not compile it to a .car. And the command line compiler would just ignore them because they are handled by ceylon script, not by ceylon compile.

Maybe there are other ways too, it's just a suggestion/idea of mine.

CeylonMigrationBot commented 10 years ago

[@PhiLhoSoft] Side note about REPL with compiled OO languages on the JVM: Scala has a REPL that doesn't seem to be mentioned here. I used it very occasionally, and it was mostly for quick tests. Some Scala users seem to highly praise such a tool, using it a lot. What I recall from the mailing list threads about it:

It is hard to get such a tool right.
They did it very well (apparently), with completion and recall of previous operations (an unfortunate accidental exit and you risk losing your whole session!), at the cost of a lot of work.
Some things possible in an editor don't work in the REPL. Semi-colon inference, among other things, doesn't like opening braces on their own line, while they are legal in compiled mode.

My PoV is that such a tool is valuable when you don't have decent IDE support, or for people allergic to such IDEs, who prefer coding in their editor (often vi!) and compiling from the command line... Other times, other habits / tastes...

Like Gavin, I prefer to type some lines in the IDE and have immediate feedback, auto-completion and help tips, debugging, persistence of the experiment, etc.

CeylonMigrationBot commented 10 years ago

[@PhiLhoSoft] I have a naive question that doesn't seem to have been asked here: what is the point of having a Ceylon "interpreter"?

A scripting language, IMHO, should be lightweight and have very little overhead: it must start in less than one second. And, of course, it must have a high integration with the system, being able to run commands, to capture their output, perhaps to feed some input (pipes, etc.), etc.

If it is just to have a Ceylon program compiled on the fly and then run, it offers little advantages over a two step process that can be done with some shell script.

On the other hand, if the Ceylon scripting mechanism can be made JSR-223 compliant, it can offer interesting applications, like having a Ceylon application able to run Ceylon scripts to customize it dynamically.

CeylonMigrationBot commented 9 years ago

[@gavinking] Since @zoek1 is going to start work on this issue, I think it's time to finally make some decisions on this issue.

I propose that:

a script is a sort of lightweight module definition,
scripts don't have any relationships to module repos, nor source folders, etc, they can sit wherever is convenient
scripts can be compiled (to a .car or .js in a module repo), but usually we just distribute them as plain .ceylon source files

The syntax of a script is:

a module statement,
optionally, one or more module import statements,
optionally, one or more package import statements,
a list of statements and declarations, interpreted as if they occur in the body of a toplevel function.

For example:

module script;
import ceylon.collection "1.1.1";
import ceylon.collection { ArrayList }

value list = ArrayList { "hello", "world" };
printAll(list);

This script, when compiled, produces a module named script with a shared toplevel function named run().

An open question is whether we allow a script to be split across multiple .ceylon files, or if we force you to put everything in one source file.

CeylonMigrationBot commented 9 years ago

[@lucaswerkmeister] Why is the module statement required?

CeylonMigrationBot commented 9 years ago

[@gavinking] @lucaswerkmeister So that the parser can distinguish that this is a script.

CeylonMigrationBot commented 9 years ago

[@gavinking] P.S. I'm OK with allowing just:

module;
print("hello world");

if you're happy for your script to execute in the default module.

CeylonMigrationBot commented 9 years ago

[@lucaswerkmeister] So a module statement is

ModuleStatement: Annotations "module" FullPackageName? ";"

and everything else is unchanged, except that module imports are now toplevel. (Fine by me, feels a bit more “script-y” than wrapping them in the module descriptor.)

:+1:

On semantics: I assume the typechecker checks the script just like a function body, in that “toplevel” declarations must work top-down and can’t be circular?

CeylonMigrationBot commented 9 years ago

[@gavinking]

I assume the typechecker checks the script just like a function body, in that “toplevel” declarations must work top-down and can’t be circular?

Precisely. I briefly thought of making it work like the body of a class (initializer then declarations) but then I realized that there would be no point, since there's nothing to call the declarations.

CeylonMigrationBot commented 9 years ago

[@quintesse] But we'll need to "enforce" the uniqueness requirement of the module name somehow, right? Especially when you allow them to be default modules as well.

So probably the real question is: where do we store the .cars for those compiled scripts?

1. Languages like python will create a foo.pyc compiled file beside the foo.py file, so we could create an example.car next to the example.ceylon script. Of course that would require you to always have write access to the folder where the source file is located. (It does make "clean-up" easier, if you remove the source file you probably won't forget to delete the compiled version.)
2. We could of course just compile them to ./modules as we do now but I have the feeling people don't like to see their (non-ceylon) folders littered with seemingly unrelated files/folders. A variant of this would be to compile scripts to ./.ceylon/modules effectively hiding the output folder from sight. But we'd still have the problem of what to do with multiple "default" scripts in the same folder. (And clean-up is more difficult though in both these cases.)
3. We could also compile them to a fixed place like the cache (~/.ceylon/cache) but we don't allow default modules there (and again how to distinguish between them?) so that's not really an option.
4. Another option would be a new folder, something like ~/.ceylon/compiled-scripts and then match to the exact location of the script on disk somehow (eg. ~/.ceylon/compiled-scripts/home/user/ceylon/bin/example.ceylon/default.car).
5. A variant of item 4 would be to use a SHA1 summed version of the URI/Path instead of the actual URI/Path (which might contain all kinds of weird things if the script came from a network source for example)

(The problem with those last two options is also the clean-up: they might fill up with crap pretty quickly if you use a lot of scripts just a couple of times. The folder itself is pretty easy to delete though.)

Personally I think the first option of generating the .car next to the script file is the most flexible, perhaps with a fallback to the ~/.ceylon/compiled-scripts option when you don't have write access to the folder.
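
The preference stated above (write next to the script, fall back to a per-user folder) could be sketched roughly as follows. This is an illustration only: the function name, the injectable can_write check, and the choice of a SHA-1 over the file URI for the fallback name are assumptions, not anything the Ceylon tooling defines.

```python
import hashlib
import os
from pathlib import Path


def car_location(script: Path, home: Path,
                 can_write=lambda d: os.access(d, os.W_OK)) -> Path:
    """Pick where the compiled .car for a script should live."""
    script = script.resolve()
    if can_write(script.parent):
        # First choice: example.car right beside example.ceylon.
        return script.with_suffix(".car")
    # Fallback: a name derived from a hash of the script's URI,
    # placed under ~/.ceylon/compiled-scripts.
    digest = hashlib.sha1(script.as_uri().encode("utf-8")).hexdigest()
    return home / ".ceylon" / "compiled-scripts" / f"{digest}.car"
```

Passing the writability check as a parameter keeps the decision testable without having to create read-only directories.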

CeylonMigrationBot commented 9 years ago

[@quintesse] Oh btw, as side-effect of the above, where compiled scripts are not really stored in the normal modules repositories, I think scripts should always be default modules: they can import but can never be imported themselves.

CeylonMigrationBot commented 9 years ago

[@quintesse] Unless we also want the script source syntax available for normal Ceylon modules, as a really quick-to-write, single-file module. That would be great, but it would then have to follow all the normal Ceylon source rules: be in a source folder, be compiled with ceylon compile etc. In the above messages I'm referring to Ceylon scripts as used in system scripting, eg. being able to do:

example.ceylon

#!/bin/ceylon run
module;
print("hello world");

and run it with:

$ ./example.ceylon
hello world

CeylonMigrationBot commented 9 years ago

[@gavinking]

Oh btw, as side-effect of the above, where compiled scripts are not really stored in the normal modules repositories, I think scripts should always be default modules: they can import but can never be imported themselves.

I disagree, and I don't see how it follows.

CeylonMigrationBot commented 9 years ago

[@quintesse]

I disagree, and I don't see how it follows.

Well if they're not part of the normal module repositories how will they be found?

(Unless they can only be imported by other scripts in the same folder.)

CeylonMigrationBot commented 9 years ago

[@PhiLhoSoft] "An open question is whether we allow a script to be split across multiple .ceylon files" I guess this will be requested sooner or later, to allow using script libraries and for readability / maintenance / reusability. Even a small scripting language like Lua allows this, via dofile() or the more formal require().

CeylonMigrationBot commented 9 years ago

[@quintesse]

"An open question is whether we allow a script to be split across multiple .ceylon files" I guess this will be requested sooner or later

I'm pretty sure you're right that people will request this, but I'm not sure we should go that way. I'd rather see the scripting files as the "glue" for proper modules. So they're really easy to use and fast to write, needing no setup of folder structures or anything like that, and maybe even perfect for embedding. But the moment you start re-using code from other scripts IMO you're far better off writing a proper module for that.

On the other hand I think I wouldn't mind too much if it was a C-like "#include" at the level of the parser or something like that. Something that would only be allowed in script files. (for me affirming the whole idea that they're nameless and module-less)

CeylonMigrationBot commented 9 years ago

[@PhiLhoSoft] That's the dofile() way, then... It does a bit more than #include, but that's the idea, without having to manage paths, versions, dependencies, etc.

And indeed, if the need for reusability arises, one can always use "real" Ceylon code for that.

CeylonMigrationBot commented 9 years ago

[@gavinking]

Well if they're not part of the normal module repositories how will they be found?

Well the compiled artifacts would go into repos. But the script.ceylon file would be specified on the command line.

CeylonMigrationBot commented 9 years ago

[@quintesse]

Well the compiled artifacts would go into repos

But which repository? If a script file can be anywhere on your system? (Unless you need to specify it with an option, but that doesn't seem like a good default) And what about having several script files all being nameless (going into the default module)? How do we distinguish them?

CeylonMigrationBot commented 9 years ago

[@jvasileff]

Another option would be a new folder, something like ~/.ceylon/compiled-scripts and then match to the exact location of the script on disk somehow (eg. ~/.ceylon/compiled-scripts/home/user/ceylon/bin/example.ceylon/default.car).

That sounds compelling, but I can imagine a ton of ways it could go wrong; filesystem paths are far from straightforward and reliable.

So, what about using a shasum of the script file's contents, like ~/.ceylon/compiled-scripts/db32869dcc7cc6762af350447c0a19037792c5eb.car? Perhaps a sanitized version of the script's filename could be included as a prefix to make the filenames somewhat human readable.

Personally I think the first option of generating the .car next to the script file is the most flexible, perhaps with a fallback to the ~/.ceylon/compiled-scripts option when you don't have write access to the folder.

I agree. That seems like the safest and easiest to use approach, despite the littering of the filesystem. The car should be hidden and named with a pattern that is easy to add to an ignore file.

CeylonMigrationBot commented 9 years ago

[@quintesse]

So, what about using a shasum of the script file's contents

That's certainly an option, but imagine the mess that will make while developing, you'd get a new version for each change you make.

but I can imagine a ton of ways it could go wrong

What ways do you see this going wrong? One of the things I could think of is that you might hit filesystem limits (trying to create a deeply nested folder inside a deeply nested folder for example). But maybe an option is to use not a shasum of the code but of the path?

CeylonMigrationBot commented 9 years ago

[@jvasileff]

new version for each change you make

Good point.

What ways do you see this going wrong?

Well, perhaps it's a solvable problem, but my inclination is always to avoid using user-input as filenames, if possible.

I'm thinking of length limits, as you mentioned, and also issues with canonicalization of unicode characters in filenames across filesystems of different types (see rsync --iconv), different filename rules for different filesystem types (extfs, hfs, ntfs, fat, etc.), translations performed by network filesystems (nfs, smb, etc), possible bugs determining path separators, symlink normalizations, unstable or reusable mount points, especially for removable media, and who knows what else.

Some of this may be FUD, but like I said, it's a mess I usually try to avoid.

shasum of the code but of the path

That sounds interesting. For safety, I think it would make sense to validate the shasum of the code as well (somehow) when the car is not stored in the same directory. Timestamps may not be a sufficient guarantee that you are actually running the right binary.

Generally, I think it's wrong to trust a path as a universal identifier. Perhaps things like concat(filesystem-guid, inode) when supported work, but even that's defeated by cloning a disk with dd.

CeylonMigrationBot commented 9 years ago

[@quintesse]

I'm thinking of ...

Ok, that's indeed all true, so indeed (as expected) trying to recreate the path would probably cause all kinds of headaches. So the shasum of the final URI or something like that would probably be enough. That you won't be able to easily match a script name to a file in the cache I don't really care that much about. If really needed we could probably make it an option of the script command to tell you where the compiled version is located.

I think it would make sense to validate the shasum of the code as well

It's certainly a possibility, but I'm not sure it's worth it, somebody would really have to jump through some hoops to get the wrong code to execute it seems to me.

CeylonMigrationBot commented 9 years ago

[@HenningB] I'd prefer not to base the shasum on the path/URI/URL. One pattern I use whilst developing/hacking is to copy a file or directory (as backup), experiment with the original, throw it away, and then renaming the backup to the original name. Needing to remember to clean the script-cache would lead to confusion.

CeylonMigrationBot commented 9 years ago

[@jvasileff] Could use the sha of the file contents for the car file name/PK, and the path as part of a cache eviction strategy.

On Apr 28, 2015, at 2:01 PM, Henning Burdack wrote:

I'd prefer not to base the shasum on the path/URI/URL. One pattern I use whilst developing/hacking is to copy a file or directory (as backup), experiment with the original, throw it away, and then renaming the backup to the original name. Needing to remember to clean the script-cache would lead to confusion.

CeylonMigrationBot commented 9 years ago

[@quintesse]

and the path as part of a cache eviction strategy.

How would that work? For me the opposite would seem to be the more logical way: to find it you'd use the hashed URI/Path and then to know if you'd need to update it you could look at the hashed contents.

@HenningB if we don't go for option 1 (local next to source) + option 5 (shasummed paths) as a fallback, but solely for option 5 then that might be a problem yes.

CeylonMigrationBot commented 9 years ago

[@jvasileff] I guess you could use $ContentsHash-$PathHash.car which would allow either one-to-many or one-to-one binary-to-sourceFile, depending on which way you want to go. (In the one-to-many scenario, executing wouldn't necessarily require a match on the $PathHash part).
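
As a rough illustration of the $ContentsHash-$PathHash.car naming, here is a sketch assuming SHA-1 for both halves. The helper names are hypothetical; the second function shows the one-to-many reading, where any cached binary with the same contents hash is acceptable regardless of path.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def car_name(script_path: str, contents: bytes) -> str:
    """Build '<contents hash>-<path hash>.car' for one script."""
    return f"{sha1_hex(contents)}-{sha1_hex(script_path.encode('utf-8'))}.car"


def find_by_contents(cache_entries: list[str], contents: bytes) -> list[str]:
    """One-to-many lookup: match on the contents half only, ignoring the path half."""
    prefix = sha1_hex(contents) + "-"
    return [name for name in cache_entries if name.startswith(prefix)]
```

With this layout, copying a script to a backup name and back (the workflow described earlier) would still hit the cache in the one-to-many case, because only the contents half has to match.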

ghost commented 8 years ago

It has been a while since I have read all this, but I have since come up with a proposal:

in regular modules, module initializers are placed inside module.ceylon, right below the module descriptor:

module my.\imodule "1.2"
{
    import foo.bar "2.1";
}

print("This runs when `my.module` is loaded!");

The default-module’s module descriptor is anonymous:

module
{
    import foo.bar "2.1";
}

Scripts are simply one-file modules
Scripts can be named like “foo.bar/2.1” in order to be imported by other scripts
import my.script "1.0" inside a script will import scripts from the importer’s path, from PATH, and will also be able to import modules normally.

This allows script to be named, or to be in the default module. Only named scripts can be imported.

quintesse commented 8 years ago

@Zambonifofex those are good ideas, pretty similar to what I had in mind myself. But the devil is in the details, it's not that easy to make it all work.

But I do think there are a couple of things that we could do as a preparation for this that would at least make it possible to experiment with possible solutions:

make it possible for the default module to have imports
allow module descriptor and code in a single file

Both of these are grammar/typechecker changes. But with them in place we could at least start with some PoC to see what works and what not.

ghost commented 8 years ago

@quintesse maybe we should open different threads for these features…

ghost commented 8 years ago

@quintesse is there currently anyone working on these features?

quintesse commented 8 years ago

@zambonifofex not that I know of, no

xkr47 commented 7 years ago

@lucono commented on May 10, 2016 on ceylon/ceylon-spec#200

That sounds interesting. For safety, I think it would make sense to validate the shasum of the code as well (somehow) when the car is not stored in the same directory. Timestamps may not be a sufficient guarantee that you are actually running the right binary.

I agree with this opinion, but also for when the file is stored in the same directory, as a way of verifying that it has not been updated since it was last compiled.

Could use the sha of the file contents for the car file name/PK, and the path as part of a cache eviction strategy.

How that would work? For me the opposite would seem to be the more logical way: to find it you'd use the hashed URI/Path and then to know if you'd need to update it you could look at the hashed contents.

This makes sense to me.

Even if storing the compiled .car locally next to the script file, I think it would make sense to compare a fresh hash of the script contents to the hash of the original script from which the existing .car was compiled, to know if re-compilation is required prior to execution.

For scripts stored in the scripts cache location because the user does not have write access to the local directory of the script, some form of the hash of the URI/path of the script file could be used as already proposed, to locate the right .car in the cache, after which the hash of the script contents would still need to be compared to determine if re-compilation is required prior to execution.

Could be something like:

~/.ceylon/compiled-scripts/db32869dcc7cc6762af350447c0a19037792c5eb.car
~/.ceylon/compiled-scripts/db32869dcc7cc6762af350447c0a19037792c5eb.md5

Unless it's possible to also bundle the hash of the original source script into the .car file, making the separate .md5 file unnecessary.
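
A sketch of the check described in this comment: locate the cached .car by a hash of the script's path, then compare a stored hash of the script's contents to decide whether to recompile. For simplicity it uses SHA-1 for both hashes and a .sha1 sidecar file rather than the .md5 file shown above; that substitution and all function names are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def digest(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def cache_paths(cache_dir: Path, script: Path) -> tuple[Path, Path]:
    """Return (compiled .car, sidecar holding the source hash), keyed by script path."""
    key = digest(str(script).encode("utf-8"))
    return cache_dir / f"{key}.car", cache_dir / f"{key}.sha1"


def needs_recompile(cache_dir: Path, script: Path) -> bool:
    car, stamp = cache_paths(cache_dir, script)
    if not (car.exists() and stamp.exists()):
        return True  # nothing cached yet for this path
    # Cached: recompile only if the source changed since the stamp was written.
    return stamp.read_text().strip() != digest(script.read_bytes())


def record_compiled(cache_dir: Path, script: Path) -> None:
    """Call after a successful compile has written the .car."""
    _, stamp = cache_paths(cache_dir, script)
    stamp.write_text(digest(script.read_bytes()))
```

Unlike a timestamp comparison, this stays correct when a file is restored from a backup with an older modification time.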

ghost commented 7 years ago

@xkr47 Oracle already solved this problem with Nashorn (they compile to class files, which are hashed and cached.) A JVM flag can override the directory.

http://hg.openjdk.java.net/jdk8u/jdk8u-dev/nashorn/file/tip/docs/DEVELOPER_README

Maybe some good ideas there. It's known to work :)

xkr47 commented 7 years ago

(@pureconfig the comment was by @lucono)

ghost commented 7 years ago

Still

HenningB commented 7 years ago

Here I'm summing up the ideas brought forth for making Ceylon scriptable, and adding my own thoughts. I believe this would now be close to being ripe for implementation, as we now have namespaces and native implemented, which solve some issues. But you guys might disagree anyway.

The points are:

Files should have the extension .ceylonscript or .ceylons for Windows support (I prefer .ceylonscript).
The annotation native on module determines the runtime it can be compiled to.
Compiled code goes into ~/.ceylon/compiled-scripts/${sha-code}/default.car (or default.js).
The SHA sum is: URL + file contents + compiler git id + SHA sums of script imported files
If a compilation to multiple platforms is possible, the actual runtime is determined by an option --runtime=jvm or by an entry in .ceylon/config.
Import of other script files have the import namespace script: and an URL where to find the source.
Version number of script imports are non-existent, if that's possible with the typechecker.
Old modules are purged from the cache, where the maximum number or age of compiled modules is determined by ~/.ceylon/config.
A file in ~/.ceylon/compiled-scripts/${sha-code}/ holds some meta-data of last access time (for purging) and the original location of the source.
The user can choose to use @lucaswerkmeister's ceylond for executing the script by adding #!/bin/ceylond script.
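
The composite SHA sum from the list above (URL + file contents + compiler git id + SHA sums of imported script files) might be computed along these lines. This is a sketch under assumptions: SHA-1 as the hash, length-prefixed fields so that adjacent parts cannot run together, and sorted import sums so that import order does not change the key. None of these choices is specified in the thread.

```python
import hashlib


def script_cache_key(url: str, contents: bytes, compiler_id: str,
                     import_keys: list[str]) -> str:
    """Combine everything that should invalidate a compiled script into one hex key."""
    parts = [url.encode("utf-8"), contents, compiler_id.encode("utf-8")]
    # Keys of imported script files, sorted so that import order is irrelevant.
    parts += [key.encode("ascii") for key in sorted(import_keys)]
    h = hashlib.sha1()
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))  # length prefix keeps fields distinct
        h.update(part)
    return h.hexdigest()
```

Because the keys of imported scripts feed into the importer's key, a change in an included file would invalidate every script that imports it.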

As it would be possible to compile to JVM, JS, and Dart I would prefer a ~/.ceylon/compiled-scripts/ with SHAs over compiled modules beside the source files.
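
The purging rule from the list above (a maximum number or age of compiled modules, driven by last-access metadata) could be sketched like this. The function name and the in-memory dictionary of last-access times are hypothetical; a real implementation would read them from the metadata file in each cache folder. This sketch applies both limits together.

```python
def entries_to_purge(last_access: dict[str, float], now: float,
                     max_count: int, max_age_seconds: float) -> list[str]:
    """Return cache keys to delete: everything too old, plus the least
    recently used entries beyond max_count."""
    expired = {key for key, t in last_access.items() if now - t > max_age_seconds}
    # Remaining entries, most recently used first.
    fresh = sorted((key for key in last_access if key not in expired),
                   key=lambda key: last_access[key], reverse=True)
    overflow = fresh[max_count:]
    return sorted(expired | set(overflow))
```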

One thing I'm not sure about is how to handle script imports: Should identifiers of an included script-module automatically be available? Or should an identifier import be required (but then an alias in the module import is needed)? And should identifiers automatically be shared?

Based on @gavinking's last syntax proposal but with added natives and script: import namespace, a Ceylon script file would look like this:

#!/bin/ceylond script

native("jvm", "js") module;
import ceylon.collection "1.3.3";
native("jvm") import java.base "7";
native("jvm") import maven:"org.hibernate:hibernate-core" "5.0.4.Final";
native("js") import npm:"typescript" "2.2.2";
import script:"file:./include/printall.ceylonscript"; // No Version info. Possibly we need an alias here.

import ceylon.collection { ArrayList }
// Either printAll() is automatically imported, or the user needs to declare it here.

value list = ArrayList { "hello", "world" };
printAll(list);

I don't know enough of Gradle's internals, but it would be nice if Ceylon scripting could supplement Groovy and Kotlin as language of choice for Gradle builds. Does anyone have thoughts on that?

In my opinion a REPL is not really useful, but with the syntax above it might be possible. I could be wrong, but Java classloading doesn't look too complicated to me for throwaway-classes. I probably don't know enough about module classloading, but I've shoehorned a flat classloader on top of modules for ceylon.build, and it worked.

For the REPL, once we settle for a throw-away package name (for example _repl_), every code snippet goes into its own compilation unit named _generated_${number} that has the function [String,Anything] _execute_${number}(Map<String,Anything> values) { } (or its actual function or class name), is converted to byte-code, loaded by the classloader, and optionally executed.

I might have missed something, but for me the hard part would be parsing, determining code boundaries, and value management, but not classloading itself.

eclipse-archived / ceylon

Make Ceylon scriptable #3306