ceylon / ceylon-spec

DEPRECATED
Apache License 2.0

Make Ceylon scriptable #200

Open FroMage opened 12 years ago

FroMage commented 12 years ago

Needs some thinking

lucaswerkmeister commented 10 years ago

@RossTate Well, if we ditch the REPL idea, then the script runner could just compile the program completely and then launch it in a new JVM / node.js instance, right? Which means that the main work would then be defining and implementing the new syntax instead of working classloader magic.

gavinking commented 10 years ago

@lucaswerkmeister Right. There's no real problem with defining new types in a script. A REPL is a different beast, however, and the JVM was simply not designed with that use case in mind.

gavinking commented 10 years ago

If it's really a need I'd be happy to write the code for that.

I think you're underestimating just how hard this would be. There's no easy way to redefine a class in the JVM. You would need to do some pretty tricky messing about with classloaders to make it work.

OTOH, arguing against myself: a REPL based on node.js would probably not be hard to implement. This raises the question of which VM we are primarily targeting here:

djadmin commented 10 years ago

Well, I'm quite comfortable with Node. So I'd suggest Node first and then we can proceed to target the JVM. What do you say?

gavinking commented 10 years ago

Well, thinking it through, I realize that a script for Node is not especially useful since stuff like interacting with the filesystem is much more difficult than with the JVM.

lucaswerkmeister commented 10 years ago

I’m a bit confused as to what exactly “Make Ceylon scriptable” involves. What’s the identifying feature of a Ceylon script? What, besides convenient invocation, separates it from ceylon compile --src . file.ceylon; ceylon run default? With the void { ... } syntax, we would already be far away from other scripting languages like Python, Bash, etc., all of which allow top-level statements.

lucaswerkmeister commented 10 years ago

interacting with the filesystem is much more difficult than with the JVM.

Maybe “node” and “browser” should be considered different backends? ceylon.file makes no sense for browsers, but you could implement it for node.js.

lucaswerkmeister commented 10 years ago

And by the way, another thing that’s IMO even more necessary than file I/O is the ability to call processes – in fact, we might even want to introduce some syntax sugar for that (possibly even for something like Bash’s $(command), obtaining a command’s output as a string), and import ceylon.process automatically if it’s required.

gavinking commented 10 years ago

I’m a bit confused as to what exactly “Make Ceylon scriptable” involves. What’s the identifying feature of a Ceylon script? What, besides convenient invocation, separates it from ceylon compile --src . file.ceylon; ceylon run default?

Well, first and foremost, it means I can just run a source file from the Unix command line, without needing to compile it first. We need to make #!/usr/bin/ceylon script work.

It also means letting me specify module dependencies within the same source file as my executable code.

And I guess it also means letting me have some runnable code that doesn't have an explicit function name.

lucaswerkmeister commented 10 years ago

For the second and third point, are we still on module { ... }, void { ... }, or did we decide on having top-level statements and module imports?

quintesse commented 10 years ago

We need to make #!/usr/bin/ceylon script work. It also means letting me specify module dependencies within the same source file as my executable code.

We need to think this through though because there are a couple of things that I'm not sure about:

lucaswerkmeister commented 10 years ago

I always thought the scripts would be independent of each other, and the compiled module would be deleted immediately after execution finishes. Therefore:

quintesse commented 10 years ago

and the compiled module would be deleted immediately after execution finishes

Well but that would make them horribly inefficient, I don't see anyone using them that way.

Independent, so no (and how would that work out if they import different versions?)

That might be, but if they are regular .ceylon files somebody might still type ceylon compile default and if so what would the result be? (Which was one of the reasons I suggested giving Ceylon script files their own, different, file extension. They would never be compiled together and always result in separate modules)

Next to the source file?

Well unless you delete them afterwards as you suggest that could become terribly messy with .car files intermingled with source files.

The advantage of having them there next to the source file is that you're bound to throw away the car if you remove the script, while if we would have some global cache (perhaps based on the actual file location of the script) you'll end up with a trash heap of old modules that never gets cleaned. (although perhaps the current cache has some of the same problems)

quintesse commented 10 years ago

Also consider how you will be developing them. You still want to use Eclipse to write them and maybe they are part of a larger project where you have a couple of scripts that give easy access to some functionality.

Now where do I put them?

If I put them in the source folder so Eclipse treats them correctly they will be compiled together into the default.car, not what I want and possibly even results in unnecessary errors.

I'd have to put each of them in their own project just to be able to edit them.

I think that if we go for a separate extension, for argument's sake let's say .cys, both Eclipse and the command line compiler can treat it differently. In Eclipse you'd be able to edit it even outside official source folders and it would not compile it to a .car. And the command line compiler would just ignore them because they are handled by ceylon script, not by ceylon compile.

Maybe there are other ways too, it's just a suggestion/idea of mine.

PhiLhoSoft commented 9 years ago

Side note about REPLs for compiled OO languages on the JVM: Scala has a REPL, which doesn't seem to have been mentioned here. I used it very occasionally, mostly for quick tests. Some Scala users seem to praise the tool highly and use it a lot. What I recall from the mailing list threads about it:

My PoV is that such a tool is valuable when you don't have decent IDE support, or for people allergic to IDEs who prefer coding in their editor (often vi!) and compiling from the command line... Other times, other habits / tastes...

Like Gavin, I prefer to type some lines in the IDE and have immediate feedback, auto-completion and help tips, debugging, persistence of the experiment, etc.

PhiLhoSoft commented 9 years ago

I have a naive question that doesn't seem to have been asked here: what is the point of having a Ceylon "interpreter"?

A scripting language, IMHO, should be lightweight and have very little overhead: it must start in less than one second. And, of course, it must integrate closely with the system: being able to run commands, capture their output, perhaps feed them some input (pipes, etc.).

If it is just a matter of having a Ceylon program compiled on the fly and then run, it offers little advantage over a two-step process that can be done with a shell script.

On the other hand, if the Ceylon scripting mechanism can be made JSR-223 compliant, it can offer interesting applications, like having a Ceylon application able to run Ceylon scripts to customize it dynamically.

gavinking commented 9 years ago

Since @zoek1 is going to start work on this issue, I think it's time to finally make some decisions.

I propose that:

The syntax of a script is:

  1. a module statement,
  2. optionally, one or more module import statements,
  3. optionally, one or more package import statements,
  4. a list of statements and declarations, interpreted as if they occur in the body of a toplevel function.

For example:

module script;
import ceylon.collection "1.1.1";
import ceylon.collection { ArrayList }

value list = ArrayList { "hello", "world" };
printAll(list);

This script, when compiled, produces a module named script with a shared toplevel function named run().

An open question is whether we allow a script to be split across multiple .ceylon files, or if we force you to put everything in one source file.

lucaswerkmeister commented 9 years ago

Why is the module statement required?

gavinking commented 9 years ago

@lucaswerkmeister So that the parser can distinguish that this is a script.

gavinking commented 9 years ago

P.S. I'm OK with allowing just:

module;
print("hello world");

if you're happy for your script to execute in the default module.

lucaswerkmeister commented 9 years ago

So a module statement is

ModuleStatement: Annotations "module" FullPackageName? ";"

and everything else is unchanged, except that module imports are now toplevel. (Fine by me, feels a bit more “script-y” than wrapping them in the module descriptor.)

:+1:

On semantics: I assume the typechecker checks the script just like a function body, in that “toplevel” declarations must work top-down and can’t be circular?

gavinking commented 9 years ago

I assume the typechecker checks the script just like a function body, in that “toplevel” declarations must work top-down and can’t be circular?

Precisely. I briefly thought of making it work like the body of a class (initializer then declarations) but then I realized that there would be no point, since there's nothing to call the declarations.

quintesse commented 9 years ago

But we'll need to "enforce" the uniqueness requirement of the module name somehow, right? Especially when you allow them to be default modules as well.

So probably the real question is: where do we store the .cars for those compiled scripts?

  1. Languages like python will create a foo.pyc compiled file beside the foo.py file, so we could create an example.car next to the example.ceylon script. Of course that would require you to always have write access to the folder where the source file is located. (It does make "clean-up" easier, if you remove the source file you probably won't forget to delete the compiled version.)
  2. We could of course just compile them to ./modules as we do now but I have the feeling people don't like to see their (non-ceylon) folders littered with seemingly unrelated files/folders. A variant of this would be to compile scripts to ./.ceylon/modules effectively hiding the output folder from sight. But we'd still have the problem of what to do with multiple "default" scripts in the same folder. (And clean-up is more difficult though in both these cases.)
  3. We could also compile them to a fixed place like the cache (~/.ceylon/cache) but we don't allow default modules there (and again how to distinguish between them?) so that's not really an option.
  4. Another option would be a new folder, something like ~/.ceylon/compiled-scripts and then match to the exact location of the script on disk somehow (eg. ~/.ceylon/compiled-scripts/home/user/ceylon/bin/example.ceylon/default.car).
  5. A variant of item 4 would be to use a SHA1 summed version of the URI/Path instead of the actual URI/Path (which might contain all kinds of weird things if the script came from a network source for example)

(The problem with those last two options is also the clean-up: they might fill up with crap pretty quickly if you use a lot of scripts just a couple of times. The folder itself is pretty easy to delete though.)

Personally I think the first option of generating the .car next to the script file is the most flexible, perhaps with a fallback to the ~/.ceylon/compiled-scripts option when you don't have write access to the folder.
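The preference above (option 1 with a fallback to option 5) can be sketched roughly as follows. This is only an illustration in Python, not anything the Ceylon tooling provides: the function name `car_location`, the `can_write` parameter, and the exact file naming are assumptions made for the example, and the cache root is simply the `~/.ceylon/compiled-scripts` folder suggested above.

```python
import hashlib
import os
from pathlib import Path

# Hypothetical cache root, taken from the folder name suggested in the thread.
CACHE_ROOT = Path.home() / ".ceylon" / "compiled-scripts"


def car_location(script, cache_root=CACHE_ROOT, can_write=os.access):
    """Return where the compiled .car for `script` would be stored."""
    script = Path(script).resolve()
    if can_write(script.parent, os.W_OK):
        # Option 1: example.ceylon -> example.car in the same folder.
        return script.with_suffix(".car")
    # Fallback (option 5): name the archive after the SHA-1 of the script's URI.
    digest = hashlib.sha1(script.as_uri().encode("utf-8")).hexdigest()
    return Path(cache_root) / f"{digest}.car"
```

With this shape, removing a script from a writable folder leaves its `.car` sitting right beside it, while read-only locations end up in the shared cache and need some separate clean-up.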

quintesse commented 9 years ago

Oh btw, as side-effect of the above, where compiled scripts are not really stored in the normal modules repositories, I think scripts should always be default modules: they can import but can never be imported themselves.

quintesse commented 9 years ago

Unless we also want the script source syntax to be available for normal Ceylon modules, as a really quick-to-write, single-file module. That would be great, but it would then have to follow all the normal Ceylon source rules: be in a source folder, be compiled with ceylon compile, etc. In the above messages I'm referring to Ceylon scripts as used in system scripting, e.g. being able to do:

example.ceylon

#!/bin/ceylon run
module;
print("hello world");

and run it with:

$ ./example.ceylon
hello world

gavinking commented 9 years ago

Oh btw, as side-effect of the above, where compiled scripts are not really stored in the normal modules repositories, I think scripts should always be default modules: they can import but can never be imported themselves.

I disagree, and I don't see how it follows.

quintesse commented 9 years ago

I disagree, and I don't see how it follows.

Well if they're not part of the normal module repositories how will they be found?

(Unless they can only be imported by other scripts in the same folder.)

PhiLhoSoft commented 9 years ago

"An open question is whether we allow a script to be split across multiple .ceylon files" I guess this will be requested sooner or later, to allow using script libraries and for readability / maintenance / reusability. Even a small scripting language like Lua allows this, via dofile() or the more formal require().

quintesse commented 9 years ago

"An open question is whether we allow a script to be split across multiple .ceylon files" I guess this will be requested sooner or later

I'm pretty sure you're right that people will request this, but I'm not sure we should go that way. I'd rather see the scripting files as the "glue" for proper modules. So they're really easy to use and fast to write, needing no setup of folder structures or anything like that, and maybe even perfect for embedding. But the moment you start re-using code from other scripts, IMO you're far better off writing a proper module for that.

On the other hand, I think I wouldn't mind too much if it was a C-like "#include" at the level of the parser or something like that. Something that would only be allowed in script files. (For me that would affirm the whole idea that they're nameless and module-less.)

PhiLhoSoft commented 9 years ago

That's the dofile() way, then... It does a bit more than #include, but that's the idea, without having to manage paths, versions, dependencies, etc.

And indeed, if the need for reusability arises, one can always use "real" Ceylon code for that.

gavinking commented 9 years ago

Well if they're not part of the normal module repositories how will they be found?

Well the compiled artifacts would go into repos. But the script.ceylon file would be specified on the command line.

quintesse commented 9 years ago

Well the compiled artifacts would go into repos

But which repository? If a script file can be anywhere on your system? (Unless you need to specify it with an option, but that doesn't seem like a good default) And what about having several script files all being nameless (going into the default module)? How do we distinguish them?

jvasileff commented 9 years ago

Another option would be a new folder, something like ~/.ceylon/compiled-scripts and then match to the exact location of the script on disk somehow (eg. ~/.ceylon/compiled-scripts/home/user/ceylon/bin/example.ceylon/default.car).

That sounds compelling, but I can imagine a ton of ways it could go wrong; filesystem paths are far from straightforward and reliable.

So, what about using a shasum of the script file's contents, like ~/.ceylon/compiled-scripts/db32869dcc7cc6762af350447c0a19037792c5eb.car? Perhaps a sanitized version of the script's filename could be included as a prefix to make the filenames somewhat human readable.

Personally I think the first option of generating the .car next to the script file is the most flexible, perhaps with a fallback to the ~/.ceylon/compiled-scripts option when you don't have write access to the folder.

I agree. That seems like the safest and easiest to use approach, despite the littering of the filesystem. The car should be hidden and named with a pattern that is easy to add to an ignore file.

quintesse commented 9 years ago

So, what about using a shasum of the script file's contents

That's certainly an option, but imagine the mess that will make while developing, you'd get a new version for each change you make.

but I can imagine a ton of ways it could go wrong

What ways do you see this going wrong? One of the things I could think of is that you might hit filesystem limits (trying to create a deeply nested folder inside a deeply nested folder, for example). But maybe an option is to use not a shasum of the code but of the path?

jvasileff commented 9 years ago

new version for each change you make

Good point.

What ways do you see this going wrong?

Well, perhaps it's a solvable problem, but my inclination is always to avoid using user-input as filenames, if possible.

I'm thinking of length limits, as you mentioned, and also issues with canonicalization of unicode characters in filenames across filesystems of different types (see rsync --iconv), different filename rules for different filesystem types (extfs, hfs, ntfs, fat, etc.), translations performed by network filesystems (nfs, smb, etc), possible bugs determining path separators, symlink normalizations, unstable or reusable mount points, especially for removable media, and who knows what else.

Some of this may be FUD, but like I said, it's a mess I usually try to avoid.
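To make the Unicode canonicalization concern concrete, here is a small Python sketch (the path and the helper name `path_digest` are made up for illustration). The same visible filename can reach the tool in composed (NFC) or decomposed (NFD) form, depending on the filesystem or transfer tool, and a hash over the raw path then differs:

```python
import hashlib
import unicodedata


def path_digest(path):
    """SHA-1 of the UTF-8 bytes of a path string, with no normalization."""
    return hashlib.sha1(path.encode("utf-8")).hexdigest()


# Hypothetical script path containing a non-ASCII character.
composed = unicodedata.normalize("NFC", "/home/user/caf\u00e9.ceylon")
decomposed = unicodedata.normalize("NFD", composed)

# Both strings render identically, yet they are different code point sequences.
print(composed == decomposed)                            # False
print(path_digest(composed) == path_digest(decomposed))  # False
```

Normalizing the path to one form before hashing would address this particular case, but not the other items in the list (symlinks, mount points, network filesystem translations).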

shasum of the code but of the path

That sounds interesting. For safety, I think it would make sense to validate the shasum of the code as well (somehow) when the car is not stored in the same directory. Timestamps may not be a sufficient guarantee that you are actually running the right binary.

Generally, I think it's wrong to trust a path as a universal identifier. Perhaps things like concat(filesystem-guid, inode) when supported work, but even that's defeated by cloning a disk with dd.

quintesse commented 9 years ago

I'm thinking of ...

Ok, that's indeed all true, so indeed (as expected) trying to recreate the path would probably cause all kinds of headaches. So the shasum of the final URI or something like that would probably be enough. That you won't be able to easily match a script name to a file in the cache I don't really care that much about. If really needed we could probably make it an option of the script command to tell you where the compiled version is located.

I think it would make sense to validate the shasum of the code as well

It's certainly a possibility, but I'm not sure it's worth it, somebody would really have to jump through some hoops to get the wrong code to execute it seems to me.

HenningB commented 9 years ago

I'd prefer not to base the shasum on the path/URI/URL. One pattern I use whilst developing/hacking is to copy a file or directory (as a backup), experiment with the original, throw it away, and then rename the backup to the original name. Needing to remember to clean the script cache would lead to confusion.

jvasileff commented 9 years ago

Could use the sha of the file contents for the car file name/PK, and the path as part of a cache eviction strategy.

quintesse commented 9 years ago

and the path as part of a cache eviction strategy.

How would that work? For me the opposite would seem to be the more logical way: to find it you'd use the hashed URI/Path and then to know if you'd need to update it you could look at the hashed contents.

@HenningB if we don't go for option 1 (local next to source) + option 5 (shasummed paths) as a fallback, but solely for option 5 then that might be a problem yes.

jvasileff commented 9 years ago

I guess you could use $ContentsHash-$PathHash.car which would allow either one-to-many or one-to-one binary-to-sourceFile, depending on which way you want to go. (In the one-to-many scenario, executing wouldn't necessarily require a match on the $PathHash part).
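A rough Python sketch of the `$ContentsHash-$PathHash.car` naming, to show how the two lookup modes differ. The function names (`car_name`, `find_cached`) and the `match_path` flag are invented for the example, and SHA-1 is assumed for both hashes:

```python
import hashlib
from pathlib import Path


def sha1_hex(data):
    """Hex SHA-1 digest of a bytes value."""
    return hashlib.sha1(data).hexdigest()


def car_name(script):
    """Build '<contents hash>-<path hash>.car' for a script file."""
    script = Path(script).resolve()
    contents_hash = sha1_hex(script.read_bytes())
    path_hash = sha1_hex(script.as_uri().encode("utf-8"))
    return f"{contents_hash}-{path_hash}.car"


def find_cached(cache_dir, script, match_path=True):
    """Return a cached archive for `script`, or None if there is none.

    match_path=True  -> one-to-one: contents and path must both match.
    match_path=False -> one-to-many: any archive built from identical
                        contents is accepted, whatever path it came from.
    """
    name = car_name(script)
    if match_path:
        candidate = Path(cache_dir) / name
        return candidate if candidate.exists() else None
    contents_hash = name.split("-")[0]
    matches = sorted(Path(cache_dir).glob(f"{contents_hash}-*.car"))
    return matches[0] if matches else None
```

In the one-to-many mode, the copy-experiment-rename workflow described earlier keeps hitting the cache as long as the contents are unchanged.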

lucono commented 8 years ago

That sounds interesting. For safety, I think it would make sense to validate the shasum of the code as well (somehow) when the car is not stored in the same directory. Timestamps may not be a sufficient guarantee that you are actually running the right binary.

I agree with this, and I'd apply it even when the file is stored in the same directory, as a way of verifying that the script has not been updated since it was last compiled.

Could use the sha of the file contents for the car file name/PK, and the path as part of a cache eviction strategy.

How would that work? For me the opposite would seem to be the more logical way: to find it you'd use the hashed URI/Path and then to know if you'd need to update it you could look at the hashed contents.

This makes sense to me.

Even if storing the compiled .car locally next to the script file, I think it would make sense to compare a fresh hash of the script contents to the hash of the original script from which the existing .car was compiled, to know if re-compilation is required prior to execution.

For scripts stored in the scripts cache location because the user does not have write access to the local directory of the script, some form of the hash of the URI/path of the script file could be used as already proposed, to locate the right .car in the cache, after which the hash of the script contents would still need to be compared to determine if re-compilation is required prior to execution.

Could be something like:

~/.ceylon/compiled-scripts/db32869dcc7cc6762af350447c0a19037792c5eb.car
~/.ceylon/compiled-scripts/db32869dcc7cc6762af350447c0a19037792c5eb.md5

Unless it's possible to also bundle the hash of the original source script into the .car file, making the separate .md5 file unnecessary.
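A minimal Python sketch of the sidecar-hash check described above. The helper names are hypothetical, and the sketch stores a SHA-1 digest in a `.sha1` file next to the `.car`, standing in for the `.md5` file in the example listing; any stable content hash would serve the same purpose.

```python
import hashlib
from pathlib import Path


def source_digest(script):
    """Hex SHA-1 of the script's current contents."""
    return hashlib.sha1(Path(script).read_bytes()).hexdigest()


def record_compiled(script, car):
    """After compiling, store the digest of the source that `car` was built from."""
    Path(car).with_suffix(".sha1").write_text(source_digest(script))


def needs_recompile(script, car):
    """True when `car` is missing or was built from different source text."""
    car = Path(car)
    sidecar = car.with_suffix(".sha1")
    if not car.exists() or not sidecar.exists():
        return True
    return sidecar.read_text().strip() != source_digest(script)
```

A runner would call `needs_recompile` before every execution and `record_compiled` after each successful compile; bundling the digest inside the `.car` itself would replace the sidecar file but leave the comparison unchanged.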