Open CeylonMigrationBot opened 12 years ago
[@lucaswerkmeister] @RossTate Well, if we ditch the REPL idea, then the script runner could just compile the program completely and then launch it in a new JVM / node.js instance, right? Which means that the main work would then be defining and implementing the new syntax instead of working classloader magic.
[@gavinking] @lucaswerkmeister Right. There's no real problem with defining new types in a script. A REPL is a different beast, however, and the JVM was simply not designed with that use case in mind.
[@gavinking]
If it's really a need I'd be happy to write the code for that.
I think you're underestimating just how hard this would be. There's no easy way to redefine a class in the JVM. You would need to do some pretty tricky messing about with classloaders to make it work.
OTOH, arguing against myself: a REPL based on node.js would probably not be hard to implement. This raises the question of which VM we are primarily targeting here:
[@djadmin] Well, I'm quite comfortable with Node. So I'd suggest Node first and then we can proceed to target the JVM. What do you say?
[@gavinking] Well, thinking it through, I realize that a script for Node is not especially useful since stuff like interacting with the filesystem is much more difficult than with the JVM.
[@lucaswerkmeister] I’m a bit confused as to what exactly “Make Ceylon scriptable” involves. What’s the identifying feature of a Ceylon script? What, besides convenient invocation, separates it from ceylon compile --src . file.ceylon; ceylon run default? With the void { ... } syntax, we would already be far away from other scripting languages like Python, Bash, etc., all of which allow top-level statements.
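For reference, the two-step invocation quoted above can be wrapped in a few lines. This is only a sketch of the "compile completely, then launch" idea: it uses exactly the two commands from the comment, and the function names and the injectable `runner` parameter are hypothetical.

```python
import subprocess
from pathlib import Path


def script_commands(script):
    """Return the two command lines quoted in the thread for a script file."""
    path = Path(script)
    src_dir = str(path.parent)  # "." for a bare file name
    return [
        ["ceylon", "compile", "--src", src_dir, str(path)],
        ["ceylon", "run", "default"],
    ]


def run_script(script, runner=subprocess.call):
    """Run compile, then run; stop at the first non-zero exit status."""
    for command in script_commands(script):
        status = runner(command)
        if status != 0:
            return status
    return 0
```

Nothing here addresses where the compiled module ends up or how two scripts in the same folder avoid overwriting each other's default module.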
[@lucaswerkmeister]
interacting with the filesystem is much more difficult than with the JVM.
Maybe “node” and “browser” should be considered different backends? ceylon.file makes no sense for browsers, but you could implement it for node.js.
[@lucaswerkmeister] And by the way, another thing that’s IMO even more necessary than file I/O is the ability to call processes – in fact, we might even want to introduce some syntax sugar for that (possibly even for something like Bash’s $(command), obtaining a command’s output as a string), and import ceylon.process automatically if it’s required.
[@gavinking]
I’m a bit confused as to what exactly “Make Ceylon scriptable” involves. What’s the identifying feature of a Ceylon script? What, besides convenient invocation, separates it from ceylon compile --src . file.ceylon; ceylon run default?
Well, first and foremost, it means I can just run a source file from the Unix command line, without needing to compile it first. We need to make #!/usr/bin/ceylon script work.
It also means letting me specify module dependencies within the same source file as my executable code.
And I guess it also means letting me have some runnable code that doesn't have an explicit function name.
[@lucaswerkmeister] For the second and third point, are we still on module { ... }, void { ... }, or did we decide on having top-level statements and module imports?
[@quintesse]
We need to make #!/usr/bin/ceylon script work. It also means letting me specify module dependencies within the same source file as my executable code.
We need to think this through though because there are a couple of things that I'm not sure about:
default.car?
ceylon compile and ceylon run, but for scripts, especially the ones with hashbangs, this is not defined. (Python and its ilk store the compiled .pyc next to the .py source file, but I don't think that's a desirable, or even possible, option in our case)
[@lucaswerkmeister] I always thought the scripts would be independent of each other, and the compiled module would be deleted immediately after execution finishes. Therefore:
$scriptFileName.default.car
[@quintesse]
and the compiled module would be deleted immediately after execution finishes
Well but that would make them horribly inefficient, I don't see anyone using them that way.
Independent, so no (and how would that work out if they import different versions?)
That might be, but if they are regular .ceylon files somebody might still type ceylon compile default and if so what would the result be? (Which was one of the reasons I suggested giving Ceylon script files their own, different, file extension. They would never be compiled together and always result in separate modules)
Next to the source file?
Well unless you delete them afterwards as you suggest that could become terribly messy with .car files intermingled with source files.
The advantage of having them there next to the source file is that you're bound to throw away the car if you remove the script, while if we would have some global cache (perhaps based on the actual file location of the script) you'll end up with a trash heap of old modules that never gets cleaned. (although perhaps the current cache has some of the same problems)
[@quintesse] Also consider how you will be developing them. You still want to use Eclipse to write them and maybe they are part of a larger project where you have a couple of scripts that give easy access to some functionality.
Now where do I put them?
If I put them in the source folder so Eclipse treats them correctly they will be compiled together into the default.car, which is not what I want and possibly even results in unnecessary errors.
I'd have to put each of them in their own project just to be able to edit them.
I think that if we go for a separate extension, for argument's sake let's say .cys, both Eclipse and the command line compiler can treat it differently. In Eclipse you'd be able to edit it even outside official source folders and it would not compile it to a .car. And the command line compiler would just ignore them because they are handled by ceylon script, not by ceylon compile.
Maybe there are other ways too, it's just a suggestion/idea of mine.
[@PhiLhoSoft] Side note about REPL with compiled OO languages on the JVM: Scala has a REPL that doesn't seem to be mentioned here. I used it very occasionally, and it was mostly for quick tests. Some Scala users seem to praise such a tool highly, using it a lot. What I recall from the mailing list threads about it:
My PoV is that such a tool is valuable when you don't have decent IDE support, or for people allergic to IDEs who prefer coding in their editor (often vi!) and compiling from the command line... Other times, other habits / tastes...
Like Gavin, I prefer to type some lines in the IDE and have immediate feedback, auto-completion and help tips, debugging, persistence of the experiment, etc.
[@PhiLhoSoft] I have a naive question, that doesn't seem to have been asked here: What is the point of having a Ceylon "interpreter"?
A scripting language, IMHO, should be lightweight and have very little overhead: it must start in less than one second. And, of course, it must be highly integrated with the system, being able to run commands, to capture their output, perhaps to feed some input (pipes, etc.), etc.
If it is just to have a Ceylon program compiled on the fly and then run, it offers little advantage over a two-step process that can be done with some shell script.
On the other hand, if the Ceylon scripting mechanism can be made JSR-223 compliant, it can offer interesting applications, like having a Ceylon application able to run Ceylon scripts to customize it dynamically.
[@gavinking] Since @zoek1 is going to start work on this issue, I think it's time to finally make some decisions on this issue.
I propose that:
.car or .js in a module repo), but usually we just distribute them as plain .ceylon source files
The syntax of a script is:
module statement,
import statements,
import statement,
For example:
module script;
import ceylon.collection "1.1.1";
import ceylon.collection { ArrayList }
value list = ArrayList { "hello", "world" };
printAll(list);
This script, when compiled, produces a module named script with a shared toplevel function named run().
An open question is whether we allow a script to be split across multiple .ceylon files, or if we force you to put everything in one source file.
[@lucaswerkmeister] Why is the module statement required?
[@gavinking] @lucaswerkmeister So that the parser can distinguish that this is a script.
[@gavinking] P.S. I'm OK with allowing just:
module;
print("hello world");
if you're happy for your script to execute in the default module.
[@lucaswerkmeister] So a module statement is
ModuleStatement: Annotations "module" FullPackageName? ";"
and everything else is unchanged, except that module imports are now toplevel. (Fine by me, feels a bit more “script-y” than wrapping them in the module descriptor.)
:+1:
On semantics: I assume the typechecker checks the script just like a function body, in that “toplevel” declarations must work top-down and can’t be circular?
[@gavinking]
I assume the typechecker checks the script just like a function body, in that “toplevel” declarations must work top-down and can’t be circular?
Precisely. I briefly thought of making it work like the body of a class (initializer then declarations) but then I realized that there would be no point, since there's nothing to call the declarations.
[@quintesse] But we'll need to "enforce" the uniqueness requirement of the module name somehow, right? Especially when you allow them to be default modules as well.
So probably the real question is: where do we store the .cars for those compiled scripts?
foo.pyc compiled file beside the foo.py file, so we could create an example.car next to the example.ceylon script. Of course that would require you to always have write access to the folder where the source file is located. (It does make "clean-up" easier, if you remove the source file you probably won't forget to delete the compiled version.)
./modules as we do now but I have the feeling people don't like to see their (non-ceylon) folders littered with seemingly unrelated files/folders. A variant of this would be to compile scripts to ./.ceylon/modules effectively hiding the output folder from sight. But we'd still have the problem of what to do with multiple "default" scripts in the same folder. (And clean-up is more difficult in both these cases.)
~/.ceylon/cache) but we don't allow default modules there (and again how to distinguish between them?) so that's not really an option.
Another option would be a new folder, something like ~/.ceylon/compiled-scripts and then match to the exact location of the script on disk somehow (eg. ~/.ceylon/compiled-scripts/home/user/ceylon/bin/example.ceylon/default.car).
(The problem with those last two options is also the clean-up, they might fill up with crap pretty quickly if you use a lot of scripts just a couple of times. The folder itself is pretty easy to delete though.)
Personally I think the first option of generating the .car next to the script file is the most flexible, perhaps with a fallback to the ~/.ceylon/compiled-scripts option when you don't have write access to the folder.
[@quintesse] Oh btw, as side-effect of the above, where compiled scripts are not really stored in the normal modules repositories, I think scripts should always be default modules: they can import but can never be imported themselves.
[@quintesse] Unless we also want the script source syntax available for normal Ceylon modules, as a really quick to write, single-file module. That would be great, but it would then have to follow all the normal Ceylon source rules: be in a source folder, be compiled with ceylon compile etc.
In the above messages I'm referring to Ceylon scripts as used in system scripting, eg. being able to do:
example.ceylon
#!/bin/ceylon run
module;
print("hello world");
and run it with:
$ ./example.ceylon
hello world
[@gavinking]
Oh btw, as side-effect of the above, where compiled scripts are not really stored in the normal modules repositories, I think scripts should always be default modules: they can import but can never be imported themselves.
I disagree, and I don't see how it follows.
[@quintesse]
I disagree, and I don't see how it follows.
Well if they're not part of the normal module repositories how will they be found?
(Unless they can only be imported by other scripts in the same folder.)
[@PhiLhoSoft] "An open question is whether we allow a script to be split across multiple .ceylon files" I guess this will be requested sooner or later, to allow using script libraries and for readability / maintenance / reusability. Even a small scripting language like Lua allows this, via dofile() or the more formal require().
[@quintesse]
"An open question is whether we allow a script to be split across multiple .ceylon files" I guess this will be requested sooner or later
I'm pretty sure you're right that people will request this, but I'm not sure we should go that way. I'd rather see the scripting files as the "glue" for proper modules. So they're really easy to use and fast to write, needing no setup of folder structures or anything like that, and maybe even perfect for embedding. But the moment you start re-using code from other scripts IMO you're far better off writing a proper module for that.
On the other hand I think I wouldn't mind too much if it was a C-like "#include" at the level of the parser or something like that. Something that would only be allowed in script files. (for me affirming the whole idea that they're nameless and module-less)
[@PhiLhoSoft] That's the dofile() way, then... It does a bit more than #include, but that's the idea, without having to manage paths, versions, dependencies, etc.
And indeed, if the need for reusability arises, one can always use "real" Ceylon code for that.
[@gavinking]
Well if they're not part of the normal module repositories how will they be found?
Well the compiled artifacts would go into repos. But the script.ceylon file would be specified on the command line.
[@quintesse]
Well the compiled artifacts would go into repos
But which repository? If a script file can be anywhere on your system? (Unless you need to specify it with an option, but that doesn't seem like a good default) And what about having several script files all being nameless (going into the default module)? How do we distinguish them?
[@jvasileff]
Another option would be a new folder, something like ~/.ceylon/compiled-scripts and then match to the exact location of the script on disk somehow (eg. ~/.ceylon/compiled-scripts/home/user/ceylon/bin/example.ceylon/default.car).
That sounds compelling, but I can imagine a ton of ways it could go wrong; filesystem paths are far from straightforward and reliable.
So, what about using a shasum of the script file's contents, like ~/.ceylon/compiled-scripts/db32869dcc7cc6762af350447c0a19037792c5eb.car? Perhaps a sanitized version of the script's filename could be included as a prefix to make the filenames somewhat human readable.
Personally I think the first option of generating the .car next to the script file is the most flexible, perhaps with a fallback to the ~/.ceylon/compiled-scripts option when you don't have write access to the folder.
I agree. That seems like the safest and easiest to use approach, despite the littering of the filesystem. The car should be hidden and named with a pattern that is easy to add to an ignore file.
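As a rough illustration of the content-hash naming suggested above, here is a minimal sketch. It assumes SHA-1 (the example file name in the comment is 40 hex characters long) and a simple ASCII-only sanitizing rule; the function names, the length limit, and the `script` fallback are hypothetical.

```python
import hashlib
import re
from pathlib import Path

# Directory name taken from the thread; everything else below is an assumption.
CACHE_DIR = Path.home() / ".ceylon" / "compiled-scripts"


def sanitized_stem(script_name, max_len=32):
    """Reduce a script file name to a short, filesystem-neutral prefix."""
    stem = Path(script_name).stem
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", stem).strip("_")
    return cleaned[:max_len] or "script"


def content_addressed_car(script_name, source):
    """Name the cached .car after the SHA-1 of the script's bytes."""
    digest = hashlib.sha1(source).hexdigest()
    return CACHE_DIR / "{}-{}.car".format(sanitized_stem(script_name), digest)
```

Because the name depends only on the bytes of the script, every edit produces a new cache entry, so a scheme like this needs some form of purging.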
[@quintesse]
So, what about using a shasum of the script file's contents
That's certainly an option, but imagine the mess that will make while developing, you'd get a new version for each change you make.
but I can imagine a ton of ways it could go wrong
What ways do you see this going wrong? One of the things I could think of is that you might hit filesystem limits (trying to create a deeply nested folder inside a deeply nested folder for example). But maybe an option is to use not a shasum of the code but of the path?
[@jvasileff]
new version for each change you make
Good point.
What ways do you see this going wrong?
Well, perhaps it's a solvable problem, but my inclination is always to avoid using user-input as filenames, if possible.
I'm thinking of length limits, as you mentioned, and also issues with canonicalization of unicode characters in filenames across filesystems of different types (see rsync --iconv), different filename rules for different filesystem types (extfs, hfs, ntfs, fat, etc.), translations performed by network filesystems (nfs, smb, etc), possible bugs determining path separators, symlink normalizations, unstable or reusable mount points, especially for removable media, and who knows what else.
Some of this may be FUD, but like I said, it's a mess I usually try to avoid.
shasum of the code but of the path
That sounds interesting. For safety, I think it would make sense to validate the shasum of the code as well (somehow) when the car is not stored in the same directory. Timestamps may not be a sufficient guarantee that you are actually running the right binary.
Generally, I think it's wrong to trust a path as a universal identifier. Perhaps things like concat(filesystem-guid, inode) when supported work, but even that's defeated by cloning a disk with dd.
[@quintesse]
I'm thinking of ...
Ok, that's indeed all true, so indeed (as expected) trying to recreate the path would probably cause all kinds of headaches. So the shasum of the final URI or something like that would probably be enough. That you won't be able to easily match a script name to a file in the cache I don't really care that much about. If really needed we could probably make it an option of the script command to tell you where the compiled version is located.
I think it would make sense to validate the shasum of the code as well
It's certainly a possibility, but I'm not sure it's worth it, somebody would really have to jump through some hoops to get the wrong code to execute it seems to me.
[@HenningB] I'd prefer not to base the shasum on the path/URI/URL. One pattern I use whilst developing/hacking is to copy a file or directory (as backup), experiment with the original, throw it away, and then rename the backup to the original name. Needing to remember to clean the script-cache would lead to confusion.
[@jvasileff] Could use the sha of the file contents for the car file name/PK, and the path as part of a cache eviction strategy.
[@quintesse]
and the path as part of a cache eviction strategy.
How would that work? For me the opposite would seem to be the more logical way: to find it you'd use the hashed URI/Path and then to know if you'd need to update it you could look at the hashed contents.
@HenningB if we don't go for option 1 (local next to source) + option 5 (shasummed paths) as a fallback, but solely for option 5 then that might be a problem yes.
[@jvasileff] I guess you could use $ContentsHash-$PathHash.car which would allow either one-to-many or one-to-one binary-to-sourceFile, depending on which way you want to go. (In the one-to-many scenario, executing wouldn't necessarily require a match on the $PathHash part).
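A small sketch of the $ContentsHash-$PathHash.car naming may make the two lookup modes concrete. SHA-1 is assumed for both halves, and `car_name`, `find_car` and the `match_path` switch are hypothetical names, not part of any existing tool.

```python
import hashlib


def _sha1(data):
    return hashlib.sha1(data).hexdigest()


def car_name(source, script_path):
    """Build '<contents hash>-<path hash>.car' for one script file."""
    return "{}-{}.car".format(_sha1(source), _sha1(script_path.encode("utf-8")))


def find_car(cached_names, source, script_path, match_path=True):
    """Look up a cached binary.

    match_path=True  -> one-to-one: contents and path must both match.
    match_path=False -> one-to-many: any binary built from the same
                        contents is accepted, whatever its path hash.
    """
    wanted = car_name(source, script_path)
    if match_path:
        return wanted if wanted in cached_names else None
    prefix = _sha1(source) + "-"
    for name in sorted(cached_names):
        if name.startswith(prefix) and name.endswith(".car"):
            return name
    return None
```

In the one-to-many mode a script that was copied or renamed still finds its binary, which is the workflow @HenningB describes; the path hash is then only useful for eviction.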
It has been a while since I have read all this, but I have since come up with a proposal:
module.ceylon, right below the module descriptor:
module my.\imodule "1.2"
{
    import foo.bar "2.1";
}
print("This runs when `my.module` is loaded!");
module
{
    import foo.bar "2.1";
}
foo.bar/2.1” in order to be imported by other scripts
import my.script "1.0" inside a script will import scripts from the importer’s path, from PATH, and will also be able to import modules normally.
This allows scripts to be named, or to be in the default module. Only named scripts can be imported.
@Zambonifofex those are good ideas, pretty similar to what I had in mind myself. But the devil is in the details, it's not that easy to make it all work.
But I do think there are a couple of things that we could do as a preparation for this that would at least make it possible to experiment with possible solutions:
Both of these are grammar/typechecker changes. But with them in place we could at least start with some PoC to see what works and what not.
@quintesse maybe we should open different threads for these features…
@quintesse is there currently anyone working on these features?
@zambonifofex not that I know of, no
@lucono commented on May 10, 2016 on ceylon/ceylon-spec#200
That sounds interesting. For safety, I think it would make sense to validate the shasum of the code as well (somehow) when the car is not stored in the same directory. Timestamps may not be a sufficient guarantee that you are actually running the right binary.
I agree with this opinion, but also for when the file is stored in the same directory, as a way of verifying that it has not been updated since it was last compiled.
Could use the sha of the file contents for the car file name/PK, and the path as part of a cache eviction strategy.
How would that work? For me the opposite would seem to be the more logical way: to find it you'd use the hashed URI/Path and then to know if you'd need to update it you could look at the hashed contents.
This makes sense to me.
Even if storing the compiled .car locally next to the script file, I think it would make sense to compare a fresh hash of the script contents to the hash of the original script from which the existing .car was compiled, to know if re-compilation is required prior to execution.
For scripts stored in the scripts cache location because the user does not have write access to the local directory of the script, some form of the hash of the URI/path of the script file could be used as already proposed, to locate the right .car in the cache, after which the hash of the script contents would still need to be compared to determine if re-compilation is required prior to execution.
Could be something like:
~/.ceylon/compiled-scripts/db32869dcc7cc6762af350447c0a19037792c5eb.car
~/.ceylon/compiled-scripts/db32869dcc7cc6762af350447c0a19037792c5eb.md5
Unless it's possible to also bundle the hash of the original source script into the .car file, making the separate .md5 file unnecessary.
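The lookup described above (find the .car by a hash of the script's path, then compare a hash of the contents to decide whether to recompile) could look roughly like this. It is a sketch under assumptions: the sidecar here holds a SHA-256 digest and is named `.sha256` rather than the `.md5` shown in the example, and all function names are hypothetical.

```python
import hashlib
from pathlib import Path


def cache_entry(script, cache_dir):
    """Locate the cached .car and its digest sidecar by hashing the script path."""
    key = hashlib.sha1(str(script.resolve()).encode("utf-8")).hexdigest()
    return cache_dir / (key + ".car"), cache_dir / (key + ".sha256")


def needs_recompile(script, cache_dir):
    """True if there is no cached binary or the script changed since it was built."""
    car, sidecar = cache_entry(script, cache_dir)
    if not (car.is_file() and sidecar.is_file()):
        return True
    current = hashlib.sha256(script.read_bytes()).hexdigest()
    return sidecar.read_text().strip() != current


def record_compiled(script, cache_dir, car_bytes):
    """Store a freshly compiled binary together with the digest of its source."""
    car, sidecar = cache_entry(script, cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    car.write_bytes(car_bytes)
    sidecar.write_text(hashlib.sha256(script.read_bytes()).hexdigest())
```

Bundling the digest into the .car itself, as suggested, would remove the sidecar and the risk of the two files getting out of sync.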
@xkr47 Oracle already solved this problem with Nashorn (they compile to class files, which are hashed and cached.) A JVM flag can override the directory.
http://hg.openjdk.java.net/jdk8u/jdk8u-dev/nashorn/file/tip/docs/DEVELOPER_README
Maybe some good ideas there. It's known to work :)
(@pureconfig the comment was by @lucono)
Still
Here I'm summing up the ideas brought forth for making Ceylon scriptable, and adding my own thoughts. I believe this is now close to being ripe for implementation, as we now have namespaces and native implemented, which solves some issues. But you guys might disagree anyway.
The points are:
.ceylonscript or .ceylons for Windows support (I prefer .ceylonscript).
native on module determines the runtime it can be compiled to.
~/.ceylon/compiled-scripts/${sha-code}/default.car (or default.js).
--runtime=jvm or by an entry in .ceylon/config.
import namespace script: and a URL where to find the source.
~/.ceylon/config.
~/.ceylon/compiled-scripts/${sha-code}/ holds some meta-data of last access time (for purging) and the original location of the source.
ceylond for executing the script by adding #!/bin/ceylond script.
As it would be possible to compile to JVM, JS, and Dart I would prefer a ~/.ceylon/compiled-scripts/ with SHAs over compiled modules beside the source files.
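To make the purging point concrete, here is a minimal sketch of per-entry meta-data with a last access time and the original source location. The `meta.json` file name, its fields, and both function names are assumptions for illustration only.

```python
import json
import shutil
import time
from pathlib import Path

META_NAME = "meta.json"  # hypothetical file name inside each ${sha-code} folder


def touch_entry(entry_dir, source_location, now=None):
    """Record where the script came from and when it was last run."""
    entry_dir.mkdir(parents=True, exist_ok=True)
    meta = {
        "source": source_location,
        "last_access": time.time() if now is None else now,
    }
    (entry_dir / META_NAME).write_text(json.dumps(meta))


def purge(cache_dir, max_age_seconds, now=None):
    """Delete cache entries not accessed within max_age_seconds; return their names."""
    now = time.time() if now is None else now
    removed = []
    if not cache_dir.is_dir():
        return removed
    for entry in sorted(cache_dir.iterdir()):
        meta_file = entry / META_NAME
        if not meta_file.is_file():
            continue  # not a cache entry written by touch_entry
        meta = json.loads(meta_file.read_text())
        if now - meta["last_access"] > max_age_seconds:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed
```

The stored source location is not used for purging here; it could additionally be checked so that entries whose script no longer exists are dropped early.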
One thing I'm not sure about is how to handle script imports: Should identifiers of an included script-module be automatically available? Or should an identifier import be required (but then an alias in the module import is needed)? And should identifiers automatically be shared?
Based on @gavinking's last syntax proposal but with added natives and script: import namespace, a Ceylon script file would look like this:
#!/bin/ceylond script
native("jvm", "js") module;
import ceylon.collection "1.3.3";
native("jvm") import java.base "7";
native("jvm") import maven:"org.hibernate:hibernate-core" "5.0.4.Final";
native("js") import npm:"typescript" "2.2.2";
import script:"file:./include/printall.ceylonscript"; // No Version info. Possibly we need an alias here.
import ceylon.collection { ArrayList }
// Either printAll() is automatically imported, or the user needs to declare it here.
value list = ArrayList { "hello", "world" };
printAll(list);
I don't know enough of Gradle's internals, but it would be nice if Ceylon scripting could supplement Groovy and Kotlin as a language of choice for Gradle builds. Does anyone have thoughts on that?
In my opinion a REPL is not really useful, but with the syntax above it might be possible. I could be wrong, but Java classloading doesn't look too complicated to me for throwaway-classes. I probably don't know enough about module classloading, but I've shoehorned a flat classloader on top of modules for ceylon.build, and it worked.
For the REPL, once we settle on a throw-away package name (for example _repl_), every code snippet goes into its own compilation unit named _generated_${number} that has the function [String,Anything] _execute_${number}(Map<String,Anything> values) { } (or its actual function or class name), is converted to byte-code, loaded by the classloader, and optionally executed.
I might have missed something, but for me the hard part would be parsing, determining code boundaries, and value management, but not classloading itself.
[@FroMage] Needs some thinking
[Migrated from ceylon/ceylon-spec#200]