crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0

Unique id for ephemeral binaries #8340

Open · bkao opened this issue 4 years ago

bkao commented 4 years ago

Would it be possible to name the ephemeral binaries in ~/.cache with some kind of unique ID? I don't want to compile a permanent binary; I prefer to use Crystal like a scripting language. But when I do this in a MapReduce framework, every instance of the script compiles to the same filename, so the runs crash when two or more instances write the binary at the same time.

I like to do something like this where myprog.cr has your typical shebang line: "#!/usr/bin/env crystal":

cat file.json | myprog.cr --release > out

Which generates this binary: ~/.cache/crystal/crystal-run-myprog.tmp

It would be nice if the binaries were named with unique IDs, like: ~/.cache/crystal/crystal-run-myprog-<unique-id>.tmp

This way each instance in my MapReduce multi-process framework gets its own file.
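A minimal sketch of the per-instance naming asked for here, written in Crystal. The helper name `unique_output_path` and the PID-plus-random-bytes suffix are assumptions for illustration; this is not something the compiler does today.

```crystal
require "random/secure"

# Hypothetical helper (not part of Crystal): a per-invocation output path,
# so concurrent runs of the same script never write the same file.
def unique_output_path(cache_dir : String, program : String) : String
  # The PID alone can be reused over time, so add a few random bytes.
  suffix = "#{Process.pid}-#{Random::Secure.hex(4)}"
  File.join(cache_dir, "crystal-run-#{program}-#{suffix}.tmp")
end

cache_dir = File.join(Dir.tempdir, "crystal")
puts unique_output_path(cache_dir, "myprog")
```

The trade-off is that no compiled binary is ever reused: every instance pays the full compile time, and the files pile up in the cache unless they are deleted after each run.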

jkthorne commented 4 years ago

Is this what you are looking for?

Edit: https://github.com/Val/crun

oprypin commented 4 years ago

Is what? I don't get it.

Also, is there a bit missing from the original post? Maybe angle brackets after the dash in the filename?

Both my comments have been resolved by corresponding edits.

asterite commented 4 years ago

Yes, sorry, I also don't understand the issue.

But note that right now running the compiler twice in parallel doesn't work well.

bkao11 commented 4 years ago

But note that right now running the compiler twice in parallel doesn't work well.

Yes because the filenames clash.

It looks like the crun solution from @wontruefree would do the trick. I just wish it were more tightly integrated. Also, I'm not sure if it's possible to pass the '--release' option to crystal through the shebang line.

NIFR91 commented 4 years ago

I recently ran into the same issue. In my case I have a program that processes text, for example extracting some lines or columns, and I wanted to chain several invocations of it in one pipeline (which starts them in parallel), for example:

printf '1 2 3\n4 5 6\n' | ./myprog.cr extract-first-line | ./myprog.cr get-first-col

But sometimes the second compilation clashes with the first, and I get errors like:

execvp (/home/nieto/.cache/crystal/crystal-run-writter.tmp): Text file busy: Text file busy (Errno)
execvp (/home/nieto/.cache/crystal/crystal-run-writter.tmp): No such file or directory: No such file or directory (Errno)
  from ???
  ...
  from ???
Error: you've found a bug in the Crystal compiler. Please open an issue, including source code that will allow us to reproduce the bug: https://github.com/crystal-lang/crystal/issues
  from ???
  ...
  from ???
Error: you've found a bug in the Crystal compiler. Please open an issue, including source code that will allow us to reproduce the bug: https://github.com/crystal-lang/crystal/issues

I also think it would be nice for the Crystal compiler to handle these cases, so users won't need to install crun.

Minimal program, program.cr:

#!/usr/bin/env crystal
while line = gets
  puts line
end

Run it twice in one pipeline:

printf 'Hello\nWorld\n' | ./program.cr | ./program.cr

bew commented 4 years ago

Simpler:

#!/usr/bin/env crystal
sleep 1

Then run ./foo.cr | ./foo.cr

asterite commented 4 years ago

My suggestion: don't use Crystal as a scripting language. Compile the program to a binary. Then it'll be faster (no need to wait for compilation) and you won't have this "compiler is running twice" problem.

bkao commented 4 years ago

Yes, compiling would make the problem moot, but there are times when I don't want to deal with separate source and binary files. This is one of the nice features of scripting languages. If Crystal can already behave like a scripting language, why not go all the way and solve this filename collision problem?

Crun would work, but it's a workaround, not a genuine solution. If it were more tightly integrated, or simply subsumed into Crystal, we'd be taking it all the way in terms of behaving like a scripting language. I think so, anyway; I haven't looked into the code path taken when crystal is invoked via the shebang.

Ideally, Crystal used as a scripting language would behave like make and only recompile if any of the source files is newer than the executable. This would be an improvement, no? Maybe crun already does that, but like @NIFR91 said, it would be nice to avoid yet another dependency. Plus, I don't know if it's possible to pass compiler arguments like '--release' via crun.
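A make-style staleness check could look roughly like the following. It is a hypothetical sketch (`needs_rebuild?` is not part of Crystal) and it only compares the modification times of the files you pass in.

```crystal
# Hypothetical helper (not part of Crystal): make-style check that returns
# true when the executable is missing or older than any listed source file.
def needs_rebuild?(exe : String, sources : Array(String)) : Bool
  exe_info = File.info?(exe)
  return true unless exe_info # no executable yet: must build
  exe_mtime = exe_info.modification_time
  sources.any? { |src| File.info(src).modification_time > exe_mtime }
end
```

Timestamps alone miss changes in required files you didn't list, in compiler flags such as `--release`, or in the compiler version, which is one reason to key the cache on a content hash instead.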

NIFR91 commented 4 years ago

I agree with @bkao. We also have crystal run, which is the default behavior; it's very useful and naturally leads to using Crystal for scripting (this is one of the reasons I really like the language). Making users keep track of the binary, when Crystal could have an integrated tool like crun (usable as a replacement for run), makes Crystal feel more like C in this respect. In my opinion, going all the way and making Crystal behave like a scripting language could be beneficial, since it would then be used more for simple programs and scripts. The ease of use of a scripting language combined with the raw performance of Crystal is a very compelling combination, hence this issue and shards like crun.

RX14 commented 4 years ago

We can exclusively lock the compiler cache directory while the compiler is using it, and atomically replace the output file after linking (instead of writing it in place). Crystal already has flock bound, but it might have to be added for directory file descriptors (Dir).
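A rough sketch of this suggestion, assuming a `.lock` file inside the cache directory as a stand-in for locking the directory itself (since flock on a directory fd may not be exposed). `with_cache_lock` and `replace_atomically` are hypothetical names; the real compiler code could look quite different.

```crystal
# Hypothetical helpers (not Crystal's actual implementation).

# Hold an exclusive lock for the whole cache directory while building.
# A ".lock" file stands in for locking the directory itself.
def with_cache_lock(cache_dir : String, &)
  Dir.mkdir_p(cache_dir)
  File.open(File.join(cache_dir, ".lock"), "w") do |lock|
    lock.flock_exclusive { yield } # blocks until the lock is free
  end
end

# Let the block produce the binary under a private temporary name, then
# rename it over the final name. rename(2) replaces the directory entry
# atomically, so readers see either the old file or the complete new one.
def replace_atomically(final_path : String, &)
  tmp = "#{final_path}.#{Process.pid}.partial"
  begin
    yield tmp
    File.rename(tmp, final_path)
  ensure
    File.delete(tmp) if File.info?(tmp) # clean up after a failed build
  end
end

cache_dir = File.join(Dir.tempdir, "crystal-cache-demo")
final = File.join(cache_dir, "crystal-run-myprog.tmp")
with_cache_lock(cache_dir) do
  replace_atomically(final) do |tmp|
    # A real compiler would link into `tmp` here (e.g. `-o tmp`).
    File.write(tmp, "binary contents")
  end
end
```

This targets the "Text file busy" error in the log above: on Linux, execve fails with ETXTBSY while the file is still open for writing, and with this scheme the final path only ever names a fully written, closed file. The rename is only atomic within one filesystem, which is why the temporary file sits next to the final one.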

rdp commented 4 years ago

Maybe crystal could have a parameter like "--unique-id=x"; then you could use a wrapper script (wrapping crystal) in your shebang, though that doesn't feel optimal somehow. @bkao, the problem is when both processes try to build it simultaneously (and the files have changed). Alternatively, you could have your script pre-build the required binaries, something like crystal build foo.cr && ./foo | ./foo.

bkao commented 4 years ago

@rdp, I don't think the actual name of the executable is the problem. Simply make one component of the filename a hash of the source code. For example: foo.cr --> .foo.6c1a0 (temp file) --> foo.6c1a0 (final exe)

Have the Crystal system first check for the hidden file indicating that compilation is in progress, to prevent race conditions.

Order of operations would be something like:

  1. If final exe exists, run it
  2. If final exe doesn't exist and temp file doesn't exist, initiate compilation
  3. If final exe doesn't exist and temp file does exist, wait until compilation completes
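The order of operations above could be sketched as follows. One caveat: checking for the hidden temp file and then creating it is itself a race (two processes can both see it missing), so this hypothetical `ensure_built` swaps the marker file for a blocking exclusive lock plus a re-check, the same idea @RX14 suggests. It assumes `final_path` already encodes the source hash, so an existing file counts as up to date.

```crystal
# Hypothetical sketch of the proposed flow (not part of Crystal).
# `final_path` is assumed to already encode a hash of the source, so an
# existing file at that path is treated as up to date.
def ensure_built(final_path : String, &) : String
  # Step 1: the executable exists, just use it.
  return final_path if File.info?(final_path)

  Dir.mkdir_p(File.dirname(final_path))
  File.open("#{final_path}.lock", "w") do |lock|
    # Steps 2 and 3: one process compiles while the others block here,
    # instead of polling for a hidden marker file.
    lock.flock_exclusive do
      # Re-check: a peer may have finished while we were waiting.
      unless File.info?(final_path)
        tmp = "#{final_path}.#{Process.pid}.partial"
        begin
          yield tmp
          File.rename(tmp, final_path)
        ensure
          File.delete(tmp) if File.info?(tmp)
        end
      end
    end
  end
  final_path
end
```

Processes that arrive while a build is running simply block on the lock (step 3) and then find the finished executable on the re-check, so exactly one of them compiles; each caller would then hand the returned path to something like `Process.exec`.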

straight-shoota commented 4 years ago

Just a general question: How would you determine the uid? I'd assume it should be some kind of hash over the source code?

Considering that programs can contain dynamic data, such as the output of other programs (via the run macro), there are some consequences:

I assume you would only want to rebuild the binary when the (actual) source code has changed. That seems like a prototypical use case for a build management tool like make.
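One possible answer to the uid question, as a sketch: hash the source text together with the compiler flags, so that `--release` and debug builds of the same script don't share a file. `cache_key` and `cached_exe_path` are hypothetical, and, as noted above, this does not see required files, `run` macro output, or the compiler version, so those would still have to be folded into the hash or handled separately.

```crystal
require "digest/sha256"

# Hypothetical naming scheme (not what the compiler does): hash the source
# text together with the compiler flags.
def cache_key(source : String, flags : Array(String)) : String
  # NUL separators keep ["ab", "c"] and ["a", "bc"] from hashing the same.
  input = ([source] + flags).join("\u0000")
  Digest::SHA256.hexdigest(input)[0, 10]
end

# foo.cr -> <cache_dir>/foo.<first 10 hex chars of the hash>
def cached_exe_path(cache_dir : String, script : String, flags : Array(String)) : String
  name = File.basename(script, ".cr")
  File.join(cache_dir, "#{name}.#{cache_key(File.read(script), flags)}")
end
```

This yields names in the foo.6c1a0 style proposed above, and a changed source or flag set simply produces a new cache entry instead of overwriting a binary another process may be running.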

rdp commented 4 years ago

I like @RX14's idea. Maybe each crystal output filename could get its own cache folder, and compiling could flock that folder until it finishes, or something like that. :)