dino-lang / dino

The programming language DINO
GNU General Public License v2.0

build error (again: ONIGURUMA/missing) #13

Open rofl0r opened 4 years ago

rofl0r commented 4 years ago

Hi, first of all thanks for this exciting project; it is a real treasure trove.

Unfortunately the build fails:

CDPATH="${ZSH_VERSION+.}:" && cd . && /bin/bash /tmp/dino/ONIGURUMA/missing aclocal-1.14 -I m4
/tmp/dino/ONIGURUMA/missing: line 81: aclocal-1.14: command not found
WARNING: 'aclocal-1.14' is missing on your system.
         You should only need it if you modified 'acinclude.m4' or
         'configure.ac' or m4 files included by 'configure.ac'.
         The 'aclocal' program is part of the GNU Automake package:
         <http://www.gnu.org/software/automake>
         It also requires GNU Autoconf, GNU m4 and Perl in order to run:
         <http://www.gnu.org/software/autoconf>
         <http://www.gnu.org/software/m4/>
         <http://www.perl.org/>
make[2]: *** [aclocal.m4] Error 127
make[2]: Leaving directory `/tmp/dino/ONIGURUMA'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tmp/dino'
make: *** [all] Error 2

I fixed it by patching missing to exit 0.

PS: build time could be greatly reduced by removing the use of libtool and dropping obsolete configure checks, such as the checks for standard headers like string.h. Also, the subdirs seem to run the same checks again.

PPS: is it possible to disable C++ usage, to speed up the build and require fewer dependencies (for example, MirBSD ships without a C++ toolchain)?

vnmakarov commented 4 years ago

Hi, first of all thanks for this exciting project; it is a real treasure trove.

Thank you for your interest in the project. I used this language and code mostly for my research. The language design is almost 30 years old and most of the code is 20 years old. I would do the language design and implementation differently now. I am still thinking about a language redesign and reimplementation, but it will only happen if I have free time, and I don't see such an opportunity in the near future.

Unfortunately the build fails:

I fixed it by patching missing to exit 0.

Oniguruma is not my package, so it is hard for me to say what is wrong with it without investigation. There is probably a newer version of ONIGURUMA, and it would be better to update it too. Maybe that also solves the problem.

Also, the scripts generated by the autoconf tools (e.g. configure) probably should not be in the source distribution, as they require particular versions of the utilities. Instead, the build should run autoconf to generate the configure script.

In any case, if you have an acceptable solution, your pull request would be welcome.

PS: build time could be greatly reduced by removing the use of libtool and dropping obsolete configure checks, such as the checks for standard headers like string.h. Also, the subdirs seem to run the same checks again.

PPS: is it possible to disable C++ usage, to speed up the build and require fewer dependencies (for example, MirBSD ships without a C++ toolchain)?

I guess so. The implementation itself contains a few compiler tools which were originally written in C. The C++ code was added a bit later, when people were excited about C++ and everybody was moving from C to C++. If people need to use the tools from C++, they can use the C code. One thing I would like in a reimplementation would be removing the tools (it is a lot of code) and, as a consequence, the C++ interfaces. They make the sources unnecessarily big.

rofl0r commented 4 years ago

I researched the issue a bit, and it seems to be due to timestamp confusion: git doesn't record file timestamps, so make ends up running missing aclocal-${am__api_version} when it encounters generated files that appear to be older than their sources.

The fundamental issue here is that automake tries to be helpful to the developer by re-running the automake-related tools for him automatically; unfortunately it does so from the wrong place, in generated build infrastructure intended for the user, not the developer.

The developer would run autogen.sh or an equivalent manually anyway after changing m4 files or automake input files, so this is completely pointless. automake's generated Makefile.in should simply ignore automake infrastructure such as m4 input files.

It is quite a common problem, and a good summary is here: https://stackoverflow.com/questions/33278928/how-to-overcome-aclocal-1-15-is-missing-on-your-system-warning
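For a fresh git checkout where the committed generated files are actually current, a common generic workaround is simply to normalize the timestamps so make considers them up to date; a rough sketch (the exact list of generated files may differ):

# mark the generated autotools outputs as newer than their inputs so make
# does not try to re-run aclocal/automake/autoconf (assumes they are up to date)
cd ONIGURUMA
touch aclocal.m4
touch configure config.h.in Makefile.in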

I'm pretty sure this problem specifically has frustrated many developers considerably, causing them to switch to the horrible CMake build system, or lately to meson, which unfortunately sees high adoption even though it requires bleeding-edge Python features. I personally had my fair share of trouble with the missing script during development of sabotage linux.

My recommendation would be to get this issue fixed at its root in automake by fellow Red Hat developers, so it won't plague future generations (and reinforce the opinion that autoconf is bad). Until that happens, it can easily be worked around by either exporting MISSING=true or putting exit 0 as the first line after #!/bin/sh in the ONIGURUMA/missing script. The latter is what I would propose here, but maybe after updating oniguruma to the 6.9.4 release as soon as it is available (it will contain some security fixes).
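Both workarounds are one-liners; a sketch, assuming the usual ./configure && make flow and GNU sed (paths taken from the log above):

# workaround 1: neutralize the wrapper; configure only sets MISSING if it is
# not already in the environment, so export it before running configure
export MISSING=true
./configure && make

# workaround 2: make the missing script itself a no-op by inserting "exit 0"
# right after the #!/bin/sh line
sed -i '1a exit 0' ONIGURUMA/missing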

There is probably a newer version of ONIGURUMA, and it would be better to update it too. Maybe that also solves the problem.

Updating it couldn't hurt, but it won't fix the issue with missing.

Also, the scripts generated by the autoconf tools (e.g. configure) probably should not be in the source distribution, as they require particular versions of the utilities. Instead, the build should run autoconf to generate the configure script.

Yes, that would work. It is the cleanest solution for a git repo (especially when a submodule uses aclocal), but it raises the barrier to entry, so IMO a change like that should be accompanied by a release tarball with pre-generated configure scripts.
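As a rough sketch of what such a bootstrap step could look like (a hypothetical autogen.sh; the -I m4 flag matches the aclocal invocation in the log above):

#!/bin/sh
# autogen.sh (hypothetical): regenerate aclocal.m4, configure, Makefile.in, etc.
# git users run this once after cloning; release tarballs ship the generated files
set -e
autoreconf --force --install -I m4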

One thing I would like in a reimplementation would be removing the tools (it is a lot of code) and, as a consequence, the C++ interfaces. They make the sources unnecessarily big.

Would you make a separate repo for the tools then, just like you did with yaep? They should definitely stay available in some form, as they're very cool. I already look forward to trying out MSTA; it could even be used to speed up existing yacc parsers of other projects.

My personal preference would be to keep them in the repo, but build them from the toplevel makefile instead of recursively, so only the toplevel configure needs to be run and dependency resolution is 100% accurate (e.g. no need for make distclean after changing CFLAGS). OTOH, the COCOM tools could maybe get better (and deserved) exposure if they were treated as standalone project(s).

The language design is almost 30 years old and most of the code is 20 years old. I would do the language design and implementation differently now.

I'm curious, what would you change and why? I have been reading some dino code over the last few days and am fascinated by its elegance.

By the way, here are some random ideas that came to my mind while playing with and reading dino/cocom (apart from using dino itself).

vnmakarov commented 4 years ago

One thing I would like in a reimplementation would be removing the tools (it is a lot of code) and, as a consequence, the C++ interfaces. They make the sources unnecessarily big.

Would you make a separate repo for the tools then, just like you did with yaep? They should definitely stay available in some form, as they're very cool. I already look forward to trying out MSTA; it could even be used to speed up existing yacc parsers of other projects.

My personal preference would be to keep them in the repo, but build them from the toplevel makefile instead of recursively, so only the toplevel configure needs to be run and dependency resolution is 100% accurate (e.g. no need for make distclean after changing CFLAGS). OTOH, the COCOM tools could maybe get better (and deserved) exposure if they were treated as standalone project(s).

I have thought about this myself. YAEP was a step in this direction. I did it because the other Earley parser implementations were so bad, yet people talked about them as good ones.

MSTA and SHILKA may be useful. I think SPRUT is outdated, and OKA and NONA are too specialized.

The language design is almost 30 years old and most of the code is 20 years old. I would do the language design and implementation differently now.

I'm curious, what would you change and why? I have been reading some dino code over the last few days and am fascinated by its elegance.

Here are some random things about the language and implementation, in no particular order, that I can think of right now:

By the way, here are some random ideas that came to my mind while playing with and reading dino/cocom (apart from using dino itself).

* usage as a language-builder toolkit like PyPy (but without requiring 1+ hour and dozens of dependencies to build the thing), for example to make an alternative implementation of a simple Python 2.7 interpreter (as 3.x is still a moving target) with the parser taken from micropython/pypy/python itself (or maybe by turning the official grammar into yacc), reusing dino's bytecode, IR, etc. (IIRC mruby uses a bison parser too, so it could be quickly plugged on top of the dino framework as well)

* transpiling bytecode/IR/AST to C completely for best performance and portability

There is already a JIT based on C code generation, so it would be a doable project.

* a small interpreter engine where one can link dino bytecode (and maybe external .o files emitted by the JIT) into a statically linked binary (e.g. using musl libc for size), for small portable (as in "compile on one PC, use on many") programs.

Your ideas are interesting. There are too many directions; maybe I'll take one. But to be honest, I don't know when I will have time for this.

rofl0r commented 4 years ago

Here are some random things about the language and implementation, in no particular order, that I can think of right now:

Thanks for sharing your thoughts.

  • Simplification of types: just one integer type (internally, if the value is big it is switched to multiple precision, which is called long now).
  • Gradual typing (optional type descriptions) for better checking and performance

I agree.

Removing array slices; they are too complicated for implementation and maintenance

Right. While studying the manual I had the impression that especially the unary/binary operations on slices are very special-purpose and could easily be implemented as functions by the programmer when needed, without making the syntax more complicated. OTOH, having basic slice notation like in Python (also on strings) is useful as syntactic sugar for everyday tasks. BTW, while reading the manual I couldn't quite figure out what precisely is meant by the "implicit string conversion" of types. I will play with the REPL some more to find out how it works.

A better model for green thread support (the current one was a hack). Parallelism support

Oh? I was quite impressed by how they work, and especially by the performance numbers you demonstrated. But indeed, having real parallel execution is a "killer feature", as most dynamic languages do it very wrong. This is one of the main reasons I have lately been looking into doing my own language.

JIT. The current JIT is very experimental. It uses GCC. This work, btw, influenced the current CRuby JIT implementation. I'd like to finally use the MIR project. Implementing speculation and deoptimization in the JIT.

I found your approach of using the C compiler for the JIT quite enlightening. Apart from compiling everything directly to C, it is certainly the best approach, as LLVM is a huge beast and every version is incompatible with the next one (if you're using 5 languages based on LLVM, you basically need 5 different LLVM versions installed). I've been reading your blog and also came across the issue on the Ruby tracker where you implemented a faster hash for the Ruby community. Quite unfortunate that it turned into an unpleasant "competition", but in the end common sense prevailed.
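For anyone unfamiliar with the approach, the general shape of such a C-compiler JIT (a simplified sketch, not dino's actual file names or compiler flags) is: write out C source for a hot function, compile it to a shared object, then load it at runtime:

# simplified illustration of a "C compiler as JIT" pipeline
cat > /tmp/hot_fn.c <<'EOF'
int hot_fn(int a, int b) { return a + b; }
EOF
cc -O2 -shared -fPIC -o /tmp/hot_fn.so /tmp/hot_fn.c
# the interpreter then loads /tmp/hot_fn.so with dlopen(), looks up hot_fn with
# dlsym(), and calls the compiled code instead of interpreting the function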

While studying the dino manual, the main thing I thought could be simplified (apart from the slice operators) is the special OOP stuff like friend, final, pub, priv. It is a little confusing. Maybe the best option would be to have either everything public like Python, or only a "priv" keyword for things one wants to hide.

vnmakarov commented 4 years ago

Right. While studying the manual I had the impression that especially the unary/binary operations on slices are very special-purpose and could easily be implemented as functions by the programmer when needed, without making the syntax more complicated. OTOH, having basic slice notation like in Python (also on strings) is useful as syntactic sugar for everyday tasks. BTW, while reading the manual I couldn't quite figure out what precisely is meant by the "implicit string conversion" of types. I will play with the REPL some more to find out how it works.

Implicit string conversion types are types whose values can be implicitly converted into a string when a string is expected (e.g. by the string concatenation operator @). These types are numbers (int, long, fp) and characters.

A better model for green thread support (the current one was a hack). Parallelism support

Oh? I was quite impressed by how they work, and especially by the performance numbers you demonstrated. But indeed, having real parallel execution is a "killer feature", as most dynamic languages do it very wrong. This is one of the main reasons I have lately been looking into doing my own language.

The thread synchronization in DINO is primitive, although maybe it is possible to build something convenient on top of it. The current thread implementation prevents effective JIT code generation. Basically, right now the JIT-generated code ignores that some thread is waiting, and this creates different semantics for JITted and interpreted code, which is really bad. But when the execution of JITted code finishes, the interpreter can switch to another waiting thread. So these are the current problems with threads.

For modern CPUs I think parallelism is a must, even if in many cases green threads can be a faster solution (e.g. for GUIs or IO-bound parallelism).

JIT. The current JIT is very experimental. It uses GCC. This work, btw, influenced the current CRuby JIT implementation. I'd like to finally use the MIR project. Implementing speculation and deoptimization in the JIT.

I found your approach of using the C compiler for the JIT quite enlightening. Apart from compiling everything directly to C, it is certainly the best approach, as LLVM is a huge beast and every version is incompatible with the next one (if you're using 5 languages based on LLVM, you basically need 5 different LLVM versions installed). I've been reading your blog and also came across the issue on the Ruby tracker where you implemented a faster hash for the Ruby community. Quite unfortunate that it turned into an unpleasant "competition", but in the end common sense prevailed.

Using C for JIT code generation provides a stable interface, although some people are uncomfortable using gcc/clang in their production environment. The MIR project now has its own C compiler which can be linked as a library. Its compilation speed is 10 times faster than gcc -O2, so MIR can be used in an analogous way, but using MIR directly would be even faster.

Also, the current Dino JIT is not parallel like the CRuby one: when something is JIT-compiled, execution stops. That should be corrected too.

As for the hash-table experience, it is OK. It is hard to compete when you are a novice in the community. In some well-established open source projects you need a decade of work just to be taken seriously.

While studying the dino manual, the main thing I thought could be simplified (apart from the slice operators) is the special OOP stuff like friend, final, pub, priv. It is a little confusing. Maybe the best option would be to have either everything public like Python, or only a "priv" keyword for things one wants to hide.

I agree. I don't like it either. It would be another way to simplify the language. Also, I haven't decided yet what I want for object orientation: a static description as now, or something dynamic as in Ruby. The dynamic approach would permit simplifying the language more. An object could be just a hash table, with syntactic sugar so the object orientation looks the usual way.