dino-lang / dino

The programming language DINO

GNU General Public License v2.0

factored out shilka repo #17

Open rofl0r opened 4 years ago

rofl0r commented 4 years ago

hello, we talked recently about factoring out COCOM components into own repo. i gave it a try here: https://github.com/rofl0r/shilka

note that i did not simply copy stuff from the dino repo there; i recreated the relevant parts of the git history with a script[0].

after that, i had to copy the relevant parts of AMMUNITION into place and tweak the automake input files, and i removed libtool usage while at it in order to reduce the diff size for autogen-generated files. but in the end i opted to remove the autoconf-generated files completely.

if this is an approach you like, the mentioned repo could maybe be made available on the dino-lang account, and a similar thing done for the other COCOM components? if so, a volunteer should check that the source files are identical to those in the dino repo, so that trust can be established. (a simple md5sum run should suffice).

thanks!

[0]:

import sys, os
sys.path.append('../hardcore-utils')
# import custom patch unit, can be made available if desired
import patch

def cmd_arr(cmd):
    p = os.popen(cmd)
    a = [x.rstrip('\n') for x in p.readlines()]
    p.close()
    return a

def trailslash(s):
    if s.endswith('/'): return s
    return s + '/'

def remove_dir_from_patch(p, dir):
    new = []
    for x in p.split('\n'):
        if x.startswith('diff --git a') or x.startswith('--- ') or x.startswith('+++ '):
            x = x.replace(trailslash(dir), '')
        new.append(x)
    return '\n'.join(new)

dir = sys.argv[1]
out = sys.argv[2]

# git log lists newest commits first; [::-1] reverses the list so patches are numbered oldest first
commits=cmd_arr("git log %s/ | cat | grep '^commit ' | cut -d ' ' -f 2"%dir)[::-1]
n = 1
for c in commits:
    proc=os.popen("git format-patch -1 --stdout %s"%c)
    p = patch.Patch(proc)
    proc.close()
    good_hunks = []
    for h in p.hunks:
        if not (h.get_nfile(1).startswith(dir) or h.get_ofile(1).startswith(dir)):
            continue
        good_hunks.append(h)
    p.hunks = good_hunks
    patch_text = remove_dir_from_patch(repr(p), dir)
    with open('%s/%d.patch'%(out, n), 'w') as fh:
        fh.write(patch_text)
    n += 1
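
The md5sum check suggested above could be sketched roughly as follows. This is only an illustration, not part of the original script: the function names `file_digest` and `compare_trees` are made up for this example, and it assumes the extracted repo keeps the same relative file layout as the corresponding directory in the dino repo. Files copied in from elsewhere (for example AMMUNITION sources) would show up as "missing" and need a separate comparison.

```python
import hashlib
from pathlib import Path


def file_digest(path, algo="md5"):
    """Hash a file in chunks so large files do not have to fit in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(original_dir, extracted_dir, algo="md5"):
    """Compare every file under extracted_dir with its counterpart in original_dir.

    Returns (mismatched, missing): relative paths whose content differs, and
    relative paths that have no counterpart in original_dir.
    """
    original_dir, extracted_dir = Path(original_dir), Path(extracted_dir)
    mismatched, missing = [], []
    for path in sorted(extracted_dir.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(extracted_dir)
        if ".git" in rel.parts:
            continue  # skip git metadata of the extracted repo
        counterpart = original_dir / rel
        if not counterpart.is_file():
            missing.append(rel.as_posix())
        elif file_digest(path, algo) != file_digest(counterpart, algo):
            mismatched.append(rel.as_posix())
    return mismatched, missing
```

Calling `compare_trees("dino/SHILKA", "shilka")` (both paths are placeholders) would then list any file whose content differs between the two checkouts.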

vnmakarov commented 4 years ago

Thank you for your efforts. I see real value in separating the tools, as it was done for Earley Parser. MSTA, NONA, and SHILKA would probably be the most interesting tools for other people.

Let me think one more week about your proposal. Currently, my view is the following:

My wish is to make Dino implementation independent of any of these tools and I will work in this direction. So I think it is not right to put them as dino-lang repositories. If you wish to work on the separation, I could create a new organization to put the tool repositories there and make both of us the members.

Please let me know what you think about this approach. Meanwhile, I am taking a one-week pause to think more about it myself.

rofl0r commented 4 years ago

as it was done for Earley Parser

oh, i should have checked the approach you took there, esp. regarding AMMUNITION.

Let me think one more week about your proposal.

certainly. we're talking about 20 years (or more) worth of masterful engineering, so no premature decision should be made (if at all).

My wish is to make Dino implementation independent of any of these tools

may i ask why? certainly using a different approach would require a lot of effort and make the language slower, as the existing tools are top-notch. if build speed is your major concern i could come up with an autoconf-less, GNU make-only (and cross-compile compatible) approach with a single toplevel makefile, which would probably reduce build time to 1/3. if compatibility with antiquated toolchains is desired, a few handpicked configure tests could be added.

I think it is not right to put them as dino-lang repositories. If you wish to work on the separation, I could create a new organization to put the tool repositories there and make both of us the members.

sounds fine to me, even without r/w access.

vnmakarov commented 4 years ago

My wish is to make Dino implementation independent of any of these tools

may i ask why? certainly using a different approach would require a lot of effort and make the language slower, as the existing tools are top-notch. if build speed is your major concern i could come up with an autoconf-less, GNU make-only (and cross-compile compatible) approach with a single toplevel makefile, which would probably reduce build time to 1/3. if compatibility with antiquated toolchains is desired, a few handpicked configure tests could be added.

It is not about the build time. First, MSTA. Syntax parsing is not critical to front-end speed (as long as people don't do stupid things in the parser implementation). It is the lexer that matters, because it works with characters and there are many more characters than tokens. MSTA, even though it supports LR(k) languages for any fixed k, is a big constraint on designing the language as I want.

For example, CRuby with its YACC approach always struggles with this when the developers try to add new features. Also, the current Dino syntax is described by an LALR(1) grammar, so using a generic LR(k) parser generator is overkill. But MSTA could still be helpful for other projects (I saw a big interest in it from education, especially for learning formal grammars).
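
To illustrate the remark about lexing cost, here is a minimal Python sketch that compares how many characters a lexer has to look at with how many tokens the parser then receives. The token pattern and the sample input are invented for this example and have nothing to do with Dino's real lexical rules.

```python
import re

# Hypothetical token classes: integer literals, identifiers, and any other
# single non-whitespace character (operators, punctuation).
TOKEN_RE = re.compile(r"\d+|[A-Za-z_]\w*|\S")


def tokenize(text):
    """Return the list of token strings found in text, skipping whitespace."""
    return [m.group(0) for m in TOKEN_RE.finditer(text)]


source = "total = price * 12 + tax"
tokens = tokenize(source)

# The lexer inspects every character; the parser only sees the tokens.
print(len(source), "characters ->", len(tokens), "tokens")
```

Even in this tiny input the lexer handles several times more items than the parser does, which is why per-character work tends to dominate front-end time.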

As for SPRUT, I use only a very small part of its features for the Dino implementation. It can easily be replaced by C structs and access macros/inline functions.

Shilka can also easily be replaced by simple C code used from a flexible hand-written parser, which can easily be changed for new Dino features.

A lot of AMMUNITION can be gotten rid of too. My approach now is to use type-safe ADTs, which are easy to debug. You can find them at https://github.com/vnmakarov/mir (mir-varr.h, mir-htab.h, mir-dlist.h, mir-hash.h, etc).

So after getting rid of these tools, I'll have a pure C implementation.

I think it is not right to put them as dino-lang repositories. If you wish to work on the separation, I could create a new organization to put the tool repositories there and make both of us the members.

sounds fine to me, even without r/w access.

I think if you are going to do this, it would be more convenient for you to have r/w access, at least for the work period.

vnmakarov commented 4 years ago

I've just created cocom-org (unfortunately cocom name was already taken) and sent you an invite. I'll give you rw access rights. I never did this before so it will be a learning process for me. I guess you can create repositories for each tool.

rofl0r commented 4 years ago

I've just created cocom-org (unfortunately cocom name was already taken) and sent you an invite. I'll give you rw access rights.

thanks, it is a great honor. so do you think the approach of copying required AMMUNITION files into each repo is OK ? there are also git sub-repos, but i found dealing with them is a real mess.

vnmakarov commented 4 years ago

thanks, it is a great honor. so do you think the approach of copying required AMMUNITION files into each repo is OK ? there are also git sub-repos, but i found dealing with them is a real mess.

Thank you for your help and interest in the tools. I think copying AMMUNITION files (only the necessary ones) is OK and probably the best way to deal with this problem.

The bigger problem will be SPRUT. It is used for building many tools. Including it in every tool is overkill. I think we should use manually written C structures instead of it at the end of the separation process. That is what I am going to do with Dino.

Actually you woke up an interest in me for Dino project again. So I am thinking about changing language a bit, implementing parallelism support (like Elixir), adding MIR jit to it, and implementing some speculative generation techniques. I might find time for this project in half a year. Meanwhile, I will work on freeing the Dino implementation from the tools. It will be a slow process.

rofl0r commented 4 years ago

ok, i am done factoring out the 5 standalone tool components from dino. i also planned to add AMMUNITION, simply for the purpose of having it with full git history, docs, etc. however there were a couple of merges that make the git history non-linear, so applying the patches fails at patch 67/93. i think this can be solved by creating a clone of the dino repo, then manually flattening the history and using that as the base for a new patch creation process. however, it's going to be quite involved i fear.
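
Before flattening the history by hand, it may help to locate the merge commits that make it non-linear. The sketch below is an assumption about how one might do that, not something used in the thread: it parses the output of `git rev-list --parents`, which prints each commit followed by its parents, and reports every commit with more than one parent. The helper names are made up for this example.

```python
import subprocess


def find_merge_commits(rev_list_lines):
    """Return the hashes of commits that have two or more parents.

    Each input line has the form "<commit> <parent1> <parent2> ...",
    as printed by `git rev-list --parents`.
    """
    merges = []
    for line in rev_list_lines:
        fields = line.split()
        # fields[0] is the commit itself, the remaining fields are its parents
        if len(fields) > 2:
            merges.append(fields[0])
    return merges


def rev_list_with_parents(rev="HEAD"):
    """Run git in the current directory and return its output lines."""
    result = subprocess.run(
        ["git", "rev-list", "--parents", rev],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()
```

Running `find_merge_commits(rev_list_with_parents())` inside a clone of the dino repo would list the merges across the whole history; narrowing that down to the ones that touch AMMUNITION is left out of this sketch.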

The bigger problem will be SPRUT. It is used for building many tools. Including it in every tool is overkill.

i used the same approach as in shilka: adding the sprut-generated files to the repo with a note that they should be touched after git checkout. maybe after you have checked the repos, a dist tarball should be created for each of them, so the pitiful timestamp problem at least doesn't happen there. or the directives to run sprut could be removed from the makefile.
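
The timestamp problem arises because git does not record modification times, so after a checkout a generated file can end up looking older than its input and make will try to rerun sprut. A minimal sketch of the "touch after checkout" step, with a hypothetical helper name and hypothetical file names, could look like this:

```python
import os


def touch_generated(paths):
    """Set the modification time of each existing file to the current time.

    Returns the list of paths that were actually touched; paths that do not
    exist are skipped rather than created.
    """
    touched = []
    for path in paths:
        if os.path.isfile(path):
            os.utime(path, None)  # None means "use the current time"
            touched.append(path)
    return touched
```

For example, `touch_generated(["ird.c", "ird.h"])` (placeholder names) run right after cloning would make those files newer than their inputs, so make leaves them alone.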

I think we should use manually written C structures instead of it at the end of the separation process.

this could be interesting.

Actually you woke up an interest in me for Dino project again.

glad to hear! this project is too exciting to see it catching dust.

So I am thinking about changing language a bit, implementing parallelism support (like Elixir)

great, a powerful, elegant and fast scripting language WITH parallelism is almost unheard of.

adding MIR jit to it

it would be great to have MIR support indeed, this would open the door to use m2c to generate portable C code.

vnmakarov commented 4 years ago

ok, i am done factoring out the 5 standalone tool components from dino.

Wow! I did not expect this work to be done so quickly. Thank you very much. I really appreciate your help. I'll play with all of these repositories next weekend.

i also planned to add AMMUNITION, simply for the purpose of having it with full git history, docs, etc. however there were a couple of merges that make the git history non-linear, so applying the patches fails at patch 67/93. i think this can be solved by creating a clone of the dino repo, then manually flattening the history and using that as the base for a new patch creation process. however, it's going to be quite involved i fear.

OK. Unfortunately, I am too much of a git novice to help you or give you advice. For the last twenty years, I have mostly used svn.

The bigger problem will be SPRUT. It is used for building many tools. Including it in every tool is overkill.

i used the same approach as in shilka: adding the sprut-generated files to the repo with a note that they should be touched after git checkout. maybe after you have checked the repos, a dist tarball should be created for each of them, so the pitiful timestamp problem at least doesn't happen there. or the directives to run sprut could be removed from the makefile.

Ok. I'll think about this.

I think we should use manually written C structures instead of it at the end of the separation process.

this could be interesting.

Actually you woke up an interest in me for Dino project again.

glad to hear! this project is too exciting to see it catching dust.

Your repository separation work will help me keep this project going. My major goal right now is to rewrite the parser without using MSTA/YACC (yacc actually cannot be used instead of MSTA because of its mediocre error recovery). This will open a way for me to change the Dino language flexibly without fear that it cannot be parsed by MSTA/YACC. I hope to do this work during the coming holidays.

So I am thinking about changing language a bit, implementing parallelism support (like Elixir)

great, a powerful, elegant and fast scripting language WITH parallelism is almost unheard of.

adding MIR jit to it

it would be great to have MIR support indeed, this would open the door to use m2c to generate portable C code.

It is a long-term project goal. To be honest, I will not have time even for the MIR project until April-May (until GCC 10 is released).