SolidCode / SolidPython

A python frontend for solid modelling that compiles to OpenSCAD
1.1k stars 171 forks source link

Use py scadparser #170

Closed jeff-dh closed 3 years ago

jeff-dh commented 3 years ago

This branch uses a py_scadparser a python scad parser based on ply to parse open scad code.

jeff-dh commented 3 years ago

2 out of the 3 new commits should fix the issues you mentioned in #166.

The third commit made little changes to run_all_test.sh to make it work on my system and I assume it should not break it for anyone else (working on linux!) and hopefully makes it work on more systems. if you don't like it we can revert it and I keep it as a local copy.

etjones commented 3 years ago

This looks good. Couple questions:

etjones commented 3 years ago

One other thing: Do you know what the deal is with this parser output?:

In [2]: MCAD = import_scad('MCAD')
Illegal character (31) """
31:1132 < - <
syntex error
Illegal character (31) "'"
Illegal character (31) "'"
Illegal character (32) "'"
Illegal character (32) "'"
Illegal character (33) "'"
Illegal character (33) "'"
Illegal character (33) """
36:1305 } - }
syntex error

... and similar....

The MCAD import all works, but I'm not sure what the parser is doing with these bits. Do you think it's a problem with the grammar?

jeff-dh commented 3 years ago

One other thing: Do you know what the deal is with this parser output?:

In [2]: MCAD = import_scad('MCAD')
Illegal character (31) """
31:1132 < - <
syntex error
Illegal character (31) "'"
Illegal character (31) "'"
Illegal character (32) "'"
Illegal character (32) "'"
Illegal character (33) "'"
Illegal character (33) "'"
Illegal character (33) """
36:1305 } - }
syntex error

... and similar....

The MCAD import all works, but I'm not sure what the parser is doing with these bits. Do you think it's a problem with the grammar?

hmmmmm this should not happen..... I'll check and think about it.....

How do you execute your code? (what is In[2]: ?) Does it print to stdout when you call a solidpython script?

I've no clue what this is about right now.

I'll try to improve the error "logging" to include the filename and we should try to track it down.

When does it occur? Only with MCAD?

jeff-dh commented 3 years ago

While trying to redo what you've done I by accident stumbled over this:

>>> mcad = solid.import_scad("MCAD")
>>> dir(mcad)
['2Dshapes', '3d_triangle', '__class__', '__delattr__', [...] 'units', 'unregular_shapes', 'utilities']
>>> mcad.2Dshapes
  File "<stdin>", line 1
    mcad.2Dshapes
         ^
SyntaxError: invalid syntax

I think this is another bug, I assume the filenames (-> module names) don't get checked whether they are valid python modifiers.....

jeff-dh commented 3 years ago

I can't reproduce your issue.... I can import MCAD without any of these messages and they should not occur! It means that the lexer does not now certain characters and the parser gets confused because the whole gramar does not make any sense with missing tokens.

jeff-dh commented 3 years ago
* we'll need to do `poetry add ply`, which adds Ply 3.11 as a dependency. I'll see if I can fix that.

I'm absolutely not into packaging, I've no clue what you're talking about ;)

* Running the tests generates `solid/py_scadparser/parser.out` and `solid/py_scadparser/parsetab.py`.
   What do these do? Should we add them to the .gitignore? Should we just check them in?

ply (bison) are somehow table driven. They "create" a certain table from the grammar and use that table to parse stuff.parsetab.py gets created whenever the grammar changes (or if it does not exist, the parser prints out LALR Table Generated or something like that).

My first thought was to .gitignore it and let every installation generate it's own parsetab.py on the first execution, BUT this does not worked if installed in an read only location (e.g. system-wide). So we should put it under version control and ship it with the package.

from ply.docs

To assist in debugging, yacc.py creates a debugging file called 'parser.out' when it generates the parsing table.

I guess we can .gitignore it

* What do we do with functions that are valid OpenSCAD identifiers but not valid Python, like `module 8pointed_star()`?
  I did some ugly gymnastics to add prefixes (`8pointed_star()` => `_8pointed_star()` ) in the past, but I wasn't ever
  happy with my solution there.
* I love the [fix](https://github.com/jeff-dh/SolidPython/tree/openscad_identifiers_starting_with_digits) you put in for 
  OpenSCAD functions starting with digits, too. Let's add that later too!

I don't get this.... to me it seems to be the same and should be fixed by the branch you mentioned.

Btw: I thought about adding a isidentifier call to [un]subbed_keyword to prevent code-injections, what do you think about that?

etjones commented 3 years ago

I think we should both add the parser output to source control, AND add it to .gitignore; otherwise the time stamp change on first run will appear in git status after everybody’s first run.

I wonder if the MCAD imports are some kind of case-sensitivity issue? I’m running ion MacOS which has less strict (and less logical) case treatment than Linux. I’m away from a computer again for a couple days, but will poke at that some more when I’m back

The issue with the missing initial digit handling is a GitHub wrinkle; I pushed my commits to a branch in the SolidCode/SolidPython repo rather than your PR branch. Looks like it’s working OK. I’ll grab that and the filenames bug you found. We’ll get there!

Thanks again for your work on this and your good instincts

jeff-dh commented 3 years ago

I think we should both add the parser output to source control, AND add it to .gitignore; otherwise the time stamp change on first run will appear in git status after everybody’s first run.

I don't think so and I don't think it's a good idea ;) ply seems to make somekind of hash check on the grammar not on the file. Even if you rewrite the actions of the parser in the same file as the grammar it does not generate a new parsetab.py. The file timestamp seem to not matter.

Furthermore I would be afraid this could become a commit fall trap. If you change the grammar (bugfixes) the resulting generated tables will not show up in git and you're very likely to forget to commit them....

I wonder if the MCAD imports are some kind of case-sensitivity issue? I’m running ion MacOS which has less strict (and less logical) case treatment than Linux. I’m away from a computer again for a couple days, but will poke at that some more when I’m back

If that's the output there's something not working completely off and if wonder how it can "still work". Does MacOS use different line endings or tabs or something like that? I wonder if our MCAD/...* file(s) that cause the problem are the same.

jeff-dh commented 3 years ago

Could you try to reproduce the error with the two patches applied and if they still occur send me the error log and at least the first file mentioned in it

jeff-dh commented 3 years ago

I just tried to use BOSL2 and it throws syntax errors. I started extending the grammar (bosl2 uses some language features the parser (and me) hasn't seen yet).

Therefor I think you can wait until I'll extended the grammar (everything that bosl2 comes up with) and we'll give your issue another try afterwards because the bosl2 changes might include a fix for your issue.

jeff-dh commented 3 years ago

Give it a try.......

It can now import bosl2 and is based on the parser.y from openscad itself (could have done that earlier, but.....that would have been to easy....)

I merged openscad_identifiers_starting_with_digits into this branch so #171 would be obsolete if you merge this branch. (it was just annoying to have already fixed bugs showing up while debugging and merging this....)

ca012e5fd9b4a8e3f235002375c6b7395363e4af also contains code that [un]subbed_keyword should now be able to handle all the $fn <-> segments mapping stuff. I think we can delete all the other bits of code that (now >seems<) to handle it. This was necessary, because py_scadparser now handles $fn parameters correctly and thus solidpython must handle it in a general manner (e.g. bosl2 contains custom $parameters).

Offtopic Btw: I had a look at bosl2 yesterday night and it seems pretty impressive what it is doing. It for example contains a diff feature that is similar (but I think more general) to your part/hole feature. I was wondering whether it would make sense to maybe add bosl2 as kind of a std::lib to solidpython. It also contains stuff for beziers and screw[thread]s I don't know about extrude_along_path but as comprehensive as it seems it might......

https://github.com/revarbat/BOSL2/wiki

part-hole stuff: https://github.com/revarbat/BOSL2/wiki/attachments.scad#module-diff

etjones commented 3 years ago

OK, right on. I just pulled the top of your branch tree, and I'm liking where things are at.

I've merged this to master, so we may be able close this issue and a couple others. There are a couple details I still want to look at, but let's put them into new issues as needed.

etjones commented 3 years ago

Hmm.. Just looked at PR #173, which looks like it's a better solution for python-illegal filenames than the little hack I put into master. I may look to merge that in instead of my fix.

jeff-dh commented 3 years ago
* I think it would be great to include BOSL2 & MCAD with each SolidPython install. 

After having a look at BOSL2 I wonder if MCAD becomes obsolete....

as well as ensuring we prioritize user-installed versions of those libraries.

-> #176 should solve this. A single resolve_scad_filename function that puts the "solidpython version" behind the user space version.... but I would probably put it before a system wide version, e.g. the order the resolve filenames could be:

I think that would be the way I would do it.

But then I would not import those libs on a regular startup. Don't include them in init or similar, because importing BOSL2 -- at least on my system -- takes 2secs. I would do something like creating a sub package for -- each? -- library.

--> import solid.bosl2 could import bosl as solid.bosl2 this could be a file or directory that does the import as itself gets imported.

I haven't checked this out all the way, but what happens with $tags => __tags? Usually Python does some namespace changes to variables with two leading underscores. I don't even know what $tags does in OpenSCAD, so I'm not sure about the intended result.

Actually I don't know of any special treatment for __vars in python. Put there might be some. I just had to get some escape sequence that's still an identifier and different from the others already in use. If you have an idea for something better that would be welcome.

$tags is not OpenSCAD, it's from BOSL2. You can "tag" -- I guess -- any object in the tree with a certain tag. Afterwards you can do something like

diff(neg="hole", pos="part", keep="bolt_in_hole")
{
    cube([...], $tggs = "part");
    cylinder(...., $tags = "hole");
    cylinder(..., $tags = "bolt_in_hole");
}

I passed yacc.yacc() a couple arguments to prevent it from writing its debug & parse table files out. I haven't done timing comparisons on this to determine if this makes repeated calls to import_scad() significantly slower than if the grammar files were cached. If that becomes an issue, I'm happy to think about other ways to do this, but for the moment this parses correctly and doesn't force us to deal with an intermediate file format, so it seemed like a reasonable path forward.

hmmmm I don't really like it if it's not really necessary. Did you try to add parsetab.py to git (and not to .gitignore) and parse.out to .gitignore? I didn't try it but I guess it'll work.

jeff-dh commented 3 years ago

as well as ensuring we prioritize user-installed versions of those libraries.

-> #176 should solve this. A single resolve_scad_filename function that puts the "solidpython version" behind the user space version.... but I would probably put it before a system wide version, e.g. the order the resolve filenames could be:

* `relative paths`

* `~/.local/share/....`

* `<path_to_solid>/libs/....`

* `/usr/share/....`

I think that would be the way I would do it.

Another solution could be to keep import_scad exactly the way it is right now (searches relative paths, ~/.local/... and I would add /usr/share/openSCAD/libraries) and then add solid.bosl2 as a package that -- somehow -- imports the version installed with solidpython (for example resolving the absolute path of it and than import it with an absolute path).

etjones commented 3 years ago

I just ran some tests on the non-cached parser version. That's a big time sink (about 5x time!) , so I'll definitely be switching to write parsetab.py back out to disk. I haven't figured out how that cached file is checked, though. If the OpenSCAD grammar changes, do we need to delete the parsetab file and update that? Will examine.

etjones commented 3 years ago

re: $tags -> __tags: In Python classes, member variables with double underscores have their names changed automatically to something like _<class_name>__<var_name>. I've just confirmed that none of that magic is happening in the params field where the tags bit happens. So, no worries there.

https://docs.python.org/3/tutorial/classes.html#private-variables

jeff-dh commented 3 years ago

I guess:

If you take a look at parsetab.py line 9 it says: _lr_signature = a string that seems to be or similar.

I assume whenever ply is called it will extract exactly that signature from the source and compare it. Only if it changed it'll regenerate parsetab.py. That would mean it is completely independent of the file timestamps and even of the file contents (as long as the signature didn't change) and it will only generate the tables if it is needed. And if that's the case it's absolutely right that git shows the parsetab.py as changed (because it needs to be update in the repo).

Therefor there should be no issues (at all?).

etjones commented 3 years ago

Agreed. I've added parsetab.py in my local version, will push to master when I've kicked the tires a little bit. I didn't add the debug argument back in (which causes parser.out to be written). Do you see any reason why that file should be there?

jeff-dh commented 3 years ago

https://www.dabeaz.com/ply/ply.html#ply_nn49

Packaging advice from the ply docs especially about parsetab.py (but it unfortunately doesn't go into the details)

jeff-dh commented 3 years ago

from the docs:

By default, the yacc generates tables in debugging mode (which produces the parser.out file and other output). To disable this, use parser = yacc.yacc(debug=False)

No, seems to be pure debugging. Get rid of it ;)

jeff-dh commented 3 years ago

There's one more issue concerning this merge.

Since 8 hours ago py_scadparser/scad_parser.py is based on openscad/parser.y.

What I actually did was to grab that file. Try to make it run with pybison (what didn't work out of the box) but it generated a python version of it and I grabed that and mouldet it into a ply version that's now scad_parser.py. In other words the grammar definition was extracted from parser.y.

I'm not into all this packaging and licensing stuff, but if I understand it correctly the result of that is, that scad_parser.py is technically a (heavy) modified version of openscad/parser.y, right? And if I understand the GPLv2 correctly that means scad_parser.py also has to be under GPLv2.

I changed the LICENSE file in the py_scadparser repo and I think we should do the same in SolidPython/solid/py_scadparser.

Sorry for that.

etjones commented 3 years ago

Oh man, I also try to avoid having to think about licenses at all. They're a real drag. I think however, that if you're not committing parser.y, there shouldn't be any license repercussions. If what you're using is generated code, it's not clear to me that that generated code is covered by license. But... this is worth looking into.

The current SolidPython license is LGPL 2.1, since, as I recall, it's minimally restrictive. I just want to share cool things and not worry about those details (and some GNU troll may show up shortly and write me a novel about how this is exactly why I should care about their troll-y legal stuff and also would I maybe like some of their Kool Aid. Thank you, I would not).

So anyway, I'm emphatically opposed to any kind of viral licensing. My idea of free to share means free to share, change, steal, relabel, whatever. If you look into this some more and you find to the best of your understanding that there's no way to avoid keeping the code you added this morning without a license change, we'll back it out.

jeff-dh commented 3 years ago

https://www.gnu.org/licenses/gpl-faq.en.html#CanIUseGPLToolsForNF https://opensource.stackexchange.com/questions/8058/does-the-gpl-doesnt-cover-the-ouput-of-a-program-also-apply-if-the-output-is https://www.gnu.org/licenses/gpl-faq.html#AllCompatibility

I think =((

PS: WTFPL

etjones commented 3 years ago

You wrote the first version of the OpenSCAD grammar by hand though, right? How difficult would it be to just eyeball the diffs between your original version and the parser.y version and correct any mistakes you made? Seems like you couldn't run into any license issues that way

jeff-dh commented 3 years ago

Aaeeh, yeah I might do something like that, but my motivation to do it.............

etjones commented 3 years ago

I hear you man; not super fun. Thanks for all your work on this; I think you have better instincts about these things than I do. I'm going to roll master back to where it was before until we've resolved license issues.

jeff-dh commented 3 years ago

There's something in the pipeline that looks pretty ok (not as nice as the other one but ok) but the polishing is still missing...

jeff-dh commented 3 years ago

I think it would be great to include BOSL2 & MCAD with each SolidPython install. There's a little bit of packaging-Fu that would require, as well as ensuring we prioritize user-installed versions of those libraries. Still, very nice to have. I'll have a think about the best way to do this.

Take a look at:

0fda2d2a7a15acc4553292564bfa89e6f0ffe00c (extend import_scad to be able to pass a destination namespace as parameter) 9c3bb1bc84c63819edfbfd04ab2b02dd047af388 (added "bosl2 hook" into the python world, it is libs/bosl2/BOSL2_scad

The result is:

import solid.libs.bosl2
solid.libs.bosl2.linear_bearings.linear_bearing_housing()

works just fine

etjones commented 3 years ago

Let's move the BOSL2 issues to a separate PR and work there. I'm trying to figure out how best to incorporate the BOSL2 code into SolidPython's. Should we inline the code? Use a git submodule that could be updated (I think so? Submodules don't autopopulate though, which is often a hassle) to match the current versions? Anyway, let's continue in another thread

etjones commented 3 years ago

Hey @jeff-dh - I wonder if you've got any ideas about very long parse times for SolidPython-generated SCAD files.

I ran across this while running unit tests. If you you run /$PATH_TO/SolidPython/solid/examples/run_all_examples.sh, and then try (in a shell, say):

from solid import import_scad
maze = import_scad('/$PATH_TO/SolidPython/solid/examples/Compiled_examples/mazebox.scad')

It will take a surprising (10+ seconds) amount of time to parse. That's not ridiculous, since one line of that file contains 1.5M characters. That isn't a horrible outcome, and the times when you're recursively importing OpenSCAD-generated .scad files ought to be pretty rare. Still, this is a problem we didn't have with the regex-based "parse" we had before, and, since we don't define modules or functions in SP-generated OpenSCAD, parsing those files will never return anything we want to import. Is there any way to parse only portions (functions, modules, variables) of a file?

etjones commented 3 years ago

Aaand, one more question. The first time I run SP on Python 3.8.3 on my MacOS 10.14 system, I get the following DeprecationWarnings from scad_tokens.py:

test_solidpython.py
............/Users/evan.jones/Dropbox/Projects/SolidPython/solid/py_scadparser/scad_tokens.py:55: DeprecationWarning: invalid escape sequence \&
  t_AND = "\&\&"
/Users/evan.jones/Dropbox/Projects/SolidPython/solid/py_scadparser/scad_tokens.py:56: DeprecationWarning: invalid escape sequence \|
  t_OR = "\|\|"

Looks like those backslashes (t_AND = "\&\&", t_OR = "\|\|"), are or will be invalid escape chars. If I replace them with raw strings (t_AND = r"\&\&", t_OR = r"\|\|") and delete parsetab.py, the warnings don't show up any more and everything seems to work OK. Buuut, I've done too much Bash programming to just mess with escape characters and not check what's happening. So, do you think it's OK to replace those lines with raw r-strings?

jeff-dh commented 3 years ago

aah, r"..." is a raw string? I though it would be a shortcut for regex... I guess that should fine. But I wonder whether those characters need to be escaped at all........

etjones commented 3 years ago

They get used for regexes a bunch so that characters' special meanings can be used for regex purposes, not for Python's normal escape purposes. Like single quotes in Bash as opposed to double quotes? Anyway, it's not super important but seems like one last wrinkle to smooth out. What could go wrong, right?

jeff-dh commented 3 years ago

Not sure whether I understand you correctly in any single point, but I'll give you a few thoughts about the whole topic. (this is all my personal "instinctive opinion, I might change it and might be wrong ;)

I don't think you should include such a scad file ;) If you do something like this it seems to me like a design error (in the openscad world). If I create an OpenSCAD module that provides a public interface it should not contain enormous amounts of data......... I would suggest to extract and separate them.

The fix is pretty simple: Don't do it ;)

BUT:

Yeah I know that the parser is not very fast. I think this is because its a pure python parser. I'm pretty sure regular bison & flex are gonna be probably at least 10 times faster, but well, this is the price we pay for all the other nice features of python. I spent half an hour looking into some other python parsers which are running on a c-backend but couldn't find an easy to do alternative.

Anyway: My first try to write a scad parser was exactly based on the idea you mentioned. Only parse the lines starting with the module or function keyword but I couldn't make it work in bison to skip the whole body (because you do need context to recognize the corresponding closing brace, you can't do it in the lexer). This might be possible somehow but I couldn't figure out how to do it.

I don't have a good solution to improved the parser speed itself, BUT I had the same issue while working on bosl2 (which consists of ~30 scad files) and it took 2secs to import bosl2 in every single test cycle and found a "funny" solution for it: the pickle-cache.

What it actually does it is caches the output of the parser to disk and if it is ask to parse a file it first looks into the pickle-cache-directory whether there's a valid cached version of the parsed-output. If so read and use it. This increases the speed importing bosl2 by a factor of ~15 and thus was a very easy solution and is good enough for me. It basically only parses each (at least library files) once, so the first time you -- for example -- import bosl2 it is "slow" (2secs) but thereafter its fast (100ms).

https://github.com/jeff-dh/SolidPython/blob/exp_solid/solid/core/parse_scad.py

That's the parse_scad module including the pickle-cache used in exp_solid take a look at it (or try to plug it into master, might work)

jeff-dh commented 3 years ago

Looks like those backslashes (t_AND = "\&\&", t_OR = "\|\|"), are or will be invalid escape chars. If I replace them with raw strings (t_AND = r"\&\&", t_OR = r"\|\|") and delete parsetab.py, the warnings don't show up any more and everything seems to work OK.

It seems like that's the solution:

ply docs:

Each token is specified by writing a regular expression rule compatible with Python's re module.

python re docs:

Also, please note that any invalid escape sequences in Python’s usage of the backslash in string literals now generate a DeprecationWarning and in the future this will become a SyntaxError. This behaviour will happen even if it is a valid escape sequence for a regular expression. The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'.

etjones commented 3 years ago

Right on. I think your "Don't do that" fix is the right one. The only reason I ran into it at all was that I had a unit test that was recursively importing the whole examples directory. I changed that unit test to run in a temporary directory that only contained a couple things, which is a better way to run a unit test anyway.

I'll just let this one go.

The pickle solution you've got going there looks nice and super fast for anybody using a lot of imported SCAD code. Once I've caught up to your other changes, maybe we can fold that into master.