Status: closed (set-soft closed this issue 3 years ago)
Hi!
Thanks for trying, so I'm not the only one testing this ;)
It should support absolute paths. Even if I start `python` (or the `macropython` wrapper) with a relative path, the path has become absolute by the time it reaches `mcpyrate.importer.source_to_xcode`.
I'll have to try with an adapted version of your `_import` function, to see if it's something in manually importing the module with the low-level functions from `importlib` that's causing the issue. That's not too different from how the `macropython` wrapper does it, though, so at first glance, I'm puzzled.
From the exception, it sounds like the likely immediate culprit is `mcpyrate.importer.path_xstats`. When computing the mtime for a source file, I'm intentionally leaving out the size. The latest docs said `size` is an optional field, so I thought that shouldn't break anything on 3.6 through 3.8.

If this is the problem, I could sum the sizes of the files in the considered set, and return that as the size? I'm not sure that makes sense, though, since size changes to different files might cancel out - which could be a disaster if something is using the size for cache invalidation. (This is why there is currently no `size`.)
To test whether this is the problem, what happens if you replace `mcpyrate/activate.py` with this?

```python
from importlib.machinery import SourceFileLoader

from .importer import source_to_xcode

def nop(*args, **kwargs):
    pass

SourceFileLoader.source_to_code = source_to_xcode
SourceFileLoader.set_data = nop
```
(Adapted from `mcpy`; setting `SourceFileLoader.set_data` to a no-op will disable the writing of `.pyc` caches. Not customizing `SourceFileLoader.path_stats`, in turn, disables the customized `.pyc` cache invalidation.)
Works!

It enabled me to run the tests. I disabled all the code in the macros used to assign line numbers. The decorator macros showed correct coverage. But I'm having problems with my block macro: the last line doesn't show as covered.
Here is the branch of my code where I'm testing `mcpyrate`: https://github.com/INTI-CMNB/KiBot/tree/try_mcpyrate

And here is a line reported as not covered: https://coveralls.io/builds/34238330/source?filename=kibot/out_bom.py#L45

This docstring is replaced by an `Assign` and is clearly covered.
Ok, thanks for testing. Then I need to do something about the `.pyc` cache invalidation. It seems `size` is mandatory even though the docs state otherwise. OTOH, we're monkey-patching `SourceFileLoader` (like `mcpy` does) instead of defining our own loader from scratch like `macropy` does, so maybe if one replaces too many methods that way, internal implementation assumptions start getting in the way. :P
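For reference, the save-then-patch pattern being discussed can be sketched like this; `my_source_to_code` and the transform hook are illustrative names, not mcpyrate's actual API:

```python
from importlib.machinery import SourceFileLoader

# Keep a reference to the stock implementation so it can be restored later.
_stdlib_source_to_code = SourceFileLoader.source_to_code

def my_source_to_code(self, data, path, *args, **kwargs):
    # A custom expander would transform `data` (source text or AST) here,
    # then delegate to the stock compiler.
    return _stdlib_source_to_code(self, data, path, *args, **kwargs)

SourceFileLoader.source_to_code = my_source_to_code        # activate
patched = SourceFileLoader.source_to_code is my_source_to_code
SourceFileLoader.source_to_code = _stdlib_source_to_code   # deactivate
restored = SourceFileLoader.source_to_code is _stdlib_source_to_code
```

The upside over a from-scratch loader is that everything else about `SourceFileLoader` keeps working unchanged; the downside is exactly the internal-assumption risk mentioned above.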
Also, I just realized that the modules you're loading might not be under any directory in `sys.path` - hence `spec_from_file_location`. Cool. I never considered that case.

What are the exact semantics? How do modules loaded that way import other modules from their location? Do they have to use `_import`, too?
Then, back to the coverage reporting issue - could you try:

```python
from mcpyrate.debug import macros, step_expansion

with step_expansion:
    ...  # your code invoking the block macro here
```

It should print out line numbers and unparsed source code (to `sys.stderr`) at each step of the macro expansion. Does anything look suspicious there?
And what's on the last line that's being reported as not covered? Can we be sure the Python compiler isn't optimizing it away?

(In my tests, I've noticed Python 3.6 optimizes away `pass` statements, and any `Expr` with just a `Constant` inside it. Hence, the dummy nodes I'm adding for coverage currently use an `Assign`, so that the compiler can't assume it's a no-op.)
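As a standalone sketch of why an `Assign` survives where a constant `Expr` might not: such a dummy node can be built with `ast` and shown to actually execute (the helper name here is made up for illustration):

```python
import ast

def coverage_dummy(lineno, macroname):
    """Build `_mcpyrate_coverage = '...'`; an Assign can't be optimized away."""
    stmt = ast.parse(f"_mcpyrate_coverage = 'source line {lineno} invoked macro {macroname}'").body[0]
    # Pretend the whole statement sits on the macro invocation's source line.
    for node in ast.walk(stmt):
        node.lineno = node.end_lineno = lineno
    return stmt

tree = ast.Module(body=[coverage_dummy(42, "document")], type_ignores=[])
ns = {}
exec(compile(tree, "<demo>", "exec"), ns)
print(ns["_mcpyrate_coverage"])  # source line 42 invoked macro document
```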
Ah, I saw your edit. Thanks. I'll take a look.
(Still curious about those imports, though.)
Ok, I grabbed a copy, quickly patched `kibot/out_bom.py` to use a `with step_expansion`, and ran `macropython -m kibot.out_bom` to expand the macros. Result:
```
**Tree 0x7fb7160bda60 before macro expansion:
L   42 with document:
L   43     self.field = ''
L   44     $Expr: ' Name of the field to use for this column '
L   45     self.name = ''
L   46     $Expr: ' Name to display in the header. The field is used when empty '
L   47     self.join = Optionable
L   48     $Expr: " [list(string)|string=''] List of fields to join to this column "
**Tree 0x7fb7160bda60 after step 1:
L ---- $ASTMarker<Done>:
L   42     _mcpyrate_coverage = 'source line 42 invoked macro document'
L   43     self.field = ''
L   42     self._help_field = "[string=''] Name of the field to use for this column"
L   45     self.name = ''
L   42     self._help_name = "[string=''] Name to display in the header. The field is used when empty"
L   47     self.join = Optionable
L   42     self._help_join = " [list(string)|string=''] List of fields to join to this column"
**Tree 0x7fb7160bda60 macro expansion complete after 1 step.
```
and indeed, using `with step_expansion["dump"]` confirms that this:

```python
Expr(value=Str(s=" [list(string)|string=''] List of fields to join to this column ",
               lineno=48,
               col_offset=16),
     lineno=48,
     col_offset=16)],
```

becomes this:
```python
Assign(targets=[Attribute(value=Name(id='self',
                                     ctx=Load(),
                                     lineno=42,
                                     col_offset=12),
                          attr='_help_join',
                          ctx=Store(),
                          lineno=42,
                          col_offset=12)],
       value=Str(s=" [list(string)|string=''] List of fields to join to this column",
                 lineno=42,
                 col_offset=12),
       lineno=42,
       col_offset=12)
```
so yes, it seems there's a bug in the code that fills in the line numbers. The `Assign` should get `42`, because it's added by the macro, but it shouldn't be changing the line number of the `Str` that already exists in the original unexpanded code. Arrr!
The plug-ins are imported with `_import`. They can include any functionality from the main project using `.module` or `kibot.module`. Relative imports work fine. I think this is because Python knows where `kibot` is and that the default is `kibot`. So other modules are included in a natural way.

The code in the plug-ins is never called in a direct way. The plug-ins are registered (by the decorator macro), and then the main code can look for the correct plug-in in the registered list and get the class to instantiate a proper object.
`with step_expansion` is really nice, I tried it.
Thanks. Let me know if there's something that could be improved in it ;)
Thanks for the plug-in explanation.

To my understanding, relative import works because the call to `spec_from_file_location` in your `_import` function sets `name` as `kibot.something`. This will make the `something` plugin think it's in the `kibot` package. Cool trick.

(The documentation for that function is... somewhat spartan. Emacs's `anaconda-mode` says the implementation of `spec_from_file_location` comes from a compiled module, so no `Alt+.` for me - to be completely certain, I'd have to dig into the CPython sources on GitHub.)
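A minimal sketch of that trick, loading a file under a dotted name so `__package__` comes out as the parent package (file contents and names are invented for the demo):

```python
import importlib.util
import os
import sys
import tempfile

def load_plugin(dotted_name, path):
    """Load a source file as if it lived at `dotted_name` inside a package."""
    spec = importlib.util.spec_from_file_location(dotted_name, path)
    mod = importlib.util.module_from_spec(spec)
    sys.modules[dotted_name] = mod  # register before exec, like the import system does
    spec.loader.exec_module(mod)
    return mod

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "plugin.py")
    with open(src, "w") as f:
        f.write("WHO = f'{__name__} in package {__package__}'\n")
    mod = load_plugin("kibot_demo.plugin", src)

print(mod.WHO)  # kibot_demo.plugin in package kibot_demo
```

The `__package__` value is what makes relative imports inside the plug-in resolve against the parent package (provided that package is importable).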
Absolute import works if a `kibot` exists at the top level under one of the directories in `sys.path`. This will be the case if you `python -m` a module from the shell at the project's top level (without installing), because when you `python -m ...`, Python will implicitly add the current working directory to `sys.path`. This has even changed in 3.7: previously, Python added the magic `""` entry, meaning whatever the cwd is at the moment, but if I'm reading the issue report right, 3.7 fixed that misfeature, so 3.7 and later instead use the fully resolved cwd from interpreter startup time.

An absolute import lookup will also succeed if the `kibot` package has been installed, because the `site-packages` directory where installations go (for whichever Python is running the code) will be on `sys.path`.
And now I see the issue. Looking at `kibot.macros.document`, it's replacing the original `Str` node with a new one. This causes `mcpyrate` to detect the new node as macro-generated, so that node - correctly, I'd say - gets assigned the line number of the macro invocation.

What I don't get is why the other doc lines are being reported as covered, because they're getting exactly the same treatment, and `with step_expansion["dump"]` indeed confirms this. There should be no node with that line number in the output, so it should be reported as not covered. Why it isn't... currently I have no idea. Maybe I should `with step_expansion` the whole file, to see if something else is acting up.

(Note that the default, unparse output mode of `step_expansion` only shows line numbers for statements, because in Python those typically begin a new line, whereas expressions don't. To see the line numbers for individual expressions, one has to use the dump mode, which is very verbose.)
So, it turns out there's one more corner case I didn't think of - a macro that rewrites its input so completely that from a particular line, no original AST nodes remain in the output.
I'm tempted to make it the macro's responsibility to fill in the source location correctly in such cases - it's the pythonic way, plus I think there can be no generally applicable rule. As the macro expander sees it, there's simply a missing source location in a node in the transformed tree. So the only thing it can do simply and reliably is to assume that the node was macro-generated (as it indeed was, here), and fill in the macro invocation's source location, which is exactly what it is doing now.
If you have a better idea, please suggest :)
So, in the case of the `document` macro, it should be sufficient to manually do a `copy_location` from the original `Str` node to the `Assign` node. The expander will fix the rest.
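The effect of such a manual `copy_location` can be seen in a tiny standalone sketch (node names simplified; modern `ast` uses `Constant` where older versions used `Str`):

```python
import ast

mod = ast.parse("x = 1\n' Name of the field '\n")
orig = mod.body[1].value  # the string constant, which sits on line 2

# A macro-generated Assign, reusing the original string's source location:
assign = ast.Assign(
    targets=[ast.Name(id="_help", ctx=ast.Store())],
    value=ast.Constant(value=orig.value.strip()),
)
ast.copy_location(assign, orig)    # pin the statement to the original line
ast.fix_missing_locations(assign)  # children inherit the parent's location

print(assign.lineno, assign.value.lineno)  # 2 2
```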
P.S. With quasiquotes, the code could look something like:

```python
from mcpyrate.quotes import macros, q, u, n, a

...

target = q[self.n[doc_id]] if is_attr else q[n[doc_id]]
with q as quoted:  # block mode, produces a `list`
    a[target] = u[type_hint + s.value.s.rstrip() + post_hint]
sentences[n] = quoted[0]
```
It's smart enough to figure out that in `sentences[n]` you mean the variable `n`, not the macro - because that `n` isn't being subscripted, and the `n` macro is not a `@namemacro`. Of course, you can as-import the `n` macro just like in `mcpy`, if you want it to have a different name.
(And the expander will fill in the missing `ctx` automatically. Having a `ctx` is something of a misfeature of Python, since the correct `ctx` can be decided purely lexically. OTOH, having it can make some analyses simpler.)
I'll try again, but I remember trying to copy the location and it didn't work. This time I'll have `with step_expansion` to check.
Ok, let me know how it goes.
Looking at a full-module `with step_expansion` (except imports; regular imports are fine, but macro-imports must be at the top level of the module, not inside a `with`), there's no obvious reason why the rest of the doc lines would be reported as covered. Puzzling.

By the way, if you want to try the same, pull the latest `mcpyrate` first - trying this, I found two small bugs that are now fixed.
I would suggest fixing this particular case in `document`, either by re-enabling the `copy_location` on line 82, or by changing line 80. Instead of:

```python
sentences[n] = Assign(targets=[target], value=Str(s=type_hint + s.value.s.rstrip() + post_hint))
```

we could do:

```python
str_node = s.value
str_node.s = type_hint + str_node.s.rstrip() + post_hint
sentences[n] = Assign(targets=[target], value=str_node)
```
This recycles the original `Str` node, so the `Str` will use the original location information (even though the location information for the `Assign` will be auto-filled to the macro invocation location). It's also the minimal AST edit that does what we want.

(It's not very FP, but considering Python's AST transformation utilities, that train already sailed. I constantly find myself thinking whether I should `tree = ...(tree)`, or just `...(tree)`, since most operations - particularly `ast.NodeTransformer` - edit the original. It would be much easier to program with if they returned a fresh copy.)
...aaand I fixed a silly bug in `astfixers.fix_missing_locations`. Please pull the latest `mcpyrate`.
Tested both of my suggested approaches; both of them give correct line numbers in the AST for me.

The manual `copy_location` is better in that `with step_expansion` will show the correct line numbers in its default mode. The correct line number is in a statement node (so `step_expansion` will begin a new line for it, allowing it to show the line number).

The three-liner also gets the job done, but then you'll have to use `with step_expansion["dump"]` to see the correct line numbers - they're hidden in the `Str` node inside the expression, while any generated nodes (such as the `Assign`) get their location auto-filled, so in the default mode, `with step_expansion` will show the line number of the macro invocation.

And FWIW, assuming I found the right source file, Coverage.py gathers the line numbers from stack frames (`f_lineno`). So the line numbers in the coverage data should be as recorded in the bytecode, which means there's one more piece in the coverage reporting puzzle - the Python compiler, which takes the AST and generates bytecode. But at least it's a reasonable assumption that if the AST nodes don't have sensible line numbers (which is partly the responsibility of the macro expander, and partly of the macros themselves), then neither will the bytecode generated from that AST.
The good news: I enabled the `copy_location` and got correct coverage: https://coveralls.io/builds/34243520/source?filename=kibot/out_bom.py which is very good, because to achieve the same using `mcpy` I had to do some nasty hacks.

Now the bad stuff: I tested the speed of a simple task using `mcpy` and `mcpyrate`; the script is in `experiments/speed`. Each iteration took 440 ms using `mcpyrate` and 230 ms using `mcpy`, a very important difference. From my notes, `macropy` should take around 650 ms.
I tried your `_import` approach and it worked fine.

Puzzled, I found this. Surprisingly, Python code for a compiled module. So yeah, `size` is "optional", but not really.
I suppose I'll have to do something about that in `path_xstats`. Hmm, looking at this, it seems I can just set `size` to `None` to tell the system it's not used. But only in 3.7 and later.

In 3.6, it's a different story - that'll crash if I put `None` there. In 3.6, the key should be missing if the size is not used. Testing on PyPy3 7.3.0, that implementation seems to agree.
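So the version-dependent shape of the stats result would be something like this sketch (illustrative function name, not mcpyrate's actual code):

```python
import sys

def xstats_result(mtime):
    """Shape of a `path_stats`-style dict across Python versions (sketch)."""
    result = {"mtime": mtime}
    if sys.version_info >= (3, 7):
        # 3.7+: an explicit None says size-based validation is unused.
        result["size"] = None
    # 3.6: leave the "size" key out entirely; a None value would crash there.
    return result

print(xstats_result(123.0))
```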
This explains why I couldn't reproduce the crash. Both of my pythons are 3.6 (CPython and PyPy3).
Ah, `sys.version_info`, where would we be without you. Fixed in 439caa7, please update.
Nice. Thanks for testing.
Speed: more complex is obviously slower... but it shouldn't be quite that much slower? Let's investigate.

Could you try disabling selected parts of `mcpyrate.importer.source_to_xcode`, to help narrow this down?

- Instead of `tree = expand_dialects(data, filename=path)`, just `tree = ast.parse(data)`. (Same effect if no dialects.)
- `remaining_markers` and its check.

Those should tell whether it's the importer.
Note that if you're running with the modified `activate.py` I suggested here, bytecode caching will be disabled, so runs should be identical.

But if you pull the latest version, and revert `activate.py` to the version in git, then bytecode caching will be enabled - so further runs, if no macro-dependencies have changed, will simply use the `.pyc` files from the first run, skipping the whole macro expander. It will still cost a parse to look for macro-imports in the source code - if this is an issue, I could add a cache for that.
There are also some other things that may make `mcpyrate` slower, that are tradeoffs:

- `step_expansion`.
- `get_macros` is slightly more complex than in `mcpy`, to support relative macro-imports.
- `unparse_with_fallbacks`.
Would be nice to benchmark this myself, too. `unpythonic` with `mcpyrate` is still a long way off (API change, so need to wait for a major revision), so it would be nice to run the KiBot test myself.

I'm not sure I have an environment to actually run KiBot in - does it need anything special, or just `python setup.py install`?
I tried https://github.com/Technologicat/mcpyrate/commit/439caa7748f216f0214d71bee8b05731cd485129 so I'm replying to the previous post, not yet the speed stuff, but also related: `mcpy`.

Now I'll take a look at your message about speed. Ok, now the cache compensates for the speed difference. Messing with `mcpyrate.importer.source_to_xcode` doesn't change anything (runs only once, no?)
About running KiBot: I think the speed test doesn't need too many dependencies. Using the `setup.py` should pull them in.

The only annoying thing you could need is KiCad; this provides the `pcbnew` module. If you install it, just avoid installing all the libs, in particular the 3D models (they take an insane amount of disk space: 5 GiB). The application itself isn't huge.

Note that the speed test does a very simple task, and hence most of the time is spent loading plug-ins and setting some basic details. This is why the macro expansion dominates the timing.
Yes, it runs only once, until you `touch` (or otherwise refresh the modification time of) a source file that uses macros, or delete the `.pyc` caches manually. Hrm, this might need a small tool to do just that, for testing...
Hmm, 290 ms. We could probably go lower, but that adds complexity, and cache invalidation famously being one of the two hard things in CS, there's the risk of shooting ourselves in the foot.

What the `.pyc` invalidation logic needs is the latest mtime from the macro-dependency graph. So it's really only interested in absolute filenames, so it can `os.stat` them. But those filenames depend, via module name resolution, hence via `sys.path`, on how the currently running Python is configured.

So I'm thinking a safe thing to do here would be to pickle the AST nodes for the macro-import statements into a cache file - those are a purely lexical thing, and the cache can be easily invalidated based on the mtime of that one source file itself. That way, the statements could still be parsed dynamically, and the dependencies looked up dynamically; so if `sys.path` changes between runs, causing a module to be loaded from a different location, it would notice that.
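A minimal sketch of that kind of mtime-keyed pickle cache (function names invented here; real invalidation needs more care):

```python
import os
import pickle
import tempfile

def save_scan_cache(cache_path, source_path, data):
    """Cache `data` along with the source file's current mtime."""
    payload = {"mtime": os.stat(source_path).st_mtime_ns, "data": data}
    with open(cache_path, "wb") as f:
        pickle.dump(payload, f)

def load_scan_cache(cache_path, source_path):
    """Return the cached data, or None if the cache is missing or stale."""
    try:
        with open(cache_path, "rb") as f:
            payload = pickle.load(f)
    except OSError:
        return None
    if payload["mtime"] != os.stat(source_path).st_mtime_ns:
        return None  # source edited since the cache was written -> stale
    return payload["data"]

# Demo: cache a fake scan result, then invalidate by bumping the mtime.
d = tempfile.mkdtemp()
src = os.path.join(d, "mod.py")
with open(src, "w") as f:
    f.write("from mymacros import macros, document\n")
cache = os.path.join(d, "mod.scan")
save_scan_cache(cache, src, ["document"])
print(load_scan_cache(cache, src))  # ['document']
os.utime(src, ns=(0, 0))            # pretend the source changed
print(load_scan_cache(cache, src))  # None
```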
But does this buy us enough speed to be worthwhile? In everyday use of an application that just happens to use macros, probably not; but during agile development, especially if test-driven, it very well might.

I suppose there is just one way to find out. Now, how does one look up the `__pycache__` folder for a given source file...?
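For the `__pycache__` question, the stdlib already provides the mapping from a source path to its cache path:

```python
import importlib.util

# importlib knows the __pycache__ layout (including the interpreter tag)
# for any given source file:
pyc = importlib.util.cache_from_source("/project/kibot/out_bom.py")
print(pyc)  # e.g. /project/kibot/__pycache__/out_bom.cpython-311.pyc
```

There is also the inverse, `importlib.util.source_from_cache`.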
About what to install from KiCad, that's exactly the kind of important information I was hoping for :)
I'm mainly after something to test the macro-expansion performance with, so that sounds fitting.
Ok, I think I'm done for the night. Silly per-source-file cache for scanned macro-import statements, to avoid the parse cost, added in c8c09d9. Slightly updated in 22a63d1.
Might be a bad idea, so care to test in your setup when you have the time? :)
Tried the last code (3b2fa4651a99db7d85d67cadb8642446b6589913), it works and time goes down to 260 ms.
I'm experiencing some problems with the cache files. They aren't really important, but they deserve some notes:

When installing the package at system level, the installers (pip, setup.py, dpkg, etc.) create the cache files by calling `py3compile`. These cache files are wrong because `py3compile` doesn't know about the macros. What's worse is that these files are "up to date" and can't be removed by the user.
I faced this problem using `mcpy` and my solution was:

- `pip`: tell the users to pass an argument to prevent it.
- `dpkg`: include a custom `postinst` script to generate caches only for the files without macros.

I can currently use similar approaches. For `dpkg` I'm running `kibot --help-outputs > /dev/null`, which is generating all the cache (and pickle) files in a natural way. For `pip` I can maintain the recommendation to avoid creating the cache files when installing at system level (a rare case; most people using `pip` install at user level).

This one is more mysterious: I have two obsolete tests for cache problems. They manually create a cache entry for one of the files using macros. The code using `mcpy` tries to remove the cache file to avoid problems. One of the tests is to verify I can remove the cache file. The other makes the file read-only and checks the error message indicating why we can't run. Both tests run on my desktop, but fail on GitHub Actions. And I can't reproduce the error using my local docker image. I have to investigate why they fail. It isn't very important, because this situation isn't real.
Good news about speed: looking at the coverage output, I realized that with `mcpy` I was using a trick to get more speed.

As I use macros only in the plug-ins, I can safely disable macros after loading the plug-ins. For this I modified `mcpy`, and my code detected whether the `activate` module has a `de_activate` method. As `mcpyrate` doesn't have it, the code is reported as not covered by any test. I implemented the `de_activate` in `mcpyrate` using:
```diff
diff --git a/kibot/mcpyrate/activate.py b/kibot/mcpyrate/activate.py
index f1893b8..1ffe261 100644
--- a/kibot/mcpyrate/activate.py
+++ b/kibot/mcpyrate/activate.py
@@ -26,10 +26,23 @@ the `PYTHONDONTWRITEBYTECODE` environment variable, and the attribute
 '''
 from importlib.machinery import SourceFileLoader, FileFinder
-
 from .importer import source_to_xcode, path_xstats, invalidate_xcaches
-SourceFileLoader.source_to_code = source_to_xcode
-# we could replace SourceFileLoader.set_data with a no-op to force-disable pyc caching.
-SourceFileLoader.path_stats = path_xstats
-FileFinder.invalidate_caches = invalidate_xcaches
+
+def activate():
+    SourceFileLoader.source_to_code = source_to_xcode
+    # we could replace SourceFileLoader.set_data with a no-op to force-disable pyc caching.
+    SourceFileLoader.path_stats = path_xstats
+    FileFinder.invalidate_caches = invalidate_xcaches
+
+
+def de_activate():
+    SourceFileLoader.source_to_code = old_source_to_code
+    SourceFileLoader.path_stats = old_path_stats
+    FileFinder.invalidate_caches = old_invalidate_caches
+
+
+old_source_to_code = SourceFileLoader.source_to_code
+old_path_stats = SourceFileLoader.path_stats
+old_invalidate_caches = FileFinder.invalidate_caches
+activate()
```
Then I tried the speed test and got approx. 160 ms, much better than `mcpy`. Now the comparison is fair, because I'm disabling macros after loading plug-ins for both tools.
`pip` I should be able to figure out, but as for `dpkg`, even though I've used Debian-based distros for a long time, I haven't made any deb packages myself. Would you happen to have any material that I could format or possibly reword into something appropriate for the `mcpyrate` documentation?

It actually just crossed my mind a minute ago to ask you if you want to have the deactivate feature, since you added it to `mcpy` earlier. Care to officially make a PR so I could pull it in? (Otherwise doesn't matter, but if you do that, GitHub will be able to detect you as a contributor to `mcpyrate`.)

The only thing I'd change are some names - I'd prefer the prefix `stdlib_` instead of `old_`, and I think `deactivate` would look better as a single word.
> Care to officially make a PR so I could pull it in? (Otherwise doesn't matter, but if you do that, GitHub will be able to detect you as a contributor to `mcpyrate`.)

Ok, my 2 cents, I did a pull request.

> The only thing I'd change are some names - I'd prefer the prefix `stdlib_` instead of `old_`, and I think `deactivate` would look better as a single word.

Ok, used these names.
Thanks for the heads-up. Something about this needs to go in the README.

> `pip` I should be able to figure out, but as for `dpkg`, even though I've used Debian-based distros for a long time, I haven't made any deb packages myself. Would you happen to have any material that I could format or possibly reword into something appropriate for the `mcpyrate` documentation?
Ok, here is an explanation. Feel free to change it as you like; my native language is Spanish ;-)

The standard `postinst` script for Debian packages creates cache files using `py3compile` (and/or `pypy3compile`). These cache files are invalid because they are compiled without using `mcpyrate`. This prevents running the installed package. It will report that it can't find the `macros`. To make things worse, a regular user won't be able to remove the Python caches, owned by root. In order to fix this problem you must provide a custom `postinst` script that generates the cache files using `mcpyrate`. One possible solution is to invoke the script in a way that all, or most, of the modules are invoked. This will force the generation of the cache files.
> P.S. With quasiquotes, the code could look something like:

I tried it, but I'm getting:

```
  File "mcpyrate/importer.py", line 37, in source_to_xcode
    return compile(expansion, path, 'exec', dont_inherit=True, optimize=_optimize)
ValueError: expression must have Store context but has Load instead
```
PR merged, as you probably already noticed. :)
Explanation about .deb packaging added to README. Thanks for contributing this, too!
As for the error, if you can isolate a problematic line or two, what does `with step_expansion["dump"]:` say?

It's clearly some form of assignment, where somehow a `Load` has ended up in a term on the left-hand side. I could change `mcpyrate.astfixers` so that it rewrites all `ctx`, not only missing ones, but let's first try to figure out how the `Load` got there.
Regarding your latest speed test results, it makes sense that `mcpyrate` is faster on average (across several runs) than `mcpy`, because `mcpy` does not support bytecode caching. When `mcpy` is active, `.pyc` caches won't be written. This was likely done to keep it simple, i.e. to avoid the need for the kind of macro-dependency analysis `mcpyrate.importer` does.

So, while I would expect `mcpyrate`'s expander to be slightly slower than `mcpy`'s (e.g. due to unparsing the AST to source code at every macro expansion, just in case, for error messages), the support for bytecode caching in `mcpyrate` means we often get to skip running the expander - leading to a lower amortized cost.

EDIT: That said, I'm very impressed by the simplicity and clarity of `mcpy`. Its source code is textbook-worthy. It gets the main ideas across very efficiently. It's essentially how to write a complete macro expander in less than 1000 lines of Python. `mcpyrate`'s isn't. The codebase is simply too big, and the additional features, which I feel are worth the complexity cost for real-world use, hide the underlying simplicity so much that the source code no longer counts as an as-simple-as-possible-but-no-simpler textbook explanation.
> As for the error, if you can isolate a problematic line or two, what does `with step_expansion["dump"]:` say?

I don't get the quasiquotes idea very well. Using this:

```python
target = q[self.n[doc_id]] if is_attr else q[n[doc_id]]
with q as quoted:  # block mode, produces a `list`
    a[target] = u[type_hint + s.value.s.rstrip() + post_hint]
sentences[n] = quoted[0]
```

I get lines like this:

```python
self.n[doc_id] = '[number=1.5] [0,1000] how much the highlight extends around the component [mm]'
```
With this AST:
Assign(targets=[Subscript(value=Attribute(value=Name(id='self',
ctx=Load(),
lineno=34,
col_offset=12,
end_lineno=None,
end_col_offset=None),
attr='n',
ctx=Load(),
lineno=34,
col_offset=12,
end_lineno=None,
end_col_offset=None),
slice=Index(value=Name(id='doc_id',
ctx=Load(),
lineno=34,
col_offset=12,
end_lineno=None,
end_col_offset=None)),
ctx=Load(),
lineno=34,
col_offset=12,
end_lineno=None,
end_col_offset=None)],
value=Constant(value='[number=1.5] [0,1000] how much the highlight extends around the component [mm]',
lineno=34,
col_offset=12,
end_lineno=None,
end_col_offset=None),
type_comment=None,
lineno=34,
col_offset=12,
end_lineno=None,
end_col_offset=None)]
Which doesn't make much sense. The problem is with the use of `self.n[doc_id]`.

I tried:

```python
if is_attr:
    doc_id = 'self.' + doc_id
target = q[n[doc_id]]
```

But I get really bizarre errors (can't get the `step_expansion` for this).
I'm sorry, my mistake. My thinking slipped there: `n[]` only works with bare names, not attributes. The problem is that the ASTs for `x[...]` and `self.x[...]` look very different. Obviously, the latter won't invoke the macro `x[]` - it's an `Attribute`, and the `x` is just a bare string in its `attr` field, while a macro name is a `Name`.
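The difference in AST shape is easy to see directly:

```python
import ast

bare = ast.parse("x[0]", mode="eval").body          # Subscript over a Name
dotted = ast.parse("self.x[0]", mode="eval").body   # Subscript over an Attribute

# A macro invocation needs a bare Name being subscripted...
print(type(bare.value).__name__)                    # Name
# ...but here 'x' is just a string in the Attribute's attr field:
print(type(dotted.value).__name__, dotted.value.attr)  # Attribute x
```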
Quasiquotes - in `mcpyrate` specifically, or the general idea? The general idea comes from the Lisp family, whereas the variant we have in `mcpyrate` is based on `macropy`'s Python adaptation of the idea, with some original improvements.

The general idea is, quasiquotes allow writing a code template (in a macro implementation) using mostly the same syntax one would use for normal code. Then, using a special `unquote` operator, one can interpolate values (that exist at the time the macro is running) into the template.

Lisps need just `quasiquote` (written as a backtick), `unquote` (written as a comma) and `unquote-splicing` (written as `,@`), see e.g. The Racket Reference; because in Lisps, there is almost no surface syntax - the source code is a lightly dressed up AST. Python, on the other hand, is rather heavy on surface syntax (see `mcpyrate.unparser`), so we need more unquote types for different kinds of value interpolation.
With the occasional mistake like the one I just made aside, I've found quasiquotes become second nature when you write a lot of macros. There are some pitfalls to look out for; the one I keep falling into is Python's implicit, invisible `ast.Expr` node (a.k.a. the expression statement). This is precisely why `step_expansion` prints those out.

The expression statement problem is not specific to quasiquoted code, but it's easier to run into when using quasiquotes, because the whole point of quasiquotes is not having to think about how the AST looks (that much).
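The invisible `ast.Expr` wrapper is easy to demonstrate:

```python
import ast

# A bare expression used at statement position gets wrapped in ast.Expr:
stmt = ast.parse("f(x)").body[0]
print(type(stmt).__name__, type(stmt.value).__name__)  # Expr Call

# Whereas an assignment is a statement node directly:
print(type(ast.parse("y = f(x)").body[0]).__name__)    # Assign
```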
The idea of the `n[]` unquote operator - which originally appeared in `macropy` - is that inside quoted code, it lifts a bare string into a lexical identifier (a.k.a. variable name), so you can do things like:

```python
x = gensym()  # just a bare string!
with q as quoted:
    n[x] = ...
```
You're seeing bizarre errors because, after `'self.' + doc_id` has been evaluated, the string contains a dot, so it's not a valid variable name. Of course, once `mcpyrate` is finished, it should protect against things like this and give a reasonable error message as early as possible - whereas currently it doesn't. (`macropy` doesn't, either.)

Unfortunately, we can't detect errors in `n[]` easily, because `n[]` expands before the actual value is available. It's just building an AST snippet (that meshes into the context of the surrounding `q` in just the right way) so that in the final result, an `ast.Name` node appears, and the string provided by the user goes into its `id` attribute. It would be possible to delay the AST snippet generation, at the cost of some extra complexity. I'll have to think about this. (The hygienic unquote `h[]` already has such a delay, because it must, in order to work at all. I could probably build something similar.)
Sometimes, similar things are better achieved with `mcpyrate.utils.rename`. Give some dummy name, then `rename`:

```python
quoted = q[lambda _: 42 * _]
rename("_", gensym(), quoted)
```

`rename` renames a lot of things: names, attributes, function parameters, named arguments in calls, imports... so, let's try again:

```python
target = q[self._] if is_attr else q[_]
with q as quoted:
    a[target] = u[type_hint + s.value.s.rstrip() + post_hint]
rename("_", doc_id, quoted)
sentences[n] = quoted[0]
```

(`rename` is not yet covered by tests, so though I combed through the GTS docs implementing it, there may still be bugs. It will be covered before 3.0.0 is released. I'm working on expanding the tests.)
I'll also take the opportunity to note that there have been many small improvements in the past day:

- `unparse` and `dump` both now come with syntax highlighting. The color scheme leaves much to be desired, but there's only so much we can do within the limitations of an ANSI terminal. `colorama` is now required, to keep `mcpyrate` OS-agnostic.
- The macro-importer skips analysis of `.py`-based modules in Python's standard library. This makes `macropython -i` boot up faster, and Python's standard report for an uncaught exception also shows up faster (because an uncaught exception triggers the import of a ton of `.py`-based stdlib modules).
- I'm experimenting with the idea of fixing all `ctx`, not only missing ones. This had to go into `global_postprocess`, because a name macro may appear on the LHS of an assignment, and the local `ctx` fixer (that fixes each macro expansion) first populates the wrong `ctx`, since it sees only the `Name` node.
By the way, if you're curious about how q works, the key insights are:
How do you build q - a macro that lifts its input AST (at macro expansion time) into an AST at run time? You add an extra AST layer. For example, to have an ast.Name(id='foo') appear at run time, the q macro needs to output (roughly speaking) ast.Call(ast.Name, [], [ast.keyword('id', 'foo')]).
So q returns an AST... or mostly an AST, with some function calls sprinkled in that do something and then return an AST. Then keep in mind that whatever q returns becomes - because q is a macro - part of the AST of the use site.
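The lifting idea can be demonstrated with a toy function (an illustrative sketch, not mcpyrate's own machinery): given an AST node available now, emit an AST that reconstructs an equivalent node when evaluated later.

```python
import ast

def lift(node):
    """Build an AST that, when evaluated at run time, reconstructs `node`.
    Toy sketch: handles only fields that are strings or nested AST nodes."""
    keywords = []
    for field, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            keywords.append(ast.keyword(arg=field, value=lift(value)))
        elif isinstance(value, str):
            keywords.append(ast.keyword(arg=field, value=ast.Constant(value)))
    # Emit e.g. ast.Call(func=<ast.Name>, args=[], keywords=[id='foo'])
    return ast.Call(
        func=ast.Attribute(value=ast.Name(id="ast", ctx=ast.Load()),
                           attr=type(node).__name__, ctx=ast.Load()),
        args=[], keywords=keywords)

lifted = lift(ast.Name(id="foo"))
expr = ast.Expression(body=lifted)
ast.fix_missing_locations(expr)
# Evaluating the lifted AST "at run time" rebuilds the original node:
rebuilt = eval(compile(expr, "<lifted>", "eval"))
# rebuilt is an ast.Name with id == "foo"
```

This is the "extra AST layer": the macro's output is code that *builds* the node, rather than the node itself.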
I got it working.
After updating the code the Load/Store problem went away.
One thing I don't understand: I'm getting step_expansion
messages only when the code prints a traceback. Not on normal operation. Why?
Nice to hear you got it working.
With the very latest code (just updated), your first attempt should also work. I upgraded n[]
so it can make attribute accesses too, so things like n["self." + docid]
are now supported. Thanks for the suggestion (which was implicit in how you tried to use it - it's arguably The Expected Thing here).
To be able to manipulate the value, I had to introduce a delay, because the actual value only becomes available after the q
macro has returned, and its use site reaches run time. On the other hand, this also allows checking that value, and raising TypeError
or ValueError
if something is amiss, so now we have error reporting for n[]
. On the implementation side, q
may now inject one more function call (lift_identifier
) whose return value is an AST.
Also, in tonight's edition, step_expansion
syntax-highlights macro names, too.
Regarding step_expansion
, are you sure the macro expander runs? This can be checked with from mcpyrate.debug import macros, show_bindings
, and then putting a show_bindings
(it's a name macro, so just that) somewhere. If it prints a list of macro bindings to stderr, the expander ran.
But you probably checked that already. The other thing that comes to mind is, step_expansion
doesn't flush stderr
, so in some configurations, it's not entirely impossible that the messages are left in a buffer that doesn't get flushed until later.
But if the messages are completely missing (never appear though stuff printed later does), then I have no idea, as it works for me.
Could you add an assert False
inside step_expansion
, and see if it triggers? This would narrow down whether the problem is in step_expansion
not getting triggered, or in the way it prints things.
EDIT: by "inside", I mean inside its implementation, function step_expansion
in module mcpyrate.debug
. Just make it assert False
when the function is entered, so we'll see if the macro expander tries to expand it or not.
> With the very latest code (just updated), your first attempt should also work. I upgraded n[] so it can make attribute accesses too, so things like n["self." + docid] are now supported. Thanks for the suggestion (which was implicit in how you tried to use it - it's arguably The Expected Thing here).
:-) I tried it, works very well.
> Also, in tonight's edition, step_expansion syntax-highlights macro names, too.
Nice.
> Regarding step_expansion, are you sure the macro expander runs? This can be checked with from mcpyrate.debug import macros, show_bindings, and then putting a show_bindings (it's a name macro, so just that) somewhere. If it prints a list of macro bindings to stderr, the expander ran.
I found the problem - well, not exactly the problem, but when it works and when it doesn't. From what I see, step_expansion
prints something when expanded; after the expansion the result is cached, and subsequent runs of the script don't print anything.
If I run the code right after a modification, step_expansion
and show_bindings
work perfectly. But then the cache gets updated, and step_expansion
/show_bindings
stop working.
My toy test is here: https://github.com/INTI-CMNB/KiBot/tree/quasiquotes/experiments/__doc__/coverage_mcpyrate
BTW: When using the quasiquotes I must do:
for node in walk(tree[n]):
    copy_location(node, s)
I couldn't find it in the standard module. Perhaps mcpyrate
could include some copy_location_r
helper to achieve it.
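In the meantime, such a helper is only a few lines on top of the stdlib; the name copy_location_r here is the hypothetical one suggested above:

```python
import ast

def copy_location_r(new_node, reference):
    """Recursively copy source location info from `reference` onto
    `new_node` and all of its descendants."""
    for node in ast.walk(new_node):
        ast.copy_location(node, reference)
    return new_node

# Usage: stamp a synthesized expression with an existing node's location.
ref = ast.parse("\n\nx = 1").body[0]              # a node on line 3
synth = ast.parse("f(y) + 2", mode="eval").body   # freshly built, line 1
copy_location_r(synth, ref)
# synth and its children now report ref's line number
```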
A question: I'm trying to use quasiquotes for the other macro, the following implementation works:
def _do_wrap_class_register(tree, mod, base_class):
    if isinstance(tree, ClassDef):
        with q as do_wrap:
            # Import using a function call
            _temp = __import__(u[mod], globals(), locals(), [u[base_class]], 1)
            n[base_class] = n['_temp.'+base_class]
            # Register it
            n[base_class].register(u[tree.name.lower()], n[tree.name])
        return do_wrap[0:2]+[tree]+do_wrap[2:]
    # Just in case somebody applies it to anything other than a class
    return tree  # pragma: no cover
And I was hoping the following would work, too:
def _do_wrap_class_register(tree, mod, base_class):
    if isinstance(tree, ClassDef):
        with q as do_wrap:
            # Import using a function call
            _temp = __import__(u[mod], globals(), locals(), [u[base_class]], 1)
            n[base_class] = n['_temp.'+base_class]
            # The class definition
            a[tree]
            # Register it
            n[base_class].register(u[tree.name.lower()], n[tree.name])
        return do_wrap
    # Just in case somebody applies it to anything other than a class
    return tree  # pragma: no cover
But it doesn't, I get:
File "/.../kibot/kibot/mcpyrate/importer.py", line 38, in source_to_xcode
return compile(expansion, path, 'exec', dont_inherit=True, optimize=_optimize)
TypeError: expected some sort of expr, but got <_ast.ClassDef object at 0x7fd1750b0b50>
What's wrong?
(Full reply in progress.)
Regarding your edit, you're getting bitten by the invisible Expr
"expression statement" node. I've fallen into this trap quite a few times myself.
a[]
itself is an expression, and a macro can only replace the AST of its invocation. The problem is, the a[]
expression is inside the value
field of the invisible Expr
node. So it injects the ClassDef
into the value
field of the Expr
node, treating the ClassDef
like an expression, which won't work.
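The invisible Expr wrapper is easy to see with the stdlib alone:

```python
import ast

# A bare expression used as a statement gets wrapped in an Expr node:
stmt = ast.parse("some_macro[x]").body[0]
print(ast.dump(stmt))
# The subscript is the `value` field of an Expr statement. A macro that
# replaces only the subscript therefore splices into an expression slot,
# which is why a statement node like ClassDef cannot go there.
```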
But for this, there's mcpyrate.splicing.splice_statements
. Something like this should work:
def _do_wrap_class_register(tree, mod, base_class):
    if isinstance(tree, ClassDef):
        with q as do_wrap:
            # Import using a function call
            _temp = __import__(u[mod], globals(), locals(), [u[base_class]], 1)
            n[base_class] = n['_temp.'+base_class]
            # The class definition
            __paste_here__  # noqa: F821
            # Register it
            n[base_class].register(u[tree.name.lower()], n[tree.name])
        splice_statements(tree, do_wrap)
        return do_wrap
    # Just in case somebody applies it to anything other than a class
    return tree  # pragma: no cover
You can change the Name
that is used as the target tag; __paste_here__
(literally that!) is just the default.
I briefly considered making a block mode for a
so that something like a with a
could work, but ran into a different variant of the same invisible Expr
node problem. So for now, there's that separate function to paste in statements. :)
EDIT: splice_statements
works by looking for a pattern like Expr(Name(id=target_tag))
in the AST, and replacing that whole statement (including the invisible Expr
node!) with the given statements. It can do this because it's essentially an AST walker.
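A minimal stand-in for that walker might look like this (an illustrative sketch; the real mcpyrate.splicing.splice_statements is more thorough):

```python
import ast

def splice_statements(statements, template, tag="__paste_here__"):
    """Toy sketch: replace the marker statement Expr(Name(id=tag)) inside
    `template` (a list of statement nodes) with `statements`, in place."""
    if isinstance(statements, ast.AST):
        statements = [statements]

    def is_marker(stmt):
        return (isinstance(stmt, ast.Expr)
                and isinstance(stmt.value, ast.Name)
                and stmt.value.id == tag)

    def splice_into(body):
        for i, stmt in enumerate(body):
            if is_marker(stmt):
                body[i:i + 1] = statements  # replace the whole Expr statement
                return True
            # recurse into nested statement bodies (if/with/def/class, ...)
            for field in ("body", "orelse", "finalbody"):
                if splice_into(getattr(stmt, field, [])):
                    return True
        return False

    splice_into(template)
    return template

template = ast.parse("a = 1\n__paste_here__\nb = 2").body
payload = ast.parse("class C: pass").body
splice_statements(payload, template)
# template is now [Assign, ClassDef, Assign]
```

The key point matches the EDIT above: the marker is matched *as a statement*, Expr wrapper and all, so whole statements can take its place.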
EDIT2: And, I could look at it again; I'd also really like to have this feature. The different variant of the issue was that, even with the syntax with a: tree
, the with
body puts tree
inside an Expr
. A Name
is an expression, and here it appears in a statement position, so Python's AST structure then requires it to be inside an Expr
.
We could auto-eliminate that Expr
, but the thing is, when to do that and when not? At the time the a
expands, tree
is just a name - there's no value yet to look at. And the run-time part of a
can't change the surrounding AST structure.
We could say the block mode with a
accepts references to statement nodes only, always auto-eliminating the Expr
. But it would have the effect that one could not then pass in expression nodes, such as a function call - and in Python, bare function calls often appear as "statements" (since a function call is an expression, it's actually placed inside an Expr
node).
But that's one more rule to remember - while the point of quasiquotes is to simplify macro writing.
EDIT3: No, wait, it might work - the run-time part can check if each item of the input is an instance of ast.stmt
or ast.expr
. If stmt
, then splice in as-is; if expr
, then re-wrap with a brand new Expr
node. I'll have to try this idea.
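The check sketched in EDIT3 is straightforward with the stdlib (illustrative only; the function name as_statements is made up):

```python
import ast

def as_statements(items):
    """Sketch of the normalization idea: accept a mix of statement and
    expression nodes; pass stmt nodes through as-is, and re-wrap expr
    nodes in a brand new Expr node so they become valid statements."""
    out = []
    for node in items:
        if isinstance(node, ast.stmt):
            out.append(node)                  # splice in as-is
        elif isinstance(node, ast.expr):
            out.append(ast.Expr(value=node))  # e.g. a bare function call
        else:
            raise TypeError(f"expected stmt or expr node, got {type(node)}")
    return out
```

With this, both a ClassDef and a bare call expression could be passed to the block mode without the caller needing to remember the Expr rule.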
So, for the other points.
Upgraded n[]
further. Adding attribute support led me to notice that subscripting was missing; stopping to think about how to do that, and then generalizing further, what n[]
should do started to sound an awful lot like a generic Python source code to AST conversion. Which we have in the standard library. So, now n[]
is essentially a wrapper for ast.parse
in expr
mode. It accepts any expression as a string, and splices in the corresponding AST. The main use case is still to make references to variables that have a computed name, but who knows what else it can do now? It seemed like the right generalization here.
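Conceptually, the run-time part of the generalized n[] then reduces to something like this (a sketch under that assumption; the real macro also splices the result into the quoted tree and reports errors with source info):

```python
import ast

def parse_expr(code_string):
    """Parse any Python expression source string into its AST node."""
    if not isinstance(code_string, str):
        raise TypeError(f"expected a str, got {type(code_string)}")
    # mode="eval" parses a single expression; .body strips the
    # Expression wrapper, leaving the bare expr node to splice in.
    return ast.parse(code_string, mode="eval").body

# Plain names, attribute accesses, and subscripts all fall out for free.
```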
As for the bytecode cache, the way it is supposed to work is that, once the cache is up to date for (for example) application.py
, then upon importing application.py
, the whole macro expander is skipped for that module. The already macro-expanded cached bytecode is loaded instead. This is the behavior I'm seeing in my own tests, and from the description, it sounds like that's how it's behaving in your tests, too.
Because step_expansion
is a macro, it only runs at macro expansion time - so it's completely skipped, if the bytecode cache is up to date. Whether it makes sense for it to behave like that is another question. ;)
(I think it's cleaner like this - but also more difficult to test properly. This solution risks some confusion, but so does the alternative of always printing the debug output, even when the macro expander did not actually run. We just get to pick which kind of confusion is preferable...)
Regarding recursive copy_location
, yeah, that loop is the stdlib solution. In mcpyrate
, there's mcpyrate.astfixers.fix_locations
, which might do what you want, with either mode="overwrite"
or mode="reference"
.
Technically, it's part of the public API, but maybe needs to be more discoverable. I've run into a practical issue in writing documentation - the codebase is already about 5000 lines and has a lot of features, but the documentation needs to be brief for mcpyrate
to be easily approachable. As you see, some stuff is still not mentioned, yet the docs should be made shorter! (Or maybe they just need to be structured better.)
(Please note, no guarantees on API stability until I release 3.0.0. The code is getting pretty close to what I need right now, and I don't foresee a need to change that particular function, so it'll probably stay as is - but still, no formal guarantees until I make a PyPI package. Anything that goes onto PyPI comes with the usual semver guarantees.)
> As for the bytecode cache, the way it is supposed to work is that, once the cache is up to date for (for example) application.py, then upon importing application.py, the whole macro expander is skipped for that module. The already macro-expanded cached bytecode is loaded instead. This is the behavior I'm seeing in my own tests, and from the description, it sounds like that's how it's behaving in your tests, too.
Yes. But I didn't realize this was the reason until I put a print inside one of my macros ;-)
b297842. Not quite sure this is a good idea, but now there's a block mode in a
. This should work:
def _do_wrap_class_register(tree, mod, base_class):
    if isinstance(tree, ClassDef):
        with q as do_wrap:
            # Import using a function call
            _temp = __import__(u[mod], globals(), locals(), [u[base_class]], 1)
            n[base_class] = n['_temp.'+base_class]
            # The class definition
            with a:
                tree
            # Register it
            n[base_class].register(u[tree.name.lower()], n[tree.name])
        return do_wrap
    # Just in case somebody applies it to anything other than a class
    return tree  # pragma: no cover
Also fixed a bug while at it: astify
needs to support Done
markers so that things like coverage dummy nodes and expanded name macros are treated correctly by q
.
EDIT: One more important thing: block mode with a
only takes statement nodes. Above, tree
can be a single statement node or a list
of them, but statements only. (Other kinds of inputs likely cause a mysterious compile error.)
Expr mode a[]
takes an expression node, as before.
> As for the bytecode cache, the way it is supposed to work is that, once the cache is up to date for (for example) application.py, then upon importing application.py, the whole macro expander is skipped for that module. The already macro-expanded cached bytecode is loaded instead. This is the behavior I'm seeing in my own tests, and from the description, it sounds like that's how it's behaving in your tests, too.
>
> Yes. But I didn't realize this was the reason until I put a print inside one of my macros ;-)
Ok. Might need to emphasize this in the docs.
Please update, tonight's edition (b72015e) adds error handling for a
. This should pretty much eliminate mysterious compile errors due to accidentally incorrect use of a
.
The block mode of a
was also upgraded to become more robust - it should work at any indentation level in the quoted AST now. (E.g. inside an if
branch - the previous prototype only worked properly at the top level of the with q
.)
Now the system yells if an invocation of a[]
is trying to splice in a statement node, or if an invocation of with a
is trying to splice in an expression node. Also in both modes, if trying to splice in something that isn't an AST.
Note in with a
, each item in the block body may evaluate to a statement AST node, or to a list of statement AST nodes.
The checks occur at the run time of the use site of q
- which means, at the macro-expansion time of your app, when your own macro (that uses q
) runs. This is the earliest the checks can be performed, because they depend on values (the actual AST nodes being spliced in) that only become available at that time.
I'll update the docs later.
Docs are now up to date, too.
Because the original issue, as well as any new ones that came up during the discussion, have been fixed, I'm closing this now. Thanks for the testing help and suggestions :)
On my part, mcpyrate
is approaching a releasable 3.0.0. There's only one feature I might still cram in. (Although, it's something that requires some thinking. We'll see if it makes the cut.) Then it's time to polish the docs, package this up, and push to PyPI.
Feel free to open a new issue if you run into any more problems, or if there's something you want to suggest.
Just for the record:
Yes, runs only once, until you touch (or otherwise refresh the modification time of) a source file that uses macros, or delete the .pyc caches manually. Hrm, this might need a small tool to do just that, for testing...
Such a tool has been added in f7484bc, see clearcaches.py
at the top level. Use carefully. ;)
(For 3.1.0, I'm still weighing whether this should be an installed script; if so, it will likely move into some location inside the mcpyrate
package, but if not, it will stay at the top level.)
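For the record, the core of such a cache-clearing tool fits in a few lines (a sketch of the idea only, not the actual clearcaches.py in the repo):

```python
import pathlib
import shutil

def clear_pyc_caches(root):
    """Recursively delete compiled-bytecode caches under `root`."""
    root = pathlib.Path(root)
    # remove __pycache__ directories wholesale...
    for pycache in root.rglob("__pycache__"):
        shutil.rmtree(pycache, ignore_errors=True)
    # ...and any stray .pyc files living outside them
    for pyc in root.rglob("*.pyc"):
        pyc.unlink(missing_ok=True)

# e.g. clear_pyc_caches(".") from the project root - use carefully,
# as the note above says.
```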
Hi!
I'm trying mcpyrate
as you suggested in here. I'm putting this here to avoid polluting coveragepy
issues. I use the macros to simplify the code for my application plug-ins. So I need to expand the macros in files loaded from a specific absolute path. With mcpy
I'm using the following import routine:
Where kibot
is the name of my app (and modules root), name
is the name of the plug-in, and path
is the absolute path to the file I want to load as kibot.name
. Using mcpyrate
I get the following error:
It looks like get_macros
doesn't support an absolute path in filename
. I need to use absolute path imports; the plug-ins can come from many places in the filesystem, and the user could be trying to override the behavior of an internal plug-in. I remember trying to mess with sys.path
without luck. I'll look into get_macros
to find a solution.