Status: closed (set-soft closed this issue 3 years ago)
Hi!
Thanks for trying, so I'm not the only one testing this ;)
It should support absolute paths. Even if I start `python` (or the `macropython` wrapper) with a relative path, the path has become absolute by the time it reaches `mcpyrate.importer.source_to_xcode`.
I'll have to try with an adapted version of your `_import` function, to see if it's something in manually importing the module with the low-level functions from `importlib` that's causing the issue. That's not too different from how the `macropython` wrapper does it, though, so at first glance, I'm puzzled.
From the exception, it sounds like the likely immediate culprit is `mcpyrate.importer.path_xstats`. When computing the mtime for a source file, I'm intentionally leaving out the size. The latest docs said `size` is an optional field, so I thought that shouldn't break anything on 3.6 through 3.8.

If this is the problem, I could sum the sizes of the files in the considered set, and return that as the size? I'm not sure that makes sense, though, since size changes to different files might cancel out - which could be a disaster if something is using the size for cache invalidation. (This is why there is currently no `size`.)
To test whether this is the problem, what happens if you replace `mcpyrate/activate.py` with this?

```python
from importlib.machinery import SourceFileLoader

from .importer import source_to_xcode

def nop(*args, **kwargs):
    pass

SourceFileLoader.source_to_code = source_to_xcode
SourceFileLoader.set_data = nop
```
(Adapted from `mcpy`; setting `SourceFileLoader.set_data` to a no-op will disable the writing of `.pyc` caches. Not customizing `SourceFileLoader.path_stats`, in turn, disables the customized `.pyc` cache invalidation.)
Works!

It enabled me to run the tests. I disabled all the code in the macros used to assign line numbers. The decorator macros showed correct coverage. But I'm having problems with my block macro: the last line doesn't show as covered.
Here is the branch of my code where I'm testing `mcpyrate`: https://github.com/INTI-CMNB/KiBot/tree/try_mcpyrate

And here is a line reported as not covered: https://coveralls.io/builds/34238330/source?filename=kibot/out_bom.py#L45

This docstring is replaced by an `Assign` and is clearly covered.
Ok, thanks for testing. Then I need to do something about the `.pyc` cache invalidation. It seems `size` is mandatory even though the docs state otherwise. OTOH, we're monkey-patching `SourceFileLoader` (like `mcpy` does) instead of defining our own loader from scratch like `macropy` does, so maybe if one replaces too many methods that way, internal implementation assumptions start getting in the way. :P
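For reference, the save-then-patch pattern being discussed can be sketched like this; `my_source_to_code` and the transform hook are illustrative names, not mcpyrate's actual API:

```python
from importlib.machinery import SourceFileLoader

# Keep a reference to the stock implementation so it can be restored later.
_stdlib_source_to_code = SourceFileLoader.source_to_code

def my_source_to_code(self, data, path, *args, **kwargs):
    # A custom expander would transform `data` (source text or AST) here,
    # then delegate to the stock compiler.
    return _stdlib_source_to_code(self, data, path, *args, **kwargs)

SourceFileLoader.source_to_code = my_source_to_code        # activate
patched = SourceFileLoader.source_to_code is my_source_to_code
SourceFileLoader.source_to_code = _stdlib_source_to_code   # deactivate
restored = SourceFileLoader.source_to_code is _stdlib_source_to_code
```

The upside over a from-scratch loader is that everything else about `SourceFileLoader` keeps working unchanged; the downside is exactly the internal-assumption risk mentioned above.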
Also, I just realized that the modules you're loading might not be under any directory in `sys.path` - hence `spec_from_file_location`. Cool. I never considered that case.

What are the exact semantics? How do modules loaded that way import other modules from their location? Do they have to use `_import`, too?
Then, back to the coverage reporting issue - could you try:

```python
from mcpyrate.debug import macros, step_expansion

with step_expansion:
    ...  # your code invoking the block macro here
```

It should print out line numbers and unparsed source code (to `sys.stderr`) at each step of the macro expansion. Does anything look suspicious there?
And what's on the last line that's being reported as not covered? Can we be sure the Python compiler isn't optimizing it away?

(In my tests, I've noticed Python 3.6 optimizes away `pass` statements, and any `Expr` with just a `Constant` inside it. Hence, the dummy nodes I'm adding for coverage currently use an `Assign`, so that the compiler can't assume it's a no-op.)
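As a standalone sketch of why an `Assign` survives where a constant `Expr` might not: such a dummy node can be built with `ast` and shown to actually execute (the helper name here is made up for illustration):

```python
import ast

def coverage_dummy(lineno, macroname):
    """Build `_mcpyrate_coverage = '...'`; an Assign can't be optimized away."""
    stmt = ast.parse(f"_mcpyrate_coverage = 'source line {lineno} invoked macro {macroname}'").body[0]
    # Pretend the whole statement sits on the macro invocation's source line.
    for node in ast.walk(stmt):
        node.lineno = node.end_lineno = lineno
    return stmt

tree = ast.Module(body=[coverage_dummy(42, "document")], type_ignores=[])
ns = {}
exec(compile(tree, "<demo>", "exec"), ns)
print(ns["_mcpyrate_coverage"])  # source line 42 invoked macro document
```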
Ah, I saw your edit. Thanks. I'll take a look.
(Still curious about those imports, though.)
Ok, I grabbed a copy, quickly patched `kibot/out_bom.py` to use a `with step_expansion`, and ran `macropython -m kibot.out_bom` to expand the macros. Result:
```
**Tree 0x7fb7160bda60 before macro expansion:
L   42 with document:
L   43     self.field = ''
L   44     $Expr: ' Name of the field to use for this column '
L   45     self.name = ''
L   46     $Expr: ' Name to display in the header. The field is used when empty '
L   47     self.join = Optionable
L   48     $Expr: " [list(string)|string=''] List of fields to join to this column "
**Tree 0x7fb7160bda60 after step 1:
L ---- $ASTMarker<Done>:
L   42     _mcpyrate_coverage = 'source line 42 invoked macro document'
L   43     self.field = ''
L   42     self._help_field = "[string=''] Name of the field to use for this column"
L   45     self.name = ''
L   42     self._help_name = "[string=''] Name to display in the header. The field is used when empty"
L   47     self.join = Optionable
L   42     self._help_join = " [list(string)|string=''] List of fields to join to this column"
**Tree 0x7fb7160bda60 macro expansion complete after 1 step.
```
and indeed, using `with step_expansion["dump"]` confirms that this:

```python
Expr(value=Str(s=" [list(string)|string=''] List of fields to join to this column ",
               lineno=48,
               col_offset=16),
     lineno=48,
     col_offset=16)],
```

becomes this:
```python
Assign(targets=[Attribute(value=Name(id='self',
                                     ctx=Load(),
                                     lineno=42,
                                     col_offset=12),
                          attr='_help_join',
                          ctx=Store(),
                          lineno=42,
                          col_offset=12)],
       value=Str(s=" [list(string)|string=''] List of fields to join to this column",
                 lineno=42,
                 col_offset=12),
       lineno=42,
       col_offset=12)
```
so yes, it seems there's a bug in the code that fills in the line numbers. The `Assign` should get `42`, because it's added by the macro, but it shouldn't be changing the line number of the `Str` that already exists in the original unexpanded code. Arrr!
The plug-ins are imported with `_import`. They can include any functionality from the main project using `.module` or `kibot.module`. Relative imports work fine. I think this is because Python knows where `kibot` is and that the default is `kibot`. So other modules are included in a natural way.

The code in the plug-ins is never called in a direct way. The plug-ins are registered (by the decorator macro), and then the main code can look for the correct plug-in in the registered list and get the class to instantiate a proper object.
`with step_expansion` is really nice, I tried it.
Thanks. Let me know if there's something that could be improved in it ;)
Thanks for the plug-in explanation.

To my understanding, relative import works because the call to `spec_from_file_location` in your `_import` function sets `name` as `kibot.something`. This will make the `something` plugin think it's in the `kibot` package. Cool trick.

(The documentation for that function is... somewhat spartan. Emacs's `anaconda-mode` says the implementation of `spec_from_file_location` comes from a compiled module, so no `Alt+.` for me - to be completely certain, I'd have to dig into the CPython sources on GitHub.)
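A minimal sketch of that trick, loading a file under a dotted name so `__package__` comes out as the parent package (file contents and names are invented for the demo):

```python
import importlib.util
import os
import sys
import tempfile

def load_plugin(dotted_name, path):
    """Load a source file as if it lived at `dotted_name` inside a package."""
    spec = importlib.util.spec_from_file_location(dotted_name, path)
    mod = importlib.util.module_from_spec(spec)
    sys.modules[dotted_name] = mod  # register before exec, like the import system does
    spec.loader.exec_module(mod)
    return mod

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "plugin.py")
    with open(src, "w") as f:
        f.write("WHO = f'{__name__} in package {__package__}'\n")
    mod = load_plugin("kibot_demo.plugin", src)

print(mod.WHO)  # kibot_demo.plugin in package kibot_demo
```

The `__package__` value is what makes relative imports inside the plug-in resolve against the parent package (provided that package is importable).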
Absolute import works if a `kibot` exists at the top level under one of the directories in `sys.path`. This will be the case if you `python -m` a module from the shell at the project's top level (without installing), because when you `python -m ...`, Python will implicitly add the current working directory to `sys.path`. This has even changed in 3.7: previously, Python added the magic `""` entry, meaning whatever the cwd is at the moment, but if I'm reading the issue report right, 3.7 fixed that misfeature, so 3.7 and later instead use the fully resolved cwd from interpreter startup time.

An absolute import lookup will also succeed if the `kibot` package has been installed, because the `site-packages` directory where installations go (for whichever Python is running the code) will be on `sys.path`.
And now I see the issue. Looking at `kibot.macros.document`, it's replacing the original `Str` node with a new one. This causes `mcpyrate` to detect the new node as macro-generated, so that node - correctly, I'd say - gets assigned the line number of the macro invocation.

What I don't get is why the other doc lines are being reported as covered, because they're getting exactly the same treatment, and `with step_expansion["dump"]` indeed confirms this. There should be no node with that line number in the output, so it should be reported as not covered. Why it isn't... currently I have no idea. Maybe I should `with step_expansion` the whole file, to see if something else is acting up.

(Note that the default, unparse output mode of `step_expansion` only shows line numbers for statements, because in Python those typically begin a new line, whereas expressions don't. To see the line numbers for individual expressions, one has to use the dump mode, which is very verbose.)
So, it turns out there's one more corner case I didn't think of - a macro that rewrites its input so completely that from a particular line, no original AST nodes remain in the output.
I'm tempted to make it the macro's responsibility to fill in the source location correctly in such cases - it's the pythonic way, plus I think there can be no generally applicable rule. As the macro expander sees it, there's simply a missing source location in a node in the transformed tree. So the only thing it can do simply and reliably is to assume that the node was macro-generated (as it indeed was, here), and fill in the macro invocation's source location, which is exactly what it is doing now.
If you have a better idea, please suggest :)
So, in the case of the `document` macro, it should be sufficient to manually do a `copy_location` from the original `Str` node to the `Assign` node. The expander will fix the rest.
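The effect of such a manual `copy_location` can be seen in a tiny standalone sketch (node names simplified; modern `ast` uses `Constant` where older versions used `Str`):

```python
import ast

mod = ast.parse("x = 1\n' Name of the field '\n")
orig = mod.body[1].value  # the string constant, which sits on line 2

# A macro-generated Assign, reusing the original string's source location:
assign = ast.Assign(
    targets=[ast.Name(id="_help", ctx=ast.Store())],
    value=ast.Constant(value=orig.value.strip()),
)
ast.copy_location(assign, orig)    # pin the statement to the original line
ast.fix_missing_locations(assign)  # children inherit the parent's location

print(assign.lineno, assign.value.lineno)  # 2 2
```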
P.S. With quasiquotes, the code could look something like:

```python
from mcpyrate.quotes import macros, q, u, n, a

...

target = q[self.n[doc_id]] if is_attr else q[n[doc_id]]
with q as quoted:  # block mode, produces a `list`
    a[target] = u[type_hint + s.value.s.rstrip() + post_hint]
sentences[n] = quoted[0]
```
It's smart enough to figure out that in `sentences[n]` you mean the variable `n`, not the macro - because that `n` isn't being subscripted, and the `n` macro is not a `@namemacro`. Of course, you can as-import the `n` macro just like in `mcpy`, if you want it to have a different name.
(And the expander will fill in the missing `ctx` automatically. Having a `ctx` is something of a misfeature of Python, since the correct `ctx` can be decided purely lexically. OTOH, having it can make some analyses simpler.)
I'll try again, but I remember trying to copy the location and it didn't work. This time I'll have `with step_expansion` to check.
Ok, let me know how it goes.
Looking at a full-module `with step_expansion` (except imports; regular imports are fine, but macro-imports must be at the top level of the module, not inside a `with`), there's no obvious reason why the rest of the doc lines would be reported as covered. Puzzling.

By the way, if you want to try the same, pull the latest `mcpyrate` first - trying this, I found two small bugs that are now fixed.
I would suggest fixing this particular case in `document`, either by re-enabling the `copy_location` on line 82, or by changing line 80. Instead of:

```python
sentences[n] = Assign(targets=[target], value=Str(s=type_hint + s.value.s.rstrip() + post_hint))
```

we could do:

```python
str_node = s.value
str_node.s = type_hint + str_node.s.rstrip() + post_hint
sentences[n] = Assign(targets=[target], value=str_node)
```
This recycles the original `Str` node, so the `Str` will use the original location information (even though the location information for the `Assign` will be auto-filled to the macro invocation location). It's also the minimal AST edit that does what we want.

(It's not very FP, but considering Python's AST transformation utilities, that train already sailed. I constantly find myself thinking whether I should `tree = ...(tree)`, or just `...(tree)`, since most operations - particularly `ast.NodeTransformer` - edit the original. It would be much easier to program with if they returned a fresh copy.)
...aaand I fixed a silly bug in `astfixers.fix_missing_locations`. Please pull the latest `mcpyrate`.
Tested both of my suggested approaches; both of them give correct line numbers in the AST for me.

The manual `copy_location` is better in that `with step_expansion` will show the correct line numbers in its default mode. The correct line number is in a statement node (so `step_expansion` will begin a new line for it, allowing it to show the line number).

The three-liner also gets the job done, but then you'll have to use `with step_expansion["dump"]` to see the correct line numbers - they're hidden in the `Str` node inside the expression, while any generated nodes (such as the `Assign`) get their location auto-filled, so in the default mode, `with step_expansion` will show the line number of the macro invocation.

And FWIW, assuming I found the right source file, Coverage.py gathers the line numbers from stack frames (`f_lineno`). So the line numbers in the coverage data should be as recorded in the bytecode, which means there's one more piece in the coverage reporting puzzle - the Python compiler, which takes the AST and generates bytecode. But at least it's a reasonable assumption that if the AST nodes don't have sensible line numbers (which is partly the responsibility of the macro expander, and partly of the macros themselves), then neither will the bytecode generated from that AST.
The good news: I enabled the `copy_location` and got correct coverage: https://coveralls.io/builds/34243520/source?filename=kibot/out_bom.py which is very good, because to achieve the same using `mcpy` I had to do some nasty hacks.

Now the bad stuff: I tested the speed of a simple task using `mcpy` and `mcpyrate`; the script is in `experiments/speed`. Each iteration took 440 ms using `mcpyrate` and 230 ms using `mcpy`, a very important difference. From my notes, `macropy` should take around 650 ms.
I tried your `_import` approach and it worked fine.

Puzzled, I found this. Surprisingly, Python code for a compiled module. So yeah, `size` is "optional", but not really.
I suppose I'll have to do something about that in `path_xstats`. Hmm, looking at this, it seems I can just set `size` to `None` to tell the system it's not used. But only in 3.7 and later.

In 3.6, it's a different story - that'll crash if I put `None` there. In 3.6, the key should be missing if the size is not used. Testing on PyPy3 7.3.0, that implementation seems to agree.
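So the version-dependent shape of the stats result would be something like this sketch (illustrative function name, not mcpyrate's actual code):

```python
import sys

def xstats_result(mtime):
    """Shape of a `path_stats`-style dict across Python versions (sketch)."""
    result = {"mtime": mtime}
    if sys.version_info >= (3, 7):
        # 3.7+: an explicit None says size-based validation is unused.
        result["size"] = None
    # 3.6: leave the "size" key out entirely; a None value would crash there.
    return result

print(xstats_result(123.0))
```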
This explains why I couldn't reproduce the crash. Both of my pythons are 3.6 (CPython and PyPy3).
Ah, `sys.version_info`, where would we be without you. Fixed in 439caa7, please update.
Nice. Thanks for testing.
Speed: more complex is obviously slower... but it shouldn't be quite that much slower? Let's investigate.

Could you try disabling selected parts of `mcpyrate.importer.source_to_xcode`, to help narrow this down?

- Instead of `tree = expand_dialects(data, filename=path)`, just `tree = ast.parse(data)`. (Same effect if no dialects.)
- `remaining_markers` and its check.

Those should tell whether it's the importer.
Note that if you're running with the modified `activate.py` I suggested here, bytecode caching will be disabled, so runs should be identical.

But if you pull the latest version, and revert `activate.py` to the version in git, then bytecode caching will be enabled - so further runs, if no macro-dependencies have changed, will simply use the `.pyc` files from the first run, skipping the whole macro expander. It will still cost a parse to look for macro-imports in the source code - if this is an issue, I could add a cache for that.
There are also some other things that may make `mcpyrate` slower, that are tradeoffs:

- `step_expansion`.
- `get_macros` is slightly more complex than in `mcpy`, to support relative macro-imports.
- `unparse_with_fallbacks`.
Would be nice to benchmark this myself, too. `unpythonic` with `mcpyrate` is still a long way off (API change, so need to wait for a major revision), so it would be nice to run the KiBot test myself.

I'm not sure I have an environment to actually run KiBot in - does it need anything special, or just `python setup.py install`?
I tried https://github.com/Technologicat/mcpyrate/commit/439caa7748f216f0214d71bee8b05731cd485129 so I'm replying to the previous post, not yet the speed stuff, but also related: `mcpy`.

Now I'll take a look at your message about speed. Ok, now the cache compensates for the speed difference. Messing with `mcpyrate.importer.source_to_xcode` doesn't change anything (runs only once, no?)
About running KiBot: I think the speed test doesn't need too many dependencies. Using the `setup.py` should pull them in.

The only annoying thing you could need is KiCad; this provides the `pcbnew` module. If you install it, just avoid installing all the libs, in particular the 3D models (they take an insane amount of disk space: 5 GiB). The application itself isn't huge.

Note that the speed test does a very simple task, and hence most of the time is spent loading plug-ins and setting some basic details. This is why the macro expansion dominates the timing.
Yes, it runs only once, until you `touch` (or otherwise refresh the modification time of) a source file that uses macros, or delete the `.pyc` caches manually. Hrm, this might need a small tool to do just that, for testing...
Hmm, 290 ms. We could probably go lower, but that adds complexity, and cache invalidation famously being one of the two hard things in CS, there's the risk of shooting ourselves in the foot.

What the `.pyc` invalidation logic needs is the latest mtime from the macro-dependency graph. So it's really only interested in absolute filenames, so it can `os.stat` them. But those filenames depend, via module name resolution, hence via `sys.path`, on how the currently running Python is configured.

So I'm thinking a safe thing to do here would be to pickle the AST nodes for the macro-import statements into a cache file - those are a purely lexical thing, and the cache can be easily invalidated based on the mtime of that one source file itself. That way, the statements could still be parsed dynamically, and the dependencies looked up dynamically; so if `sys.path` changes between runs, causing a module to be loaded from a different location, it would notice that.
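A minimal sketch of that kind of mtime-keyed pickle cache (function names invented here; real invalidation needs more care):

```python
import os
import pickle
import tempfile

def save_scan_cache(cache_path, source_path, data):
    """Cache `data` along with the source file's current mtime."""
    payload = {"mtime": os.stat(source_path).st_mtime_ns, "data": data}
    with open(cache_path, "wb") as f:
        pickle.dump(payload, f)

def load_scan_cache(cache_path, source_path):
    """Return the cached data, or None if the cache is missing or stale."""
    try:
        with open(cache_path, "rb") as f:
            payload = pickle.load(f)
    except OSError:
        return None
    if payload["mtime"] != os.stat(source_path).st_mtime_ns:
        return None  # source edited since the cache was written -> stale
    return payload["data"]

# Demo: cache a fake scan result, then invalidate by bumping the mtime.
d = tempfile.mkdtemp()
src = os.path.join(d, "mod.py")
with open(src, "w") as f:
    f.write("from mymacros import macros, document\n")
cache = os.path.join(d, "mod.scan")
save_scan_cache(cache, src, ["document"])
print(load_scan_cache(cache, src))  # ['document']
os.utime(src, ns=(0, 0))            # pretend the source changed
print(load_scan_cache(cache, src))  # None
```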
But does this buy us enough speed to be worthwhile? In everyday use of an application that just happens to use macros, probably not; but during agile development, especially if test-driven, it very well might.

I suppose there is just one way to find out. Now, how does one look up the `__pycache__` folder for a given source file...?
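For the `__pycache__` question, the stdlib already provides the mapping from a source path to its cache path:

```python
import importlib.util

# importlib knows the __pycache__ layout (including the interpreter tag)
# for any given source file:
pyc = importlib.util.cache_from_source("/project/kibot/out_bom.py")
print(pyc)  # e.g. /project/kibot/__pycache__/out_bom.cpython-311.pyc
```

There is also the inverse, `importlib.util.source_from_cache`.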
About what to install from KiCad, that's exactly the kind of important information I was hoping for :)
I'm mainly after something to test the macro-expansion performance with, so that sounds fitting.
Ok, I think I'm done for the night. Silly per-source-file cache for scanned macro-import statements, to avoid the parse cost, added in c8c09d9. Slightly updated in 22a63d1.
Might be a bad idea, so care to test in your setup when you have the time? :)
Tried the last code (3b2fa4651a99db7d85d67cadb8642446b6589913), it works and time goes down to 260 ms.
I'm experiencing some problems with the cache files. They aren't really important, but they deserve some notes:

When installing the package at system level, the installers (pip, setup.py, dpkg, etc.) create the cache files by calling `py3compile`. These cache files are wrong because `py3compile` doesn't know about the macros. What's worse is that these files are "up to date" and can't be removed by the user.
I faced this problem using `mcpy` and my solution was:

- `pip`: tell the users to pass an argument to prevent it.
- `dpkg`: include a custom `postinst` script to generate caches only for the files without macros.

I can currently use similar approaches. For `dpkg` I'm running `kibot --help-outputs > /dev/null`, which is generating all the cache (and pickle) files in a natural way. For `pip` I can maintain the recommendation to avoid creating the cache files when installing at system level (a rare case; most people using `pip` install at user level).

This one is more mysterious: I have two obsolete tests for cache problems. They manually create a cache entry for one of the files using macros. The code using `mcpy` tries to remove the cache file to avoid problems. One of the tests is to verify I can remove the cache file. The other makes the file read-only and checks the error message indicating why we can't run. Both tests run on my desktop, but fail on GitHub Actions. And I can't reproduce the error using my local docker image. I have to investigate why they fail. It isn't very important, because this situation isn't real.
Good news about speed: looking at the coverage output, I realized that with `mcpy` I was using a trick to get more speed.

As I use macros only in the plug-ins, I can safely disable macros after loading the plug-ins. For this I modified `mcpy`, and my code detected whether the `activate` module has a `de_activate` method. As `mcpyrate` doesn't have it, the code is reported as not covered by any test. I implemented the `de_activate` in `mcpyrate` using:
```diff
diff --git a/kibot/mcpyrate/activate.py b/kibot/mcpyrate/activate.py
index f1893b8..1ffe261 100644
--- a/kibot/mcpyrate/activate.py
+++ b/kibot/mcpyrate/activate.py
@@ -26,10 +26,23 @@ the `PYTHONDONTWRITEBYTECODE` environment variable, and the attribute
 '''
 from importlib.machinery import SourceFileLoader, FileFinder
-
 from .importer import source_to_xcode, path_xstats, invalidate_xcaches
-SourceFileLoader.source_to_code = source_to_xcode
-# we could replace SourceFileLoader.set_data with a no-op to force-disable pyc caching.
-SourceFileLoader.path_stats = path_xstats
-FileFinder.invalidate_caches = invalidate_xcaches
+
+def activate():
+    SourceFileLoader.source_to_code = source_to_xcode
+    # we could replace SourceFileLoader.set_data with a no-op to force-disable pyc caching.
+    SourceFileLoader.path_stats = path_xstats
+    FileFinder.invalidate_caches = invalidate_xcaches
+
+
+def de_activate():
+    SourceFileLoader.source_to_code = old_source_to_code
+    SourceFileLoader.path_stats = old_path_stats
+    FileFinder.invalidate_caches = old_invalidate_caches
+
+
+old_source_to_code = SourceFileLoader.source_to_code
+old_path_stats = SourceFileLoader.path_stats
+old_invalidate_caches = FileFinder.invalidate_caches
+activate()
```
Then I tried the speed test and got approx. 160 ms, much better than `mcpy`. Now the comparison is fair, because I'm disabling macros after loading plug-ins for both tools.
`pip` I should be able to figure out, but as for `dpkg`, even though I've used Debian-based distros for a long time, I haven't made any deb packages myself. Would you happen to have any material that I could format or possibly reword into something appropriate for the `mcpyrate` documentation?

It actually just crossed my mind a minute ago to ask you if you want to have the deactivate feature, since you added it to `mcpy` earlier. Care to officially make a PR so I could pull it in? (Otherwise doesn't matter, but if you do that, GitHub will be able to detect you as a contributor to `mcpyrate`.)

The only thing I'd change are some names - I'd prefer the prefix `stdlib_` instead of `old_`, and I think `deactivate` would look better as a single word.
> Care to officially make a PR so I could pull it in? (Otherwise doesn't matter, but if you do that, GitHub will be able to detect you as a contributor to `mcpyrate`.)

Ok, my 2 cents, I did a pull request.

> The only thing I'd change are some names - I'd prefer the prefix `stdlib_` instead of `old_`, and I think `deactivate` would look better as a single word.

Ok, used these names.
Thanks for the heads-up. Something about this needs to go in the README.

> `pip` I should be able to figure out, but as for `dpkg`, even though I've used Debian-based distros for a long time, I haven't made any deb packages myself. Would you happen to have any material that I could format or possibly reword into something appropriate for the `mcpyrate` documentation?
Ok, here is an explanation. Feel free to change it as you like; my native language is Spanish ;-)

The standard `postinst` script for Debian packages creates cache files using `py3compile` (and/or `pypy3compile`). These cache files are invalid because they are compiled without using `mcpyrate`. This prevents running the installed package. It will report that it can't find the `macros`. To make things worse, a regular user won't be able to remove the Python caches, owned by root. In order to fix this problem you must provide a custom `postinst` script that generates the cache files using `mcpyrate`. One possible solution is to invoke the script in a way that all, or most, of the modules are invoked. This will force the generation of the cache files.
> P.S. With quasiquotes, the code could look something like:

I tried it, but I'm getting:

```
  File "mcpyrate/importer.py", line 37, in source_to_xcode
    return compile(expansion, path, 'exec', dont_inherit=True, optimize=_optimize)
ValueError: expression must have Store context but has Load instead
```
PR merged, as you probably already noticed. :)
Explanation about .deb packaging added to README. Thanks for contributing this, too!
As for the error, if you can isolate a problematic line or two, what does `with step_expansion["dump"]:` say?

It's clearly some form of assignment, where somehow a `Load` has ended up in a term on the left-hand side. I could change `mcpyrate.astfixers` so that it rewrites all `ctx`, not only missing ones, but let's first try to figure out how the `Load` got there.
Regarding your latest speed test results, it makes sense that `mcpyrate` is faster on average (across several runs) than `mcpy`, because `mcpy` does not support bytecode caching. When `mcpy` is active, `.pyc` caches won't be written. This was likely done to keep it simple, i.e. to avoid the need for the kind of macro-dependency analysis `mcpyrate.importer` does.

So, while I would expect `mcpyrate`'s expander to be slightly slower than `mcpy`'s (e.g. due to unparsing the AST to source code at every macro expansion, just in case, for error messages), the support for bytecode caching in `mcpyrate` means we often get to skip running the expander - leading to a lower amortized cost.

EDIT: That said, I'm very impressed by the simplicity and clarity of `mcpy`. Its source code is textbook-worthy. It gets the main ideas across very efficiently. It's essentially how to write a complete macro expander in less than 1000 lines of Python. `mcpyrate`'s isn't. The codebase is simply too big, and the additional features, which I feel are worth the complexity cost for real-world use, hide the underlying simplicity so much that the source code no longer counts as an as-simple-as-possible-but-no-simpler textbook explanation.
> As for the error, if you can isolate a problematic line or two, what does `with step_expansion["dump"]:` say?

I don't get the quasiquotes idea very well. Using this:

```python
target = q[self.n[doc_id]] if is_attr else q[n[doc_id]]
with q as quoted:  # block mode, produces a `list`
    a[target] = u[type_hint + s.value.s.rstrip() + post_hint]
sentences[n] = quoted[0]
```

I get lines like this:

```python
self.n[doc_id] = '[number=1.5] [0,1000] how much the highlight extends around the component [mm]'
```
With this AST:
Assign(targets=[Subscript(value=Attribute(value=Name(id='self',
ctx=Load(),
lineno=34,
col_offset=12,
end_lineno=None,
end_col_offset=None),
attr='n',
ctx=Load(),
lineno=34,
col_offset=12,
end_lineno=None,
end_col_offset=None),
slice=Index(value=Name(id='doc_id',
ctx=Load(),
lineno=34,
col_offset=12,
end_lineno=None,
end_col_offset=None)),
ctx=Load(),
lineno=34,
col_offset=12,
end_lineno=None,
end_col_offset=None)],
value=Constant(value='[number=1.5] [0,1000] how much the highlight extends around the component [mm]',
lineno=34,
col_offset=12,
end_lineno=None,
end_col_offset=None),
type_comment=None,
lineno=34,
col_offset=12,
end_lineno=None,
end_col_offset=None)]
Which doesn't make much sense. The problem is with the use of `self.n[doc_id]`.

I tried:

```python
if is_attr:
    doc_id = 'self.' + doc_id
target = q[n[doc_id]]
```

But I get really bizarre errors (can't get the `step_expansion` for this).
I'm sorry, my mistake. My thinking slipped there: `n[]` only works with bare names, not attributes. The problem is that the ASTs for `x[...]` and `self.x[...]` look very different. Obviously, the latter won't invoke the macro `x[]` - it's an `Attribute`, and the `x` is just a bare string in its `attr` field, while a macro name is a `Name`.
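The difference in AST shape is easy to see directly:

```python
import ast

bare = ast.parse("x[0]", mode="eval").body          # Subscript over a Name
dotted = ast.parse("self.x[0]", mode="eval").body   # Subscript over an Attribute

# A macro invocation needs a bare Name being subscripted...
print(type(bare.value).__name__)                    # Name
# ...but here 'x' is just a string in the Attribute's attr field:
print(type(dotted.value).__name__, dotted.value.attr)  # Attribute x
```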
Quasiquotes - in `mcpyrate` specifically, or the general idea? The general idea comes from the Lisp family, whereas the variant we have in `mcpyrate` is based on `macropy`'s Python adaptation of the idea, with some original improvements.

The general idea is, quasiquotes allow writing a code template (in a macro implementation) using mostly the same syntax one would use for normal code. Then, using a special `unquote` operator, one can interpolate values (that exist at the time the macro is running) into the template.

Lisps need just `quasiquote` (written as a backtick), `unquote` (written as a comma) and `unquote-splicing` (written as `,@`), see e.g. The Racket Reference; because in Lisps, there is almost no surface syntax - the source code is a lightly dressed up AST. Python, on the other hand, is rather heavy on surface syntax (see `mcpyrate.unparser`), so we need more unquote types for different kinds of value interpolation.
With the occasional mistake like the one I just made aside, I've found quasiquotes become second nature when you write a lot of macros. There are some pitfalls to look out for; the one I keep falling into is Python's implicit, invisible `ast.Expr` node (a.k.a. the expression statement). This is precisely why `step_expansion` prints those out.

The expression statement problem is not specific to quasiquoted code, but it's easier to run into when using quasiquotes, because the whole point of quasiquotes is not having to think about how the AST looks (that much).
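The invisible `ast.Expr` wrapper is easy to demonstrate:

```python
import ast

# A bare expression used at statement position gets wrapped in ast.Expr:
stmt = ast.parse("f(x)").body[0]
print(type(stmt).__name__, type(stmt.value).__name__)  # Expr Call

# Whereas an assignment is a statement node directly:
print(type(ast.parse("y = f(x)").body[0]).__name__)    # Assign
```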
The idea of the `n[]` unquote operator - which originally appeared in `macropy` - is that inside quoted code, it lifts a bare string into a lexical identifier (a.k.a. variable name), so you can do things like:

```python
x = gensym()  # just a bare string!
with q as quoted:
    n[x] = ...
```
You're seeing bizarre errors because, after `'self.' + doc_id` has been evaluated, the string contains a dot, so it's not a valid variable name. Of course, once `mcpyrate` is finished, it should protect against things like this and give a reasonable error message as early as possible - whereas currently it doesn't. (`macropy` doesn't, either.)

Unfortunately, we can't detect errors in `n[]` easily, because `n[]` expands before the actual value is available. It's just building an AST snippet (that meshes into the context of the surrounding `q` in just the right way) so that in the final result, an `ast.Name` node appears, and the string provided by the user goes into its `id` attribute. It would be possible to delay the AST snippet generation, at the cost of some extra complexity. I'll have to think about this. (The hygienic unquote `h[]` already has such a delay, because it must, in order to work at all. I could probably build something similar.)
Sometimes, similar things are better achieved with `mcpyrate.utils.rename`. Give some dummy name, then `rename`:

```python
quoted = q[lambda _: 42 * _]
rename("_", gensym(), quoted)
```

`rename` renames a lot of things: names, attributes, function parameters, named arguments in calls, imports... so, let's try again:

```python
target = q[self._] if is_attr else q[_]
with q as quoted:
    a[target] = u[type_hint + s.value.s.rstrip() + post_hint]
rename("_", doc_id, quoted)
sentences[n] = quoted[0]
```

(`rename` is not yet covered by tests, so though I combed through the GTS docs implementing it, there may still be bugs. It will be covered before 3.0.0 is released. I'm working on expanding the tests.)
I'll also take the opportunity to note that there have been many small improvements in the past day:

- `unparse` and `dump` both now come with syntax highlighting. The color scheme leaves much to be desired, but there's only so much we can do within the limitations of an ANSI terminal. `colorama` is now required, to keep `mcpyrate` OS-agnostic.
- The macro-importer skips analysis of `.py`-based modules in Python's standard library. This makes `macropython -i` boot up faster, and Python's standard report for an uncaught exception also shows up faster (because an uncaught exception triggers the import of a ton of `.py`-based stdlib modules).
- I'm experimenting with the idea of fixing all `ctx`, not only missing ones. This had to go into `global_postprocess`, because a name macro may appear on the LHS of an assignment, and the local `ctx` fixer (that fixes each macro expansion) first populates the wrong `ctx`, since it sees only the `Name` node.
By the way, if you're curious about how q works, the key insights are:
How do you build q - a macro that lifts its input AST (at macro expansion time) into an AST at run time? You add an extra AST layer. For example, to have an ast.Name(id='foo') appear at run time, the q macro needs to output (roughly speaking) ast.Call(ast.Name, [], [ast.keyword('id', 'foo')]).
So q returns an AST... or mostly an AST, with some function calls sprinkled in that do something and then return an AST. Then keep in mind that whatever q returns becomes - because q is a macro - part of the AST of the use site.
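The lifting idea can be demonstrated with a toy function (an illustrative sketch, not mcpyrate's own machinery): given an AST node available now, emit an AST that reconstructs an equivalent node when evaluated later.

```python
import ast

def lift(node):
    """Build an AST that, when evaluated at run time, reconstructs `node`.
    Toy sketch: handles only fields that are strings or nested AST nodes."""
    keywords = []
    for field, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            keywords.append(ast.keyword(arg=field, value=lift(value)))
        elif isinstance(value, str):
            keywords.append(ast.keyword(arg=field, value=ast.Constant(value)))
    # Emit e.g. ast.Call(func=<ast.Name>, args=[], keywords=[id='foo'])
    return ast.Call(
        func=ast.Attribute(value=ast.Name(id="ast", ctx=ast.Load()),
                           attr=type(node).__name__, ctx=ast.Load()),
        args=[], keywords=keywords)

lifted = lift(ast.Name(id="foo"))
expr = ast.Expression(body=lifted)
ast.fix_missing_locations(expr)
# Evaluating the lifted AST "at run time" rebuilds the original node:
rebuilt = eval(compile(expr, "<lifted>", "eval"))
# rebuilt is an ast.Name with id == "foo"
```

This is the "extra AST layer": the macro's output is code that *builds* the node, rather than the node itself.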
I got it working.
After updating the code the Load/Store problem went away.
One thing I don't understand: I'm getting step_expansion
messages only when the code prints a traceback. Not on normal operation. Why?
Nice to hear you got it working.
With the very latest code (just updated), your first attempt should also work. I upgraded n[]
so it can make attribute accesses too, so things like n["self." + docid]
are now supported. Thanks for the suggestion (which was implicit in how you tried to use it - it's arguably The Expected Thing here).
To be able to manipulate the value, I had to introduce a delay, because the actual value only becomes available after the q
macro has returned, and its use site reaches run time. On the other hand, this also allows checking that value, and raising TypeError
or ValueError
if something is amiss, so now we have error reporting for n[]
. On the implementation side, q
may now inject one more function call (lift_identifier
) whose return value is an AST.
Also, in tonight's edition, step_expansion
syntax-highlights macro names, too.
Regarding step_expansion
, are you sure the macro expander runs? This can be checked with from mcpyrate.debug import macros, show_bindings
, and then putting a show_bindings
(it's a name macro, so just that) somewhere. If it prints a list of macro bindings to stderr, the expander ran.
But you probably checked that already. The other thing that comes to mind is, step_expansion
doesn't flush stderr
, so in some configurations, it's not entirely impossible that the messages are left in a buffer that doesn't get flushed until later.
But if the messages are completely missing (never appear though stuff printed later does), then I have no idea, as it works for me.
Could you add an assert False
inside step_expansion
, and see if it triggers? This would narrow down whether the problem is in step_expansion
not getting triggered, or in the way it prints things.
EDIT: by "inside", I mean inside its implementation, function step_expansion
in module mcpyrate.debug
. Just make it assert False
when the function is entered, so we'll see if the macro expander tries to expand it or not.
> With the very latest code (just updated), your first attempt should also work. I upgraded n[] so it can make attribute accesses too, so things like n["self." + docid] are now supported. Thanks for the suggestion (which was implicit in how you tried to use it - it's arguably The Expected Thing here).
:-) I tried it, works very well.
> Also, in tonight's edition, step_expansion syntax-highlights macro names, too.
Nice.
> Regarding step_expansion, are you sure the macro expander runs? This can be checked with from mcpyrate.debug import macros, show_bindings, and then putting a show_bindings (it's a name macro, so just that) somewhere. If it prints a list of macro bindings to stderr, the expander ran.
I found the problem - well, not exactly the problem, but when it works and when it doesn't. From what I see, step_expansion
prints something when expanded; after the expansion the result is cached, and subsequent runs of the script don't print anything.
If I run the code right after a modification, step_expansion
and show_bindings
work perfectly. But then the cache gets updated, and step_expansion
/show_bindings
stop working.
My toy test is here: https://github.com/INTI-CMNB/KiBot/tree/quasiquotes/experiments/__doc__/coverage_mcpyrate
BTW: When using the quasiquotes I must do:
for node in walk(tree[n]):
    copy_location(node, s)
I couldn't find it in the standard module. Perhaps mcpyrate
could include some copy_location_r
helper to achieve it.
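In the meantime, such a helper is only a few lines on top of the stdlib; the name copy_location_r here is the hypothetical one suggested above:

```python
import ast

def copy_location_r(new_node, reference):
    """Recursively copy source location info from `reference` onto
    `new_node` and all of its descendants."""
    for node in ast.walk(new_node):
        ast.copy_location(node, reference)
    return new_node

# Usage: stamp a synthesized expression with an existing node's location.
ref = ast.parse("\n\nx = 1").body[0]              # a node on line 3
synth = ast.parse("f(y) + 2", mode="eval").body   # freshly built, line 1
copy_location_r(synth, ref)
# synth and its children now report ref's line number
```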
A question: I'm trying to use quasiquotes for the other macro, the following implementation works:
def _do_wrap_class_register(tree, mod, base_class):
    if isinstance(tree, ClassDef):
        with q as do_wrap:
            # Import using a function call
            _temp = __import__(u[mod], globals(), locals(), [u[base_class]], 1)
            n[base_class] = n['_temp.'+base_class]
            # Register it
            n[base_class].register(u[tree.name.lower()], n[tree.name])
        return do_wrap[0:2]+[tree]+do_wrap[2:]
    # Just in case somebody applies it to anything other than a class
    return tree  # pragma: no cover
And I was hoping the following would work, too:
def _do_wrap_class_register(tree, mod, base_class):
    if isinstance(tree, ClassDef):
        with q as do_wrap:
            # Import using a function call
            _temp = __import__(u[mod], globals(), locals(), [u[base_class]], 1)
            n[base_class] = n['_temp.'+base_class]
            # The class definition
            a[tree]
            # Register it
            n[base_class].register(u[tree.name.lower()], n[tree.name])
        return do_wrap
    # Just in case somebody applies it to anything other than a class
    return tree  # pragma: no cover
But it doesn't, I get:
File "/.../kibot/kibot/mcpyrate/importer.py", line 38, in source_to_xcode
return compile(expansion, path, 'exec', dont_inherit=True, optimize=_optimize)
TypeError: expected some sort of expr, but got <_ast.ClassDef object at 0x7fd1750b0b50>
What's wrong?
(Full reply in progress.)
Regarding your edit, you're getting bitten by the invisible Expr
"expression statement" node. I've fallen into this trap quite a few times myself.
a[]
itself is an expression, and a macro can only replace the AST of its invocation. The problem is, the a[]
expression is inside the value
field of the invisible Expr
node. So it injects the ClassDef
into the value
field of the Expr
node, treating the ClassDef
like an expression, which won't work.
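The invisible Expr wrapper is easy to see with the stdlib alone:

```python
import ast

# A bare expression used as a statement gets wrapped in an Expr node:
stmt = ast.parse("some_macro[x]").body[0]
print(ast.dump(stmt))
# The subscript is the `value` field of an Expr statement. A macro that
# replaces only the subscript therefore splices into an expression slot,
# which is why a statement node like ClassDef cannot go there.
```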
But for this, there's mcpyrate.splicing.splice_statements
. Something like this should work:
def _do_wrap_class_register(tree, mod, base_class):
    if isinstance(tree, ClassDef):
        with q as do_wrap:
            # Import using a function call
            _temp = __import__(u[mod], globals(), locals(), [u[base_class]], 1)
            n[base_class] = n['_temp.'+base_class]
            # The class definition
            __paste_here__  # noqa: F821
            # Register it
            n[base_class].register(u[tree.name.lower()], n[tree.name])
        splice_statements(tree, do_wrap)
        return do_wrap
    # Just in case somebody applies it to anything other than a class
    return tree  # pragma: no cover
You can change the Name
that is used as the target tag; __paste_here__
(literally that!) is just the default.
I briefly considered making a block mode for a
so that something like a with a
could work, but ran into a different variant of the same invisible Expr
node problem. So for now, there's that separate function to paste in statements. :)
EDIT: splice_statements
works by looking for a pattern like Expr(Name(id=target_tag))
in the AST, and replacing that whole statement (including the invisible Expr
node!) with the given statements. It can do this because it's essentially an AST walker.
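A minimal stand-in for that walker might look like this (an illustrative sketch; the real mcpyrate.splicing.splice_statements is more thorough):

```python
import ast

def splice_statements(statements, template, tag="__paste_here__"):
    """Toy sketch: replace the marker statement Expr(Name(id=tag)) inside
    `template` (a list of statement nodes) with `statements`, in place."""
    if isinstance(statements, ast.AST):
        statements = [statements]

    def is_marker(stmt):
        return (isinstance(stmt, ast.Expr)
                and isinstance(stmt.value, ast.Name)
                and stmt.value.id == tag)

    def splice_into(body):
        for i, stmt in enumerate(body):
            if is_marker(stmt):
                body[i:i + 1] = statements  # replace the whole Expr statement
                return True
            # recurse into nested statement bodies (if/with/def/class, ...)
            for field in ("body", "orelse", "finalbody"):
                if splice_into(getattr(stmt, field, [])):
                    return True
        return False

    splice_into(template)
    return template

template = ast.parse("a = 1\n__paste_here__\nb = 2").body
payload = ast.parse("class C: pass").body
splice_statements(payload, template)
# template is now [Assign, ClassDef, Assign]
```

The key point matches the EDIT above: the marker is matched *as a statement*, Expr wrapper and all, so whole statements can take its place.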
EDIT2: And, I could look at it again; I'd also really like to have this feature. The different variant of the issue was that, even with the syntax with a: tree
, the with
body puts tree
inside an Expr
. A Name
is an expression, and here it appears in a statement position, so Python's AST structure then requires it to be inside an Expr
.
We could auto-eliminate that Expr
, but the thing is, when to do that and when not? At the time the a
expands, tree
is just a name - there's no value yet to look at. And the run-time part of a
can't change the surrounding AST structure.
We could say the block mode with a
accepts references to statement nodes only, always auto-eliminating the Expr
. But it would have the effect that one could not then pass in expression nodes, such as a function call - and in Python, bare function calls often appear as "statements" (since a function call is an expression, it's actually placed inside an Expr
node).
But that's one more rule to remember - while the point of quasiquotes is to simplify macro writing.
EDIT3: No, wait, it might work - the run-time part can check if each item of the input is an instance of ast.stmt
or ast.expr
. If stmt
, then splice in as-is; if expr
, then re-wrap with a brand new Expr
node. I'll have to try this idea.
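The check sketched in EDIT3 is straightforward with the stdlib (illustrative only; the function name as_statements is made up):

```python
import ast

def as_statements(items):
    """Sketch of the normalization idea: accept a mix of statement and
    expression nodes; pass stmt nodes through as-is, and re-wrap expr
    nodes in a brand new Expr node so they become valid statements."""
    out = []
    for node in items:
        if isinstance(node, ast.stmt):
            out.append(node)                  # splice in as-is
        elif isinstance(node, ast.expr):
            out.append(ast.Expr(value=node))  # e.g. a bare function call
        else:
            raise TypeError(f"expected stmt or expr node, got {type(node)}")
    return out
```

With this, both a ClassDef and a bare call expression could be passed to the block mode without the caller needing to remember the Expr rule.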
So, for the other points.
Upgraded n[]
further. Adding attribute support led me to notice that subscripting was missing; stopping to think about how to do that, and then generalizing further, what n[]
should do started to sound an awful lot like a generic Python source code to AST conversion. Which we have in the standard library. So, now n[]
is essentially a wrapper for ast.parse
in expr
mode. It accepts any expression as a string, and splices in the corresponding AST. The main use case is still to make references to variables that have a computed name, but who knows what else it can do now? It seemed like the right generalization here.
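Conceptually, the run-time part of the generalized n[] then reduces to something like this (a sketch under that assumption; the real macro also splices the result into the quoted tree and reports errors with source info):

```python
import ast

def parse_expr(code_string):
    """Parse any Python expression source string into its AST node."""
    if not isinstance(code_string, str):
        raise TypeError(f"expected a str, got {type(code_string)}")
    # mode="eval" parses a single expression; .body strips the
    # Expression wrapper, leaving the bare expr node to splice in.
    return ast.parse(code_string, mode="eval").body

# Plain names, attribute accesses, and subscripts all fall out for free.
```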
As for the bytecode cache, the way it is supposed to work is that, once the cache is up to date for (for example) application.py
, then upon importing application.py
, the whole macro expander is skipped for that module. The already macro-expanded cached bytecode is loaded instead. This is the behavior I'm seeing in my own tests, and from the description, it sounds like that's how it's behaving in your tests, too.
Because step_expansion
is a macro, it only runs at macro expansion time - so it's completely skipped, if the bytecode cache is up to date. Whether it makes sense for it to behave like that is another question. ;)
(I think it's cleaner like this - but also more difficult to test properly. This solution risks some confusion, but so does the alternative of always printing the debug output, even when the macro expander did not actually run. We just get to pick which kind of confusion is preferable...)
Regarding recursive copy_location
, yeah, that loop is the stdlib solution. In mcpyrate
, there's mcpyrate.astfixers.fix_locations
, which might do what you want, with either mode="overwrite"
or mode="reference"
.
Technically, it's part of the public API, but maybe needs to be more discoverable. I've run into a practical issue in writing documentation - the codebase is already about 5000 lines and has a lot of features, but the documentation needs to be brief for mcpyrate
to be easily approachable. As you see, some stuff is still not mentioned, yet the docs should be made shorter! (Or maybe they just need to be structured better.)
(Please note, no guarantees on API stability until I release 3.0.0. The code is getting pretty close to what I need right now, and I don't foresee a need to change that particular function, so it'll probably stay as is - but still, no formal guarantees until I make a PyPI package. Anything that goes onto PyPI comes with the usual semver guarantees.)
> As for the bytecode cache, the way it is supposed to work is that, once the cache is up to date for (for example) application.py, then upon importing application.py, the whole macro expander is skipped for that module. The already macro-expanded cached bytecode is loaded instead. This is the behavior I'm seeing in my own tests, and from the description, it sounds like that's how it's behaving in your tests, too.
Yes. But I didn't realize this was the reason until I put a print inside one of my macros ;-)
b297842. Not quite sure this is a good idea, but now there's a block mode in a
. This should work:
def _do_wrap_class_register(tree, mod, base_class):
    if isinstance(tree, ClassDef):
        with q as do_wrap:
            # Import using a function call
            _temp = __import__(u[mod], globals(), locals(), [u[base_class]], 1)
            n[base_class] = n['_temp.'+base_class]
            # The class definition
            with a:
                tree
            # Register it
            n[base_class].register(u[tree.name.lower()], n[tree.name])
        return do_wrap
    # Just in case somebody applies it to anything other than a class
    return tree  # pragma: no cover
Also fixed a bug while at it: astify
needs to support Done
markers so that things like coverage dummy nodes and expanded name macros are treated correctly by q
.
EDIT: One more important thing: block mode with a
only takes statement nodes. Above, tree
can be a single statement node or a list
of them, but statements only. (Other kinds of inputs likely cause a mysterious compile error.)
Expr mode a[]
takes an expression node, as before.
> As for the bytecode cache, the way it is supposed to work is that, once the cache is up to date for (for example) application.py, then upon importing application.py, the whole macro expander is skipped for that module. The already macro-expanded cached bytecode is loaded instead. This is the behavior I'm seeing in my own tests, and from the description, it sounds like that's how it's behaving in your tests, too.
>
> Yes. But I didn't realize this was the reason until I put a print inside one of my macros ;-)
Ok. Might need to emphasize this in the docs.
Please update, tonight's edition (b72015e) adds error handling for a
. This should pretty much eliminate mysterious compile errors due to accidentally incorrect use of a
.
The block mode of a
was also upgraded to become more robust - it should work at any indentation level in the quoted AST now. (E.g. inside an if
branch - the previous prototype only worked properly at the top level of the with q
.)
Now the system yells if an invocation of a[]
is trying to splice in a statement node, or if an invocation of with a
is trying to splice in an expression node. Also in both modes, if trying to splice in something that isn't an AST.
Note in with a
, each item in the block body may evaluate to a statement AST node, or to a list of statement AST nodes.
The checks occur at the run time of the use site of q
- which means, at the macro-expansion time of your app, when your own macro (that uses q
) runs. This is the earliest the checks can be performed, because they depend on values (the actual AST nodes being spliced in) that only become available at that time.
I'll update the docs later.
Docs are now up to date, too.
Because the original issue, as well as any new ones that came up during the discussion, have been fixed, I'm closing this now. Thanks for the testing help and suggestions :)
On my part, mcpyrate
is approaching a releasable 3.0.0. There's only one feature I might still cram in. (Although, it's something that requires some thinking. We'll see if it makes the cut.) Then it's time to polish the docs, package this up, and push to PyPI.
Feel free to open a new issue if you run into any more problems, or if there's something you want to suggest.
Just for the record:
Yes, runs only once, until you touch (or otherwise refresh the modification time of) a source file that uses macros, or delete the .pyc caches manually. Hrm, this might need a small tool to do just that, for testing...
Such a tool has been added in f7484bc, see clearcaches.py
at the top level. Use carefully. ;)
(For 3.1.0, I'm still weighing whether this should be an installed script; if so, it will likely move into some location inside the mcpyrate
package, but if not, it will stay at the top level.)
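For the record, the core of such a cache-clearing tool fits in a few lines (a sketch of the idea only, not the actual clearcaches.py in the repo):

```python
import pathlib
import shutil

def clear_pyc_caches(root):
    """Recursively delete compiled-bytecode caches under `root`."""
    root = pathlib.Path(root)
    # remove __pycache__ directories wholesale...
    for pycache in root.rglob("__pycache__"):
        shutil.rmtree(pycache, ignore_errors=True)
    # ...and any stray .pyc files living outside them
    for pyc in root.rglob("*.pyc"):
        pyc.unlink(missing_ok=True)

# e.g. clear_pyc_caches(".") from the project root - use carefully,
# as the note above says.
```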
Hi!
I'm trying mcpyrate
as you suggested in here. I'm putting this here to avoid polluting coveragepy
issues. I use the macros to simplify the code for my application plug-ins. So I need to expand the macros in files loaded from a specific absolute path. With mcpy
I'm using the following import routine:
Where kibot
is the name of my app (and modules root), name
is the name of the plug-in, and path
is the absolute path to the file I want to load as kibot.name
. Using mcpyrate
I get the following error:
It looks like get_macros
doesn't support an absolute path in filename
. I need to use absolute path imports; the plug-ins can come from many places in the filesystem, and the user could be trying to override the behavior of an internal plug-in. I remember trying to mess with sys.path
without luck. I'll look into get_macros
to find a solution.