bpcreech / PyMiniRacer

PyMiniRacer is a V8 bridge in Python.
https://blog.sqreen.com/embedding-javascript-into-python/
ISC License

Can it support Inspector protocol? #63

Open Taiung opened 6 months ago

Taiung commented 6 months ago

I want to debug JavaScript code while Python executes it. I checked some of the V8 documentation, but I am not familiar with C++ and cannot implement this myself. Could this feature be added in a future release?

bpcreech commented 6 months ago

Hey, interesting idea!

The V8 inspector protocol is pretty extensive! Could you describe what kinds of things you want to do with it? (E.g., print values, change values, pause execution, profile memory, ...) Also, what kind of interface were you thinking of? (E.g., just expose a JSON sendInspectorMessage and onInspectorMessage, or something more user-friendly?)
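For context, a raw JSON interface like that would carry Chrome DevTools Protocol messages. The sketch below only shows how such requests could be built in Python. `make_inspector_message` is a hypothetical helper and `my_script.js` is a made-up URL; PyMiniRacer has no inspector API as of this thread. The method names `Debugger.enable` and `Debugger.setBreakpointByUrl` come from the DevTools Protocol itself.

```python
import itertools
import json

# Each protocol request carries a unique integer id so replies can be matched.
_message_ids = itertools.count(1)


def make_inspector_message(method, params=None):
    """Serialize one DevTools Protocol request as a JSON string (hypothetical helper)."""
    message = {"id": next(_message_ids), "method": method}
    if params:
        message["params"] = params
    return json.dumps(message)


# Turn on the debugger domain, then ask for a breakpoint on the first line
# (lineNumber is zero-based) of a script with an assumed URL.
enable_msg = make_inspector_message("Debugger.enable")
breakpoint_msg = make_inspector_message(
    "Debugger.setBreakpointByUrl",
    {"lineNumber": 0, "url": "my_script.js"},
)

print(enable_msg)
print(breakpoint_msg)
```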

It may be worth noting that, as of PyMiniRacer v0.12.0, you can make callbacks from JavaScript to Python, which enables crude "print debugging", like this:

$ python
>>> from py_mini_racer import MiniRacer
>>> ctx = MiniRacer()
>>> async def log(s):
...   print(s)
...
>>> async def run_my_code():
...   async with ctx.wrap_py_function(log) as log_js:
...     ctx.eval('this')['log'] = log_js
...     ctx.eval('for (let i = 0; i < 10; i++) { log(i); }')
...
>>> import asyncio
>>> asyncio.run(run_my_code())
0
1
2
3
4
5
...

Taiung commented 4 months ago

I'm very sorry for the late reply. What I hope for is to debug the JS code with the help of Chrome's DevTools and to pause the program where needed, similar to PyCharm's debugging mode.

I hope this description is clear. This feature would be great for developers, and I hope you can adopt this suggestion.

christian2022 commented 4 months ago

@bpcreech I have looked at your example and have the problem that the wrapped function is not called immediately but only when the async contextmanager exits:

$ python
>>> from py_mini_racer import MiniRacer
>>> ctx = MiniRacer()
>>> async def log(s):
...   print(s)
...
>>> async def run_my_code():
...   async with ctx.wrap_py_function(log) as log_js:
...     ctx.eval('this')['log'] = log_js
...     print('before loop')
...     ctx.eval('for (let i = 0; i < 10; i++) { log(i); }')
...     print('after loop')
...   print('after async with')
...
>>> import asyncio
>>> asyncio.run(run_my_code())
before loop
after loop
0
1
2
...
9
after async with

The expectation would be that "after loop" is printed after 9 and not before 0. I assume these callbacks are somehow stuck in a queue and get executed when the context is cleaned up. Is that a bug or am I overlooking something?

bpcreech commented 4 months ago

Ah, hmm, I think that's either a bug or an expectation gap in the code. Because the JS code doesn't await the return value of log, whether each call completes before the async context manager exits is basically a race condition. If you add await asyncio.sleep(0.5) before the async context manager exits, it does print in the expected order.
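The ordering can be reproduced with a simplified pure-asyncio model. This is only an illustration of un-awaited callbacks, not PyMiniRacer's internals: `fire_and_forget_loop` is a hypothetical stand-in for a blocking ctx.eval(...) whose JS calls log without awaiting it, and the model uses asyncio.sleep(0) where the real case needed a longer sleep.

```python
import asyncio


async def run_my_code(yield_first):
    output = []

    async def log(s):
        output.append(s)

    def fire_and_forget_loop():
        # Schedules log(i) on the event loop but never awaits the results,
        # like JS code that ignores the promise returned by log().
        return [asyncio.ensure_future(log(i)) for i in range(3)]

    tasks = fire_and_forget_loop()
    if yield_first:
        # Give the event loop a chance to run the scheduled callbacks.
        await asyncio.sleep(0)
    output.append("after loop")
    # Stand-in for the async context manager exit, which lets callbacks finish.
    await asyncio.gather(*tasks)
    return output


print(asyncio.run(run_my_code(yield_first=False)))
print(asyncio.run(run_my_code(yield_first=True)))
```

Without the yield, the scheduled callbacks only run once the coroutine next gives control back to the event loop, which matches the output reported above.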

If we wanted more deterministic logging we'd need to do something like:

import asyncio
from py_mini_racer import MiniRacer
ctx = MiniRacer()
async def log(s):
  print(s)

async def run_my_code():
  async with ctx.wrap_py_function(log) as log_js:
    ctx.eval('this')['log'] = log_js
    print('before loop')
    await ctx.eval('Promise.all(Array(10).keys().map(i => log(i)))')
    print('after loop')
  print('after async with')

asyncio.run(run_my_code())

christian2022 commented 4 months ago

Not sure if Promise.all will even ensure processing in proper sequence. But anyway, if you want to bind e.g. console.log (or any other sync function) to Python, you cannot use an async function, as sync-over-async is bound to fail. So to make that work properly I think two variants of wrap_py_function are needed: one for async and one for sync.

bpcreech commented 4 months ago

The sequence isn't relevant in this example since the values are ignored. The purpose here is simply to ensure the function is completed.

You are right that this is not a drop-in replacement for console.log. It is, however, a functioning way to get logs!

The trouble with a sync wrap_py_function is that it would very quickly deadlock when the wrapped function attempted to call back into V8 for anything, including even unpacking an object or array.
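As an analogy for that deadlock, here is a minimal sketch using only the standard library. It assumes, purely for illustration, that all V8 work runs on one dedicated thread, modeled by a single-worker executor; `sync_callback` and `unpack_object` are hypothetical names, and the timeout exists only so the demo terminates instead of hanging.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

# Assumption for this model: one worker thread does all "V8" work.
v8_thread = ThreadPoolExecutor(max_workers=1)


def unpack_object():
    # Stand-in for any call back into V8, such as reading an object's fields.
    return "unpacked"


def sync_callback():
    # Runs ON the V8 thread, then blocks waiting for more V8 work.
    # The only worker is busy running this function, so the inner job
    # cannot start until this function returns.
    inner = v8_thread.submit(unpack_object)
    try:
        return inner.result(timeout=0.2)
    except TimeoutError:
        return "deadlock"


outcome = v8_thread.submit(sync_callback).result()
v8_thread.shutdown()
print(outcome)
```

An async wrapper avoids this in the model because the callback returns immediately and the follow-up work is scheduled afterwards rather than waited on in place.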

christian2022 commented 4 months ago

I came here because I was trying to intercept document.write. I'm parsing web pages with BeautifulSoup and executing the script nodes with MiniRacer. As soon as one of the scripts uses document.write, I have to adapt the document object in Python and execute any added script nodes. But these need to be executed before the next script in the original document, so I cannot wait until the async context manager exits its scope. Any idea?

bpcreech commented 4 months ago

Neat use case! You are sort of building a headless web browser. :)

So we want to serialize operations here while avoiding the world of multithreaded recursion... IIUC we want our Python outer loop (which is running BeautifulSoup) to ensure any calls to document.write have completed before it continues to the next part of the document. Is it possible to create a JS function which calls the Python document writer and stores the promise in a JS global, and use that function as document.write? Then the Python outer loop can find that global and await all the promises in it before proceeding to processing the next HTML fragment.
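The shape of that idea can be sketched in pure asyncio, as a model rather than working PyMiniRacer code. Here `pending_writes` stands in for the JS global that stores promises, `document_write` for the JS function installed as document.write, and calling `script(document_write)` for ctx.eval on a script's source; all of these names are hypothetical.

```python
import asyncio


async def process_scripts(scripts):
    events = []          # records the order in which things happened
    pending_writes = []  # stand-in for the JS global collecting promises

    async def py_document_writer(html):
        # The Python-side writer completes asynchronously.
        await asyncio.sleep(0)
        events.append(f"wrote {html}")

    def document_write(html):
        # Start the Python call and keep its "promise" instead of awaiting it.
        pending_writes.append(asyncio.ensure_future(py_document_writer(html)))

    for script in scripts:
        script(document_write)  # stand-in for ctx.eval(script_source)
        # Outer loop: drain and await every stored promise before moving on.
        writes = pending_writes[:]
        pending_writes.clear()
        await asyncio.gather(*writes)
        events.append("script done")
    return events


scripts = [
    lambda write: write("<p>a</p>"),
    lambda write: (write("<p>b</p>"), write("<p>c</p>")),
]
events = asyncio.run(process_scripts(scripts))
print(events)
```

In the real setup the drain step would presumably be an awaited ctx.eval of a Promise.all over the JS global, in the same style as the deterministic logging example earlier in the thread.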