Implement implicit DSDL compilation via import hooks

pavel-kirienko commented 3 years ago

Instead of this:

import pyuavcan
compiled_dsdl_dir = pathlib.Path(__file__).resolve().parent / ".demo_dsdl_compiled"
sys.path.insert(0, str(compiled_dsdl_dir))
try:
    import sirius_cyber_corp
    import pyuavcan.application
except (ImportError, AttributeError):
    src_dir = pathlib.Path(__file__).resolve().parent
    pyuavcan.dsdl.compile_all(
        [
            src_dir / "custom_data_types/sirius_cyber_corp",
            src_dir / "public_regulated_data_types/uavcan/",
        ],
        output_directory=compiled_dsdl_dir,
    )
    importlib.invalidate_caches()
    import sirius_cyber_corp
    import pyuavcan.application

We could just do this:

import pyuavcan
import sirius_cyber_corp
import pyuavcan.application

Such that if the DSDL namespaces are not compiled, PyUAVCAN would invoke the compiler transparently for the user. This is implementable with the help of Python import hooks. The Cap'n'Proto Python client implements it this way.

We could define an environment variable named simply CYPHALPATH listing a set of paths where DSDL namespaces are to be searched for. When an import hook is triggered, PyUAVCAN would scan the paths looking for the matching namespace. If found, it would invoke the DSDL compiler, providing all other directories reachable via the path variable as the look-up paths. The output should be cached somewhere secure to avoid recompilation at every launch (/tmp is probably not an option for security reasons, but the home directory might work).

pavel-kirienko commented 2 years ago

@chemicstry continuing the conversation here. So the way I see it is that we set up a hook for ImportError and check there if there is a directory anywhere under CYPHALPATH whose name matches the name of the missing import. If there is, we collect all of the namespace directories from every location listed in CYPHALPATH, and pass them all to pycyphal.dsdl.compile_all. We can't compile namespaces one by one because they may be interdependent. I am not yet sure where to save the output, any ideas here?

chemicstry commented 2 years ago

Looks good, however, I think there are some tricky edge cases.

If CYPHALPATH has a default value, when not defined, then there is issue with adding additional paths. For example, user just installed pycyphal, added public_regulated_data_types to ~/.cyphal and everything works fine. However, then he decides to add custom DSDL, sets CYPHALPATH=/home/user/my_company_dsdl and suddenly regulated types are missing. I see a couple options here:

Always add default paths to CYPHALPATH
Have a separate CYPHAL_ADDITIONAL_PATHS variable for any custom DSDL
On first launch set CYPHALPATH to default location if it is not defined. On windows this might work, but on linux permanently setting env variables is a bit tricky.
Do not use list in CYPHALPATH, limit to a single path only

I think limiting CYPHALPATH to single path is the best option here and maybe consider adding CYPHAL_ADDITIONAL_PATHS if neccessary.

The compilation output directory. I think it should be separate from CYPHALPATH, because CYPHALPATH might be used by other implementations (C, Rust, etc) and it should only contain DSDLs. Maybe an additional PYCYPHAL_PATH variable, which defaults to ~/.pycyphal? Another option is to put everything under subfolders in CYPHALPATH, like: ~/.cyphal/dsdl and ~/.cyphal/pycyphal_compiled. I would probably prefer the PYCYPHAL_PATH variable.

And finally I think it is worth thinking how we handle DSDL updates so that they are recompiled when files are changed. Compiling on each startup would probably be too much overhead? I'm not sure how reliable it is to use FS modification dates, but we could generate a hash of all DSDL files and store it in the output directory. If hashes do no match - recompile.

pavel-kirienko commented 2 years ago

Always add default paths to CYPHALPATH

This doesn't sound too bad but I imagine it might become troublesome if one desires to purposefully isolate the current environment from the system-wide default configuration. Say, if I have a special one-off script that needs to use some modified DSDL namespaces, I would want to temporarily set CYPHALPATH to include only my special paths. With a permanent default that would be impossible.

Have a separate CYPHAL_ADDITIONAL_PATHS variable for any custom DSDL

I don't like this because it promotes the default to a special status, which risks creating wrong ideas about the public regulated data types namespace (which is likely to be found in the default location). Also, and perhaps most importantly, this approach differs from commonly acceptable practices (see PATH, PYTHONPATH, etc).

On first launch set CYPHALPATH to default location if it is not defined. On windows this might work, but on linux permanently setting env variables is a bit tricky.

Reconfiguring the environment like that is out of the scope of PyCyphal (remember we're not talking about Yakut here) but it shouldn't be an issue for the user to do it manually if needed. Higher-level tools like Yakut could still check if CYPHALPATH is configured, and if not, do it automatically for the user. Configuring this manually is not too taxing for the user anyway so we could start small and then add automation later. I think of the presented options this is the best solution.

Do not use list in CYPHALPATH, limit to a single path only I think limiting CYPHALPATH to single path is the best option here and maybe consider adding CYPHAL_ADDITIONAL_PATHS if neccessary.

I don't think these are good ideas because they introduce more entities (= higher complexity) and differ significantly from common practices.

Let us proceed with a multi-entry PYCYPHAL with no default value other than, perhaps, the current working directory, similar to PYTHONPATH, and let us adjust UX later by building on top of this (perhaps by adding UX tweaks to Yakut like appending ~/.bashrc automatically).

The compilation output directory. I think it should be separate from CYPHALPATH, because CYPHALPATH might be used by other implementations (C, Rust, etc) and it should only contain DSDLs.

Agreed.

Maybe an additional PYCYPHAL_PATH variable, which defaults to ~/.pycyphal? Another option is to put everything under subfolders in CYPHALPATH, like: ~/.cyphal/dsdl and ~/.cyphal/pycyphal_compiled. I would probably prefer the PYCYPHAL_PATH variable.

Let's go with PYCYPHAL_PATH (or PYCYPHALPATH?) at least for now because it seems to be the most obvious approach with no clear contenders. Although piggybacking on CYPHALPATH is also possible without even the need to put DSDL namespaces into subdirectories because we could always store our PyCyphal-specific outputs into some cleverly named subdirectory whose name is not a valid DSDL namespace name, like .pycyphal.

And finally I think it is worth thinking how we handle DSDL updates so that they are recompiled when files are changed. Compiling on each startup would probably be too much overhead? I'm not sure how reliable it is to use FS modification dates, but we could generate a hash of all DSDL files and store it in the output directory. If hashes do no match - recompile.

Reliance on FS modification timestamps seems to be working for build systems so why not also use it here. The main problem I see is that we will need to traverse all DSDL files to see if they need to be recompiled, which may be slow in certain cases (esp. with HDD), but then hashing would be even worse in this regard. I suggest we start without recompilation (which would necessitate doing rm -rf $PYCYPHAL_PATH to force updates) and revisit this later.

pavel-kirienko commented 2 years ago

Hey @chemicstry, no pressure but are you still interested/able to help us with this?

chemicstry commented 2 years ago

Hey, yes, but I can't give an ETA when. Maybe a month. Feel free to assign it to anyone else if it can be done sooner.

pavel-kirienko commented 2 years ago

Hey @chemicstry how's it looking now? :D

OpenCyphal / pycyphal

Implement implicit DSDL compilation via import hooks #153