Seeking opinion on adding "external" declarations to M2l

jonahbeckford commented 3 months ago

Today M2l is focused on the module sublanguage. That makes sense for a dependency analyzer because the module sublanguage expresses dependencies between OCaml compilation units.

However, the external keyword introduces a dependency on C compilation units. I'd like to add the C dependencies to M2l for several reasons:

Consistency. C and OCaml dependencies are both dependencies.
Security. For security analysis, I'd like to assert whether a module is "safe". That is, can I run that untrusted module without a sandbox? That is relatively easy to answer if the untrusted module only accesses a known list of safe modules and has no external declarations.
C Linking. On Windows the C linking model is still broken (especially bytecode / toplevel): https://github.com/ocaml/ocaml/issues/12412, https://github.com/ocaml/dune/issues/10042, etc. IMHO the core problem is that every build system and package maintainer adds their own linker flags that get passed along in .cma (etc.) files. They are all making an ELF-centric flat-namespace assumption that linker flags alone are enough to bind correctly to libraries, and forgetting that shared libraries with a two-level namespace exist (Windows DLLs and Mach-O dylibs). On Windows the sane way to load a plugin is to specify the first level of the two-level namespace by executing a LoadLibrary function call, and that needs C glue code. I'd like to explore using dependency analysis to discover what the linker flags and the (C) glue code would be. That approach might even remove the need for ctypes in simple cases.
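
As a rough illustration of the security point above, here is a minimal sketch of the "safe module" check. The `summary` record, the `is_safe` function, and the module and symbol names are all hypothetical; nothing here is codept's actual output format or API. It only assumes that an analyzer can report, per compilation unit, the OCaml modules it refers to and the C symbols named by its external declarations.

```ocaml
(* Hypothetical per-unit summary; NOT codept's real output type. *)
type summary = {
  name : string;
  deps : string list;       (* OCaml modules the unit refers to *)
  externals : string list;  (* C symbols named by external declarations *)
}

(* A unit is "safe" if it declares no externals and every module it
   depends on is in the allowlist. *)
let is_safe ~allowlist s =
  s.externals = []
  && List.for_all (fun dep -> List.mem dep allowlist) s.deps

(* Illustrative data only. *)
let allowlist = [ "Stdlib"; "List"; "String" ]

let plugin_a = { name = "Plugin_a"; deps = [ "List"; "String" ]; externals = [] }
let plugin_b = { name = "Plugin_b"; deps = [ "List" ]; externals = [ "caml_unsafe_stub" ] }
let plugin_c = { name = "Plugin_c"; deps = [ "List"; "Unix" ]; externals = [] }

let () =
  List.iter
    (fun s ->
      Printf.printf "%s: %s\n" s.name
        (if is_safe ~allowlist s then "safe" else "needs sandbox"))
    [ plugin_a; plugin_b; plugin_c ]
```

In this sketch a single external declaration is enough to reject a unit, which is why an over-approximated list of externals would already be sufficient for this use case.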

Thoughts?

This is not urgent, and I wouldn't be starting it tomorrow :)

Octachron commented 3 months ago

Computing an over-approximation (counting every external that is defined and might be linked) sounds like a good idea, since externals are a form of dependency (on a flat external namespace).
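
To make the over-approximation concrete, here is a minimal sketch over a toy module-level AST. The `item` type is an assumed stand-in, far simpler than the real M2l representation, and the symbol names are made up. The idea is only that every external declaration is collected, whether or not it ends up being used.

```ocaml
module S = Set.Make (String)

(* Toy stand-in for module-level items; NOT codept's real M2l type. *)
type item =
  | Open of string                   (* open M *)
  | External of string               (* C symbol named by an external declaration *)
  | Submodule of string * item list  (* module M = struct ... end *)

(* Collect every C symbol declared anywhere in the unit, used or not. *)
let rec externals items =
  List.fold_left
    (fun acc item ->
      match item with
      | Open _ -> acc
      | External prim -> S.add prim acc
      | Submodule (_, body) -> S.union acc (externals body))
    S.empty items

(* Illustrative units with hypothetical symbol names. *)
let unit_a =
  [ Open "Stdlib"; External "caml_foo"; Submodule ("Inner", [ External "caml_bar" ]) ]

let unit_b = [ External "caml_foo"; External "caml_baz" ]

(* The flat external namespace is the union over all units. *)
let all = S.union (externals unit_a) (externals unit_b)

let () = S.iter print_endline all
```

Because the external namespace is flat, a plain set of symbol names is enough here; no scoping information has to be tracked.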

In contrast, computing exact dependencies for bytecode would require implementing name resolution down to the term level (excluding type-directed disambiguation), which goes far beyond the scope of a module-level language.
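
A small example of why exact dependencies need term-level name resolution. The module names are invented for illustration; `caml_ml_string_length` is assumed here to be a primitive provided by the OCaml runtime, so the snippet should link without extra C code.

```ocaml
module Stubbed = struct
  (* Binds to a C primitive (assumed to exist in the OCaml runtime). *)
  external length : string -> int = "caml_ml_string_length"
  let name = "stubbed"
end

module Pure = struct
  (* Same value name, but defined in plain OCaml. *)
  let length s = String.length s
end

open Stubbed
open Pure

(* Which [length] is this? Only value-level name resolution can tell:
   the later [open Pure] shadows [Stubbed.length]. *)
let n = length "abc"

let () = Printf.printf "%s %d\n" name n
```

A module-level analysis sees that this unit opens both `Stubbed` and `Pure`, so the over-approximation reports the external. Deciding that the C symbol is never actually called would require resolving `length` at the value level.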

The first option sounds sensible, but I fear that the second one would run counter to codept's idea of computing as little as possible to extract the dependency graph, and deferring everything else to that graph.

jonahbeckford commented 3 months ago

Okay, that is perfect. I had no intention of getting codept to compute exact dependencies. Just the union of all externals (which is the first option) is all that is needed.

Thanks!

Octachron / codept, issue #37