Open CMCDragonkai opened 1 year ago
I think the most promising route to achieve this is to look at Electron's build of nodejs. And try to bring that into our build system. It will be important to then align the build of both the Polykey-CLI and Polykey-Desktop along the same node runtime if we want to do this.
@tegefaulkes another interesting idea is to create WS library like how quiche works, and write all the WS stuff in JS, and abstract out to allow generic IO and generic TLS/crypto.
Some notes on how to actually combine Node native plugins together: https://chat.openai.com/share/d09826e1-ebb0-4584-9e89-d379ac7363b8
The trick is like this:
1. `require` to load the native libraries in order. They are dynamically loaded.
2. `-l` to dynamically link the first library to the second library. This ensures that the second library's code will be calling symbols that are exported by the first library.
3. `dlopen` will ask the dynamic linker if the same path has already been loaded. This may be challenging. Not sure how to solve this yet.

You said earlier that `require` would end up calling `dlopen`, which would check if a path to a shared object has already been loaded before and thus avoid double loading the same library. Here's a problem: if the shared object (dynamically linked library) is distributed via npm, it would get installed under `node_modules`. Now the problem is that the project itself may exist at random paths. So the full absolute path to `node_modules` would be different depending on the project, and therefore the full absolute path to the shared object would also be different each time. Now because `node_modules` is technically flattened, as long as all dependencies load the same npm package version then all these paths would be the same, but it's possible that some dependencies may have a different package version, so then a different shared object might be loaded. Furthermore, during dynamic linking, do we have to ensure that the `-I` path is the absolute path? How does loading the second library find the symbols of the first library?
Your observations are correct. In the context of Node.js and npm, the path to a native addon (or any package) can indeed vary between installations, as packages are installed under `node_modules`, and the project's location can be different on different systems. As such, different versions of the same library may get loaded due to version discrepancies in dependencies, which can lead to symbol conflicts and other issues.

Here's how the mechanism would work in general:

1. Loading the Libraries: When your Node.js script calls `require` on the first and then the second library, these libraries are loaded into memory. Node.js uses the `dlopen` function (or equivalent, depending on the platform) to do this. As you correctly pointed out, `dlopen` uses the absolute path to the library as the identifier to ensure that the same library isn't loaded multiple times. However, in this case, "the same library" means a library at the same path, not a library with the same content or the same version. This is why it's possible to end up with multiple versions of the same library loaded at the same time if they're located at different paths.
2. Including Headers: When you compile the second library, you include the first library's headers using the `-I` flag to specify the path to these headers. This is necessary for the compiler to verify that the second library's code is calling the first library's functions correctly. However, it doesn't actually affect the runtime behavior of the code. The compiler doesn't need to know where the first library's shared object file will be at runtime.
3. Resolving Symbols: When the second library is loaded, it needs to resolve the symbols (functions, variables, etc.) that it references from the first library. The dynamic linker handles this by searching through the libraries that have already been loaded into memory (by previous calls to `dlopen`) to find these symbols. It doesn't matter where on disk these libraries came from; what matters is that they're already loaded into memory.

As for handling different versions of the same library, it is indeed a complex issue. It's always best to ensure that all parts of your application are using the same version of any shared libraries to avoid inconsistencies and potential conflicts. npm's dependency resolution algorithm generally tries to install the latest version that satisfies all dependencies, but this can sometimes lead to multiple versions being installed if different parts of your application require incompatible versions. If you have control over all parts of your application, it would be best to ensure that they all depend on the same version of the shared library.
In general, managing dependencies, particularly native dependencies, can be one of the more complex aspects of software development. It requires careful design and planning to ensure that all parts of your application are compatible and can work together correctly.
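The path sensitivity described above can be checked from JavaScript before any native code is loaded. The following is a minimal sketch, not part of the original discussion: it uses `require.resolve` with its `paths` option to report which absolute file a given request would resolve to from each starting directory. The helper names and any package name passed in are hypothetical. Two different results mean two distinct paths, and by the reasoning above, two separately loaded copies.

```javascript
/**
 * Resolve `request` as if it were required from each directory in `fromDirs`.
 * Returns a Map of starting directory -> absolute resolved filename.
 */
function resolveFromEach(request, fromDirs) {
  const resolved = new Map();
  for (const dir of fromDirs) {
    // `paths` makes the node_modules lookup start from `dir`
    // instead of from the directory of this file.
    resolved.set(dir, require.resolve(request, { paths: [dir] }));
  }
  return resolved;
}

/** True when every starting directory resolves to the same file on disk. */
function resolvesToSingleCopy(request, fromDirs) {
  const unique = new Set(resolveFromEach(request, fromDirs).values());
  return unique.size === 1;
}
```

Calling `resolvesToSingleCopy` with the directories of two dependencies that both rely on the same native package gives a quick indication of whether they would share one shared object or load one each.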
What exactly happens if 2 native libraries were to be loaded into memory, and both exposed the same function name (symbol)?
If two native libraries are loaded into memory and they both expose the same function name, what happens depends on how these libraries are loaded and how the function is called.
If you load the two libraries in a Node.js application using `require`, each library will be loaded into its own isolated context. When you call a function from one of these libraries, Node.js will call the correct version of the function based on which `require`'d object you're using to access the function. This is because Node.js keeps track of the bindings between JavaScript objects and native libraries, so there's no confusion about which library a function call should go to.

This is a key benefit of using `require` to load native libraries in Node.js -- it isolates libraries from each other, so there's less risk of conflicts between them. It's similar to how JavaScript modules each have their own scope, and variables defined in one module don't interfere with variables in another module.

However, if you're dealing with libraries that are loaded and called directly from C or C++ code, it's possible to have conflicts if different libraries define a function with the same name. The behavior in this case depends on the specifics of the dynamic linker and how it's configured.
By default, when a program or library calls a function from a shared library, the dynamic linker searches through the loaded libraries in the order they were loaded, and it uses the first matching function it finds. So, if two libraries define a function with the same name, the version of the function in the library that was loaded first will be called.
Some systems also provide ways to control symbol resolution more precisely. For example, on Linux, you can use the `RTLD_LOCAL` flag with `dlopen` to specify that a library's symbols should not be made available to subsequently loaded libraries. Conversely, the `RTLD_GLOBAL` flag specifies that a library's symbols should be made available to subsequently loaded libraries. You can also use version scripts and other linker options to control symbol visibility and resolution.

In general, though, it's best to avoid having multiple libraries that define the same symbols, if possible. This can cause confusion and lead to bugs that are difficult to diagnose. If you're writing a library that's intended to be used as a shared library, it's a good practice to use unique names for your exported functions (such as by including your library's name as a prefix) to reduce the risk of conflicts.
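The `RTLD_LOCAL` and `RTLD_GLOBAL` flags mentioned above are reachable from Node.js through `process.dlopen`, whose optional third argument takes values from `os.constants.dlopen` (where the operating system provides them) and defaults to `RTLD_LAZY`. The following is a minimal sketch of loading an addon with globally visible symbols. The helper name `loadAddonGlobal` and the addon path supplied by the caller are hypothetical, and whether a later addon then resolves against these symbols has not been verified here.

```javascript
const os = require('node:os');

// RTLD_NOW resolves all symbols at load time; RTLD_GLOBAL makes this
// library's symbols available to libraries loaded afterwards.
const GLOBAL_FLAGS =
  os.constants.dlopen.RTLD_NOW | os.constants.dlopen.RTLD_GLOBAL;

/**
 * Load a native addon with globally visible symbols.
 * `filename` should be an absolute path to a `.node` file (hypothetical here).
 */
function loadAddonGlobal(filename, flags = GLOBAL_FLAGS) {
  const addonModule = { exports: {} };
  // Throws if the file cannot be opened or is not a valid addon.
  process.dlopen(addonModule, filename, flags);
  return addonModule.exports;
}
```

Under this approach the base addon would be loaded first through `loadAddonGlobal`, and the dependent addon afterwards through a plain `require`.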
So basically, if I had dependencies relying on different versions of the same npm package, and that npm package exposed a native shared object that was loaded into memory, then if a third dependency with its own native object were to call a C function that was exposed by the different versions of the same npm package, the resolution would pick the C function exposed by whichever shared object was loaded first.
Yes, your understanding is correct. If you had multiple versions of the same shared library loaded into memory (each one from a different npm package) and a third native module tried to use a function that is defined in that shared library, the dynamic linker would resolve the function call to the version of the function in the library that was loaded first. This behavior is a result of how the dynamic linker handles symbol resolution.
The order in which libraries are loaded can be important in this situation. When Node.js loads native addons using `require`, it loads them in the order they are `require`d in the JavaScript code. Therefore, if your JavaScript code first requires the npm package that includes the first version of the shared library, and then requires the npm package that includes the second version of the shared library, the first version of the shared library will be loaded into memory first, and its symbols will be used for resolution.

As a result, if the two versions of the shared library have different implementations of the same function, the version from the first-loaded library will be used, which might not be what you expect. This could potentially lead to subtle bugs if the different versions of the function have different behaviors.
It's generally best to ensure that all parts of your application are using the same version of any shared libraries to avoid this kind of issue. If that's not possible, you'll need to carefully manage the order in which libraries are loaded and be aware of how this affects symbol resolution.
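Since version mismatches are the root of the problem, it can help to detect them before anything is loaded. The following is a minimal sketch that walks a project's `node_modules` tree and collects every installed copy of one package. The function names are hypothetical, the scoped-package handling assumes the usual npm layout, and symlinked layouts (for example workspaces) are not guarded against.

```javascript
const fs = require('node:fs');
const path = require('node:path');

/**
 * Recursively collect { version, dir } for every installed copy of
 * `packageName` beneath `projectDir/node_modules`.
 */
function findInstalledCopies(projectDir, packageName, found = []) {
  const modulesDir = path.join(projectDir, 'node_modules');
  if (!fs.existsSync(modulesDir)) return found;
  for (const entry of fs.readdirSync(modulesDir)) {
    if (entry.startsWith('.')) continue; // skip .bin and similar entries
    const entryDir = path.join(modulesDir, entry);
    // Scoped packages live one level deeper: node_modules/@scope/name
    const packageDirs = entry.startsWith('@')
      ? fs.readdirSync(entryDir).map((name) => path.join(entryDir, name))
      : [entryDir];
    for (const dir of packageDirs) {
      const manifestPath = path.join(dir, 'package.json');
      if (!fs.existsSync(manifestPath)) continue;
      const manifest = JSON.parse(fs.readFileSync(manifestPath, 'utf8'));
      if (manifest.name === packageName) {
        found.push({ version: manifest.version, dir });
      }
      // Descend into this package's own nested dependencies.
      findInstalledCopies(dir, packageName, found);
    }
  }
  return found;
}

/** True when more than one distinct version of the package is installed. */
function hasVersionConflict(projectDir, packageName) {
  const versions = new Set(
    findInstalledCopies(projectDir, packageName).map((copy) => copy.version),
  );
  return versions.size > 1;
}
```

A check like this could run at startup or in CI and fail early, instead of leaving the outcome to load order.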
Some notes on standardising the API for custom TLS verification: https://github.com/MatrixAI/Polykey/issues/551#issuecomment-1730732555
Linear marked this as stale so I'm reopening.
Specification
The current situation in Polykey with TLS will involve lots of TLS libraries. This is not as secure as it can be. It's better to centralise the TLS libraries to 1 BoringSSL library. This simplifies how we expect the TLS system to operate, such as dependencies on operating system CA certificates, and having to only update 1 TLS library for PK and monitoring security vulnerabilities to that TLS library, and being able to independently update that TLS library without updating the Node runtime... etc.
This requires:
`fetch` and `https` related modules rely on Node's `tls` module. Instead of replacing `fetch` and `https` to use BoringSSL, we could override the `tls` module with a custom `tls` module that has the same API but instead uses the BoringSSL code.

To make 1. possible, this would mean that websockets can be generic to the underlying socket IO, that makes it similar to quiche, and the whole thing can be just written in JS. In fact if a websocket library was generic to the underlying socket IO and underlying crypto library, that would be best.
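To illustrate what "generic to the underlying socket IO" could look like, here is a minimal sketch of WebSocket framing written as pure functions over bytes, with no socket or TLS dependency. It is an assumption about one possible shape, not an existing library's API: the function names are hypothetical, encoding covers only unfragmented, unmasked text frames with payloads under 65536 bytes, and decoding rejects 64-bit lengths. The caller feeds in bytes from whatever transport it owns and writes the returned bytes back out.

```javascript
const OPCODE_TEXT = 0x1;

/** Encode a string as a single unmasked text frame (FIN bit set). */
function encodeTextFrame(text) {
  const payload = Buffer.from(text, 'utf8');
  if (payload.length >= 65536) {
    throw new RangeError('payload too large for this sketch');
  }
  const header =
    payload.length < 126
      ? Buffer.from([0x80 | OPCODE_TEXT, payload.length])
      : Buffer.from([0x80 | OPCODE_TEXT, 126, payload.length >> 8, payload.length & 0xff]);
  return Buffer.concat([header, payload]);
}

/**
 * Try to decode one frame from the start of `buffer`.
 * Returns null when more bytes are needed; otherwise returns the frame and
 * the number of bytes consumed so the caller can advance its receive buffer.
 */
function decodeFrame(buffer) {
  if (buffer.length < 2) return null;
  const fin = (buffer[0] & 0x80) !== 0;
  const opcode = buffer[0] & 0x0f;
  const masked = (buffer[1] & 0x80) !== 0;
  let length = buffer[1] & 0x7f;
  let offset = 2;
  if (length === 126) {
    if (buffer.length < 4) return null;
    length = buffer.readUInt16BE(2); // 16-bit extended payload length
    offset = 4;
  } else if (length === 127) {
    throw new RangeError('64-bit lengths are not handled in this sketch');
  }
  let maskKey = null;
  if (masked) {
    if (buffer.length < offset + 4) return null;
    maskKey = buffer.subarray(offset, offset + 4);
    offset += 4;
  }
  if (buffer.length < offset + length) return null;
  // Copy the payload so unmasking does not mutate the caller's buffer.
  const payload = Buffer.from(buffer.subarray(offset, offset + length));
  if (maskKey) {
    for (let i = 0; i < payload.length; i++) payload[i] ^= maskKey[i % 4];
  }
  return { fin, opcode, payload, bytesConsumed: offset + length };
}
```

Because neither function touches a socket, the same code could sit on top of a plain TCP stream, a TLS stream backed by a different crypto library, or an in-memory test harness.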
To make 2. possible, I'm not even sure if this is possible. Node's native addons seem to be all designed to be statically linked objects that only dynamically link to object code that already exists in node's executable. There's no documentation on how a native addon can dynamically link to another shared object. Or how 2 native addons could share a common native library.
To make 3. possible, this primarily deals with the fact that we don't have a generic HTTP/HTTPS library that is generic to the underlying socket IO. It's also the fact that other mobile platforms may implement `fetch` but with different underlying systems. So I'm not really sure here.

One possibility is to look at Electron's node. https://www.electronjs.org/blog/electron-internals-using-node-as-a-library

They have managed to compile nodejs as a shared object, and then swap out its underlying openssl to boringssl. There might be more flexibility if we can copy how electron project builds nodejs to use... and maybe we will have a better way of bundling it as well in `pkg`. Doing so will however change how we expect to test things if `node` is not what we do to run PK, but instead our own custom node.

Additional context
422
155
503
234
Tasks