Open butlerpd opened 2 years ago
As a 99% solution we could ship signed precompiled models. Then only the 1% of users who use custom C models will need to find workarounds such as turning on GPU or white-listing the plugins directory in the AV.
Please check if the generic scattering calculator works on systems with this AV. It uses the numba JIT compiler to build and run the scattering calculation. If this fails, then these routines may need to be precompiled and signed rather than relying on the JIT compiler.
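A quick way to run that check on an affected machine is a tiny self-test that forces numba to compile something. This is only a sketch: `check_numba_jit` and `sum_sin` are hypothetical names, not part of sasview or sasmodels, and a passing result only shows that the JIT itself was not blocked, not that the generic scattering calculator works end to end.

```python
import math


def check_numba_jit():
    """Return 'ok', 'missing', or 'failed: <reason>' (hypothetical helper)."""
    try:
        from numba import njit
    except ImportError:
        return "missing"  # numba is not installed in this environment

    def sum_sin(n):
        # Simple numeric loop, similar in spirit to a scattering kernel.
        total = 0.0
        for i in range(n):
            total += math.sin(0.1 * i)
        return total

    try:
        jitted = njit(sum_sin)
        got = jitted(1000)  # the first call triggers JIT compilation
    except Exception as exc:  # e.g. the AV interfering with the JIT
        return f"failed: {exc!r}"

    expected = sum_sin(1000)  # pure-Python reference value
    if math.isclose(got, expected, rel_tol=1e-9, abs_tol=1e-9):
        return "ok"
    return "failed: result mismatch"


if __name__ == "__main__":
    print(check_numba_jit())
```

If this reports `ok` while the compiled C models are still being quarantined, that would suggest the AV is reacting to the files written to disk rather than to JIT-generated code in memory.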
If numba works, then perhaps we can use the full LLVM to JIT compile a module that we can call from Python. Or maybe use ply (C preprocessor), pycparser (C99 parser) and llvmlite to have our own JIT C compiler. The docs say numba can target NVIDIA and Radeon for GPU. Nothing for Intel or Mac M1 yet so this removes OpenCL from the mix.
TL;DR: still no good answer, but if we don't mind some bloat we can probably implement an on-the-fly C compiler so the AV system doesn't see the files.
I played with cppyy to compile and run a C module on the fly:
import cppyy
cppyy.cppdef("#include <cmath>\ndouble addsin(double a, double b) { return sin(a+b); }")
cppyy.gbl.addsin(1.0, 2.0)
This uses cling as a backend, which is a C++ interpreter based on clang (a replacement for the cint interpreter in the CERN Root project). It's kind of heavy (125 MB), it's more than we need (full C++), and it duplicates the llvmlite that we already ship as part of numba. However, it should be easy to get it working with sasmodels, so it is a quick solution.
ppci is an entirely separate toolchain, including front ends for various languages and backends for various architectures. I got it to compile and run simple functions. I didn't sort out linking to the math library, but it includes the source to a math library in one of its examples. This is another relatively quick solution, but it is labeled as alpha quality.
I played with libclang hoping to skip cling, but it only goes as far as the parse tree (AST) and doesn't provide a translation to the LLVM Intermediate Representation (IR) that llvmlite needs for just-in-time (JIT) compilation. Using clang+llvm as a JIT doesn't take much code (see here). I haven't looked at the code in detail, but if it could be modified to return the IR that we then feed into llvmlite then it could interoperate with numba. We wouldn't want to build clang as part of sasmodels, but it could be done as a separate package.
Pushing the pycparser line (we only need C99, not C++), I tried using llvmlite_generator.py to translate the AST to IR, but the AST is slightly different, and I didn't get it to work. The code is relatively small but it would have to be updated when there are changes to either the AST or the IR. I don't know how stable they are. Note that pycparser also needs a preprocessor such as pcpp for a complete system.
On a related note, the tinycc daily build might support Mac ARM processors but I didn't try (no Mac to test it on). There are other small C compilers but for various reasons (architecture and/or OS support) I don't think any of them are viable. And anyway, a different compiler doesn't solve the AV problem.
Secondary note: the AV may be simplistic. We may be able to rename the file from .dll/.so and the problem goes away.
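The rename idea is cheap to try. Below is a minimal sketch, assuming the loader does not care about the file extension when given a full path that includes an explicit extension (true for `dlopen` on Linux and macOS; I believe it also holds for `LoadLibrary` on Windows, but that is untested here). `neutral_copy`, `load_renamed` and the `.bin` suffix are made up for illustration, and whether any AV is actually fooled by this is unknown.

```python
import ctypes
import shutil
from pathlib import Path


def neutral_copy(lib_path, suffix=".bin"):
    """Copy a compiled library next to itself under a neutral extension."""
    src = Path(lib_path)
    dst = src.with_suffix(suffix)  # e.g. kernel.so -> kernel.bin
    shutil.copy2(src, dst)
    return dst


def load_renamed(lib_path, suffix=".bin"):
    """Load the renamed copy with ctypes instead of the original file."""
    dst = neutral_copy(lib_path, suffix)
    # Pass an absolute path so the loader does not search for the name
    # or try to append its own default extension.
    return ctypes.CDLL(str(dst.resolve()))
```

In a real fix the compiler would write straight to the neutral name rather than copying, since the original `.dll`/`.so` may be quarantined before the copy happens.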
It would still be nice to ship a small compiler for Mac so that users do not have to deal with installing the Xcode command line utils.
Several reports have now come in that at least the Trend Micro AV software flags the compiled C model code as a threat and quarantines it. This happens while sasview is running, so the model just stops working. It can be avoided by white-listing the .sasview/plugin directory in the AV, but that is a bit of a pain.
Not sure what the answer is but suspect this may become more problematic with time? This clearly needs some investigation.
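Whatever the long-term fix is, the "model just stops working" symptom could at least be turned into a readable error. A minimal sketch, assuming the quarantine shows up as the compiled file disappearing from disk (I have not confirmed this is how Trend Micro behaves); `CompiledModelMissing` and `require_compiled_model` are hypothetical names, not existing sasview or sasmodels API.

```python
from pathlib import Path


class CompiledModelMissing(RuntimeError):
    """Raised when a compiled model file is no longer on disk."""


def require_compiled_model(dll_path):
    """Return the path if the compiled model exists, else raise with a hint."""
    path = Path(dll_path)
    if not path.is_file():
        raise CompiledModelMissing(
            f"Compiled model {path} is missing. Antivirus software may have "
            "quarantined it; consider white-listing the directory "
            f"{path.parent} in the AV settings."
        )
    return path
```

Calling this just before the library is loaded would let the GUI report the likely cause instead of failing silently.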