Open patrickt opened 5 years ago
Hardcoding seems like a good start for users. And I assume the file containing all the hardcoded symbols could potentially be generated by analyzing the system library with semantic itself.
This is made problematic by the fact that runtime environments matter. It’s not clear, given a chunk of arbitrary JS, whether it’s intended to run in Node, or in a browser, or to be compiled to wasm.
I would argue that for JS, if a package.json is present, you could always assume a Node environment with access to all the Node/WASM libraries.
It's possible the code is actually meant to run in a browser, in which case any reference to Node.js will cause a runtime failure or a bundling failure. But before bundling, I think it's fine for them to reference whatever they want.
I would argue that for JS, if a package.json is present, you could always assume a Node environment
These are the kinds of assumptions we don’t want to make, I’m afraid. Semantic is not a tool for Node; it’s a tool for cross-language abstract interpretation. We shouldn’t give Node priority, especially given that the webpack and browserify tools allow deploying package.json-based applications to browsers. Privileging one JS environment over the others isn’t the right choice, even if it would provide better information in the short term.
In the large, every assumption that ties us to implementation details of a language’s runtime or packaging situation is at cross purposes with our goal for Semantic, which is to write a toolkit that is powerful enough to analyze arbitrary code across a range of programming languages without reimplementing or irrevocably tying us to those languages’ canonical interpreters and runtimes. Indeed, one of the reasons we really like abstract interpretation is that we don’t have to provide all runtime primitives: abstract interpretation is capable of skipping over the constructs it doesn’t understand and still returning useful values.
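To make the point about skipping unknown constructs concrete, here is a minimal sketch (illustrative only; the tiny AST and names are invented, not Semantic’s API) of an abstract interpreter that evaluates calls to unmodeled functions to a “top”/unknown value rather than failing, so the rest of the analysis still produces useful results:

```python
# Illustrative sketch: an abstract interpreter over a tiny expression
# language. Calls to unmodeled (e.g. stdlib) functions evaluate to Top
# rather than raising, so analysis of the rest of the program proceeds.

from dataclasses import dataclass

class Top:
    """The abstract 'unknown' value."""
    def __repr__(self):
        return "⊤"

TOP = Top()

@dataclass
class Lit:
    value: int

@dataclass
class Add:
    left: object
    right: object

@dataclass
class Call:
    name: str
    arg: object

# Only functions the analysis chooses to model; everything else is unknown.
KNOWN_FUNCTIONS = {
    "double": lambda v: TOP if isinstance(v, Top) else v * 2,
}

def abstract_eval(expr):
    if isinstance(expr, Lit):
        return expr.value
    if isinstance(expr, Add):
        l, r = abstract_eval(expr.left), abstract_eval(expr.right)
        # Any operation touching an unknown value stays unknown.
        if isinstance(l, Top) or isinstance(r, Top):
            return TOP
        return l + r
    if isinstance(expr, Call):
        fn = KNOWN_FUNCTIONS.get(expr.name)
        if fn is None:
            return TOP  # skip what we don't understand
        return fn(abstract_eval(expr.arg))
    raise TypeError(f"unknown node: {expr!r}")

# A call to an unmodeled stdlib function doesn't poison the whole result:
print(abstract_eval(Add(Lit(1), Lit(2))))                  # 3
print(abstract_eval(Call("console.log", Lit(3))))          # ⊤
print(abstract_eval(Add(Lit(1), Call("mystery", Lit(0))))) # ⊤
print(abstract_eval(Call("double", Lit(21))))              # 42
```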
Other things that make this difficult and not amenable to a quick fix: how do we find console.log’s declaration, when that declaration is implemented in C++? This would require us both to add C++ support to Semantic (a truly massive amount of engineering work), and to write an analysis pass sufficiently sophisticated to map those C++ declarations. And we would have to do this for Ruby, Python, Go, etc.
As you can see, there’s a lot to think about, so we don’t want to rush into an implementation that we might later regret. But thank you for your suggestions, and your enthusiasm! We look forward to having the spare cycles to take a swing at this problem.
Just to reinforce what @patrickt’s said, the goal is for the caller to determine the assumptions they want us to analyze under: e.g. this version of that language with these dependencies. We’re not quite there yet in general, but that’s the plan.
Longer-term (e.g. in a world post-#119), I’m hoping this will mean that we’ll have different stubs (at least) of standard libraries represented as data somewhere which callers can use, if they wish. (Generating these from sources would be nice, where feasible, but I haven’t put much thought into that yet.) Likewise, I’m hoping that we’ll be able to accommodate different language versions in a single AST/compiler, as indeed the parsers are currently designed. But regardless, we are trying to bake fewer assumptions into the system as time goes on, and instead allow callers to select them for themselves.
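A hypothetical sketch of what “stubs of standard libraries represented as data” could look like from a caller’s perspective (the table contents and the `resolve` function are invented for illustration; nothing here is Semantic’s actual API): the caller picks an environment, and name resolution consults that stub set, with nothing baked into the analyzer itself.

```python
# Hypothetical sketch: stdlib stubs as plain data, selected by the
# caller. The analyzer only knows about whatever stub set it is handed.

STUBS = {
    "node": {
        "console.log": "(...args: any[]) => void",
        "process.cwd": "() => string",
    },
    "browser": {
        "console.log": "(...args: any[]) => void",
        "document.querySelector": "(sel: string) => Element | null",
    },
}

def resolve(name, environment):
    """Return a stub signature for `name`, or None if the caller's
    chosen environment doesn't define it."""
    return STUBS.get(environment, {}).get(name)

print(resolve("process.cwd", "node"))     # () => string
print(resolve("process.cwd", "browser"))  # None
```

The point of the design is that choosing "node" versus "browser" is the caller’s decision, not an assumption hardwired into the library.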
Separately, we might provide some heuristics in some cases, like how the .rb path extension gets mapped to Ruby; but I’m mentally planning to separate those into the driver instead of the library wherever possible. (See also #136.)
Completely understood. Another question I have, similar to this issue, is how you handle Java’s dependencies, which are mostly bytecode. I’m not sure you want to implement those mechanisms inside semantic itself, so maybe it’s simplest to leave some options to the user. Say I want to analyze a Ruby project that depends on the Ruby stdlib: I could provide a list of external symbols during the process. If I want to analyze Java, I could generate external symbols from the bytecode and feed that information to the analyzer.
@zfy0701: Broadly, yep, that’s the plan. Modular handling of dependencies is essential for performance, especially at scale, and maintaining that sort of separation is key.
Depending on the details of the analysis producing the symbols, it could be mechanically tricky, but that’s how we’ve been thinking about it 👍
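The external-symbols idea above can be sketched as follows (an invented, simplified interface, not Semantic’s: the symbol list stands in for whatever an out-of-band tool extracts from, say, Java bytecode or a Ruby stdlib index):

```python
# Illustrative sketch: the analyzer accepts a caller-supplied set of
# external symbols -- generated ahead of time, e.g. by scanning Java
# bytecode -- and treats them as resolvable even though their sources
# are never parsed.

def classify_references(references, local_defs, external_symbols):
    """Classify each referenced name as local, external, or unresolved."""
    report = {}
    for name in references:
        if name in local_defs:
            report[name] = "local"
        elif name in external_symbols:
            report[name] = "external"
        else:
            report[name] = "unresolved"
    return report

# External symbols produced out-of-band by a bytecode scanner:
external = {"java.util.List", "java.util.ArrayList"}

print(classify_references(
    ["MyClass", "java.util.List", "com.example.Missing"],
    local_defs={"MyClass"},
    external_symbols=external,
))
# {'MyClass': 'local', 'java.util.List': 'external',
#  'com.example.Missing': 'unresolved'}
```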
As reported in #164, people find the lack of standard-library awareness unintuitive and limiting. This is hard, though, as I pointed out in the issue:
Do note that just because the scope graphing mechanism is unaware of the standard library doesn’t mean that it ignores stdlib calls—they are tracked and graphed like any other call, they just lack position information. So this isn’t a “we’re missing stdlib support”, it’s a “how do we annotate stdlib calls with their position information (if any), and how do we get that position information from the stdlib?”
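To illustrate the distinction (with invented data shapes, not Semantic’s internals): stdlib calls still appear in the graph like any other call; what they lack is position information, which stays empty until some source for it exists.

```python
# Sketch: stdlib calls are tracked alongside project-local calls; only
# the position field distinguishes them, because no source location is
# known for the stdlib.

from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class CallNode:
    name: str
    position: Optional[Tuple[int, int]]  # (line, column), or None

graph = [
    CallNode("my_helper", (12, 4)),  # defined in the analyzed project
    CallNode("puts", None),          # Ruby stdlib: tracked, no position
]

tracked = [n.name for n in graph]
missing_positions = [n.name for n in graph if n.position is None]
print(tracked)            # ['my_helper', 'puts']
print(missing_positions)  # ['puts']
```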