cwi-swat / php-analysis

PHP language analyses in Rascal
BSD 2-Clause "Simplified" License

Can't build binaries with includes information #2

Closed ckonig closed 9 years ago

ckonig commented 10 years ago

Reproduction

The signature of writeIncludeBinaries has changed, and the manual is out of date:

rascal>writeIncludeBinaries();
|stdin:///|(0,22,<1,0>,<1,22>): The called signature: writeIncludeBinaries(),
does not match the declared signature:  void writeIncludeBinaries(Corpus);

Applied fix: use getLatestVersions() as parameter:

rascal>writeIncludeBinaries(getLatestVersions());

Second problem

With getLatestVersions() as the argument, execution starts, but after a short time it aborts with the following error:

|rascal://lang::php::analysis::evaluators::ScalarEval|(3533,42,<82,11>,<82,53>): The called signature: extractIncludeGraph(map[loc fileloc, Script scr], str),
does not match the declared signature:  IncludeGraph extractIncludeGraph(map[loc fileloc, Script scr], loc, set[LibItem]);

Applied an experimental fix to the call to extractIncludeGraph: removed .path from the second argument, so that a loc is passed instead of a str, and passed an empty set as the third argument:

igraph = extractIncludeGraph(scripts, baseLoc,{});

Third problem

Re-ran after applying the last fix, then encountered this problem:

|rascal://lang::php::analysis::includes::IncludeGraph|(2226,49,<61,12>,<61,61>): The called signature: calculateLoc(set[loc], loc, loc, str, bool, list[void]),
does not match the declared signature:  loc calculateLoc(set[loc], loc, loc, str, bool pathMayBeChanged=true, list[void] ipath=[]);

Applied an experimental fix: removed the trailing "true, []" arguments from the call to calculateLoc, because those are the default values anyway:

iloc = calculateLoc(scripts<0>,l,productRoot,sp);
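The mismatch in this third problem is consistent with how Rascal handles keyword parameters: parameters declared with a default value are supplied by name at the call site, not by position. Here is a minimal sketch of that behaviour, assuming a hypothetical function describeLoc that only mirrors the shape of the declared signature and is not part of php-analysis:

```rascal
module KeywordParamDemo

import IO;

// Hypothetical function: one positional parameter followed by two
// keyword parameters with defaults, as in the declared calculateLoc signature.
str describeLoc(str path, bool pathMayBeChanged = true, list[str] ipath = [])
  = "<path> (pathMayBeChanged: <pathMayBeChanged>, ipath: <ipath>)";

void demo() {
  // Both keyword parameters take their defaults.
  println(describeLoc("a.php"));

  // A keyword parameter is overridden by name.
  println(describeLoc("a.php", pathMayBeChanged = false));

  // Passing the same values positionally is expected to be rejected with a
  // "called signature does not match the declared signature" error:
  // println(describeLoc("a.php", true, []));
}
```

Dropping the trailing arguments, as in the fix above, therefore leaves both defaults in effect.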

Fourth problem

Re-ran after applying the last fix, then ran into this:

|rascal://lang::php::analysis::evaluators::DefinedConstants|(7363,15,<162,70>,<162,85>): Undeclared variable: getScriptConsts

Found no experimental fix yet.

mahills commented 10 years ago

If you are working on replicating the ISSTA paper (as I'm guessing you are, based on the reports you are filing), it may be best to pull an older version of the repository. We've done a fair amount of work on the includes resolution since we published the paper, and because of this the interfaces of some of the functions have changed. I intend to go back and fix prior experiments to ensure they still work, but haven't prioritized it.

mahills commented 10 years ago

I'll take a look at this later today. I may just back out the changes to the module being used here, since we now have the includes resolution code in two other modules.

ckonig commented 10 years ago

That's exactly what I am trying to do: reproduce the results shown in the ISSTA paper. I have been looking at different versions from January to March 2013, but have not yet found one with a working writeIncludeBinaries.

Can you recommend a specific version?

mahills commented 10 years ago

I would look at this version: https://github.com/cwi-swat/php-analysis/commit/435e0567343c74f9bfed573de8ef1afc5b83e68a, and also at this version of the parser: https://github.com/cwi-swat/PHP-Parser/commit/a6b847143e066277da107ef4a02c5a29a06e714a. This should work -- the prior commits were specifically to make sure the experiments were reproducible, and this commit specifically was to update the readme with links to the corpus (so everything else was in place at that point). This version of the PHP parser is my last commit before then. I think the newest release would probably work as well, but this would guarantee you wouldn't have any odd interactions between the two.

basten commented 10 years ago

Hi Mark,

Could you give an indication about when you'll be able to fix this? For the M3 extraction we would also like to use the include resolution.

Thanks, Bas

mahills commented 10 years ago

It has changed quite a bit, and is actually the subject of the paper we are trying to get submitted today :) I'll post more details over the weekend.

basten commented 10 years ago

Alright, we'll wait for it. Good luck writing :)

mahills commented 10 years ago

Okay, submitted :) So, I'll try to get info updated this weekend. It isn't really "broken" (you could go back to the revision given above and it would work), we just had to make some major changes to it to make it work better.

mahills commented 10 years ago

The end of the semester was busier than I thought, sorry about that.

We have the includes resolution broken out into two distinct pieces now: a per-file resolve, and a per-program resolve. I'm assuming you will want to use the per-file resolve, since it will work at the level of individual files and give a conservative over-approximation of the files that could be imported. The per-program version is more accurate, but is also slower, and you can only really apply it to programs -- meaning you have to know the software well enough to know that you are dealing with an "entry-point" into the site, versus a file being included into other files.

So, all that said, the easiest thing to do is to look inside module lang::php::experiments::ase2014::ASE2014, specifically at function doQuickResolve. There are a couple of major things to note:

I've added the latter on an as-needed basis, and really need to factor these out and extract them automatically from the documentation available with PEAR and other PHP library systems. If you look at usedLibraries in this module you will see the libraries used by each system. If you look in module lang::php::analysis::includes::LibraryIncludes you will see standardLibraries, which gives a name for the library (e.g., PHPUnit) and a number of files that are in the library. If your system doesn't use any libraries (or, as in some cases, they are distributed inside the system itself, which seems common for systems that incorporate parts of Zend) you don't need to include this parameter.

What you get back from quickResolve is a binary relation over locations. The first location is the location of an include expression in the code, while the second is the location of the file that this location actually resolves to. Note that there can be 0, 1, or multiple files associated with a given include expression location. In some cases the file to include is missing, in which case you will see 0. In some cases it is unique, giving you 1. And, in some cases, it could be more than 1 file -- worst case, it could be any file, in which case you will get pairs of the include expression location and all other locations.
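To make the 0, 1, or many cases concrete, here is a minimal sketch that counts the targets for each include expression. The helper targetCounts, the module name, and the file locations are hypothetical and not part of php-analysis. The sketch also assumes you already have the set of all include expression locations, since an include that resolves to nothing has no pair in the relation:

```rascal
module IncludeSummary

import Set;

// Hypothetical helper: for every include expression location, count the
// files it resolves to. Subscripting a binary relation with a value from
// its first column yields the set of associated second-column values.
map[loc, int] targetCounts(set[loc] includeSites, rel[loc, loc] resolved)
  = ( site : size(resolved[site]) | site <- includeSites );

// Made-up data: one unique include, one ambiguous include, one missing file.
loc uniqueSite    = |file:///site/index.php|(10, 25);
loc ambiguousSite = |file:///site/index.php|(40, 30);
loc missingSite   = |file:///site/index.php|(80, 20);

rel[loc, loc] sampleResolved = {
  <uniqueSite, |file:///site/lib/db.php|>,
  <ambiguousSite, |file:///site/lib/a.php|>,
  <ambiguousSite, |file:///site/lib/b.php|>
};

map[loc, int] sampleCounts()
  = targetCounts({uniqueSite, ambiguousSite, missingSite}, sampleResolved);
```

With this data, sampleCounts() maps the three sites to 1, 2, and 0 respectively, matching the unique, multiple, and missing cases described above.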

If you have a specific system you would like to run this on, please let me know. I can probably write something up (e.g., a Rascal script) that will help you get started.