facebook / infer

A static analyzer for Java, C, C++, and Objective-C
http://fbinfer.com/
MIT License
14.89k stars 2.01k forks source link

Match changed file name as suffix of source file name #1684

Closed xi-liu-ds closed 1 year ago

xi-liu-ds commented 1 year ago

In json capture mode, we found that in some cases, especially when the dll binary files were generated by Visual Studio's built-in compiler, the source file paths in the cfg.json are absolute paths, such as C:/Users/abc/source/repos/ClassLibrary1/ClassLibrary1/Class1.cs. However, the changed file paths exported by git diff are relative paths, such as ClassLibrary1/ClassLibrary1/Class1.cs. To close this gap, we propose a mode to match changed files as the suffix of the full paths in cfg.json.

xi-liu-ds commented 1 year ago

I think it would be better to fix the underlying problem that you get absolute paths for files within the project root in the first place but we could still merge this if that's not possible. When you get C:/Users/abc/source/repos/ClassLibrary1/ClassLibrary1/Class1.cs in the cfgs.json, if you set the project root to C:/Users/abc/source/repos/ I would expect the captured data to show the file as ClassLibrary1/ClassLibrary1/Class1.cs.

Is the project root set correctly when running capture? If so, does infer debug --source-files show the absolute or relative path after capture? If it's an absolute path at that point I think the issue may be within the json capture, it should try to make the source files relative to the project root when importing them.

The GitHub CI is failing because this needs a run of make fmt. When adding new options it's good to run make manuals too (otherwise I have to do it ;) ).

Thanks for your advice! I got the following error when I run infer debug --source-files after infer capture:

Usage Error: When parsing option --pulse-taint-sources: Unexpected JSON format: Exactly one of "procedure", "procedure_regex" must be provided, or else "class_names" and "method_names" must be provided, or else "overrides_of_class_with_annotation", but got "procedure": [None], "procedure_regex": [None], "class_names": [None], "method_names": [None], "overrides_of_class_with_annotation": [None]

Do you know if I should define --pulse-taint-sources or simply disable it?

xi-liu-ds commented 1 year ago

I think it would be better to fix the underlying problem that you get absolute paths for files within the project root in the first place but we could still merge this if that's not possible. When you get C:/Users/abc/source/repos/ClassLibrary1/ClassLibrary1/Class1.cs in the cfgs.json, if you set the project root to C:/Users/abc/source/repos/ I would expect the captured data to show the file as ClassLibrary1/ClassLibrary1/Class1.cs.

Is the project root set correctly when running capture? If so, does infer debug --source-files show the absolute or relative path after capture? If it's an absolute path at that point I think the issue may be within the json capture, it should try to make the source files relative to the project root when importing them.

The GitHub CI is failing because this needs a run of make fmt. When adding new options it's good to run make manuals too (otherwise I have to do it ;) ).

As I mentioned, this happened only when the dlls were compiled by Visual Studio's built-in compiler. And when I print the relative path of source file from cfg by adding the following code:

image

I still got the full path:

image

We double checked and found that it is not possible to extract relative path from VS compiled dlls.

xi-liu-ds commented 1 year ago

Somehow I can't run make fmt_all or make fmt. I got following error message:

image

I did run make devsetup OPAM_PIN_OCAMLFORMAT=yes first. But I had to remove --no-depext flag otherwise it gave the following error

image
jvillard commented 1 year ago

Thanks for your advice! I got the following error when I run infer debug --source-files after infer capture:

Usage Error: When parsing option --pulse-taint-sources: Unexpected JSON format: Exactly one of "procedure", "procedure_regex" must be provided, or else "class_names" and "method_names" must be provided, or else "overrides_of_class_with_annotation", but got "procedure": [None], "procedure_regex": [None], "class_names": [None], "method_names": [None], "overrides_of_class_with_annotation": [None]

Do you know if I should define --pulse-taint-sources or simply disable it?

I suspect you have a .inferconfig lying around that defines "pulse-taint-sources" in an unsupported way? I cannot reproduce your error.

jvillard commented 1 year ago

As I mentioned, this happened only when the dlls were compiled by Visual Studio's built-in compiler. And when I print the relative path of source file from cfg by adding the following code: image I still got the full path: image

We double checked and found that it is not possible to extract relative path from VS compiled dlls.

Once the source file has been recorded with an absolute path we won't try to turn it back into a relative path, even with to_rel_path:

https://github.com/facebook/infer/blob/5e9dde409e64da7584d06ee7664468c3e4ba13f2/infer/src/base/SourceFile.ml#L221-L222

At the time of capture, if the project root is set correctly then creating the SourceFile.t (which you can only do with SourceFile.from_abs_path or SourceFile.create) should make it relative, which is why I was curious how this situation is created in the first place. What commands do you run to capture these dlls compiled by Visual Studio and from where?

jvillard commented 1 year ago

Somehow I can't run make fmt_all or make fmt. I got following error message:

image

That means that git doesn't detect any changed files compared to the origin, are you working from something that's not a git repo perhaps? Does make fmt_all work for you?

I did run make devsetup OPAM_PIN_OCAMLFORMAT=yes first. But I had to remove --no-depext flag otherwise it gave the following error image

Oops, the option is called --no-depexts and apparently opam is normally clever enough to accept --no-depext (or even just --no-d!!) instead. We probably have different versions of opam that treat this differently. Will fix, thanks!

xi-liu-ds commented 1 year ago

As I mentioned, this happened only when the dlls were compiled by Visual Studio's built-in compiler. And when I print the relative path of source file from cfg by adding the following code: image I still got the full path: image We double checked and found that it is not possible to extract relative path from VS compiled dlls.

Once the source file has been recorded with an absolute path we won't try to turn it back into a relative path, even with to_rel_path:

https://github.com/facebook/infer/blob/5e9dde409e64da7584d06ee7664468c3e4ba13f2/infer/src/base/SourceFile.ml#L221-L222

At the time of capture, if the project root is set correctly then creating the SourceFile.t (which you can only do with SourceFile.from_abs_path or SourceFile.create) should make it relative, which is why I was curious how this situation is created in the first place. What commands do you run to capture these dlls compiled by Visual Studio and from where?

infer debug --source-files returns absolute paths, like what is shown below:

image

The capture command is pretty normal:

infer capture --cfg-json infer-staging/cfg.json --tenv-json infer-staging/tenv.json

A possible reason I can think of is: since those dlls were compiled in my colleague machine, no project root can be correctly retrieved when I run capture in my machine?

xi-liu-ds commented 1 year ago

As I mentioned, this happened only when the dlls were compiled by Visual Studio's built-in compiler. And when I print the relative path of source file from cfg by adding the following code: image I still got the full path: image We double checked and found that it is not possible to extract relative path from VS compiled dlls.

Once the source file has been recorded with an absolute path we won't try to turn it back into a relative path, even with to_rel_path: https://github.com/facebook/infer/blob/5e9dde409e64da7584d06ee7664468c3e4ba13f2/infer/src/base/SourceFile.ml#L221-L222

At the time of capture, if the project root is set correctly then creating the SourceFile.t (which you can only do with SourceFile.from_abs_path or SourceFile.create) should make it relative, which is why I was curious how this situation is created in the first place. What commands do you run to capture these dlls compiled by Visual Studio and from where?

infer debug --source-files returns absolute paths, like what is shown below: image

The capture command is pretty normal:

infer capture --cfg-json infer-staging/cfg.json --tenv-json infer-staging/tenv.json

A possible reason I can think of is: since those dlls were compiled in my colleague machine, no project root can be correctly retrieved when I run capture in my machine?

We just made a test and find that even if we compile and run infer on the same machine, the problem still persists as long as the dlls were compiled by VS.

xi-liu-ds commented 1 year ago

I added a few logs here to see what went wrong

image

And I found that infer actually interprets the absolute path like "C:/Users/matjin/source/repos/ClassLibrary1/ClassLibrary1/Class1.cs" as relative path

image
xi-liu-ds commented 1 year ago

I added a few logs here to see what went wrong

image

And I found that infer actually interprets the absolute path like "C:/Users/matjin/source/repos/ClassLibrary1/ClassLibrary1/Class1.cs" as relative path

image

I had another test. Seems that this problem applies to all dlls built in Windows platform. Such as built in VS or by command dotnet build. Not sure if it is a bug?

xi-liu-ds commented 1 year ago

Hi @jvillard ,

In summary, here are my findings: Filename.is_relative failed to interpret Windows path, such as "C:/Users/matjin/source/repos/ClassLibrary1/ClassLibrary1/Class1.cs", as absolute path, and Infer can't convert Windows absolute path to relative path. Both seem to be OCaml issues because OCaml natively doesn't support Windows path. This PR can serve as a short term mitigation. In long term, we can try to fix it at OCaml side.

facebook-github-bot commented 1 year ago

@jvillard has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.