Octachron / codept

Contextual Ocaml DEPendencies Tool: alternative ocaml dependency analyzer

how to analyze namespaced libs, like bin_prot #7

Closed mobileink closed 3 years ago

mobileink commented 3 years ago

Codept is working very well for my dependency analysis, except for certain dependencies involving aliasing. bin_prot is an example: if I have something like open Bin_prot or module Read = Bin_prot.Read, then module Bin_prot ends up unresolved, and the sexp output contains (unknown ((Bin_prot))).

More detail: I'm using ocaml-uri as a test case. I construct an args file that looks like this:

-sexp
config/gen_services.ml
etc/uri_services_full.mli
etc/uri_services.mli
etc/uri_services_raw.ml
fuzz/fuzz.ml
lib/uri.ml
lib/uri.mli
lib_re/uri_legacy.mli
lib_re/uri_legacy.ml
lib_re/uri_re.ml
lib_re/uri_re.mli
lib_sexp/uri_sexp.ml
lib_sexp/uri_sexp.mli
lib_test/test_runner.ml
lib_test/test_runner_sexp.ml
-L
/Users/gar/.opam/4.09.0/lib
-L
/Users/gar/.opam/4.09.0/lib/ppx_fixed_literal
-L
... etc. one -L entry per opam directory...

Then I run codept -args <argfile> and parse the output. Works great, except for modules/packages like bin_prot. (I add stuff like open Yojson etc. to one of the source files to exercise the dependency resolution logic.)
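For readers who want to generate such an argument file programmatically, here is a minimal OCaml sketch. It only uses the -sexp and -L options that appear in the argfile above; the function name args_lines and the file lists passed to it are illustrative assumptions, not part of codept.

```ocaml
(* Build the lines of a codept argument file: the -sexp flag, then the
   source files, then one "-L" / "<dir>" pair per library directory.
   [args_lines] is a hypothetical helper, not a codept API. *)
let args_lines ~sources ~lib_dirs =
  let lib_args = List.concat (List.map (fun dir -> [ "-L"; dir ]) lib_dirs) in
  ("-sexp" :: sources) @ lib_args

(* Print one argument per line, matching the layout of the argfile. *)
let () =
  args_lines
    ~sources:[ "lib/uri.ml"; "lib/uri.mli" ]
    ~lib_dirs:[ "/Users/gar/.opam/4.09.0/lib" ]
  |> List.iter print_endline
```

Redirecting the output to a file gives something that can be passed to codept -args <argfile>.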

I know codept has the notion of "file groups", which seems to be related, but from the docs I have no idea how to use it. Can you offer any guidance? I'm guessing that I need to take each 'Unknown' module, analyze the file system in order to discover its content, construct a "group" expression, and then rerun codept. Something like that? Is there a better way?

thanks,

gregg

mobileink commented 3 years ago

More info: resolution does work for pkg re, which is also namespaced. If a src file contains e.g. open Re, I get:

(lib
...
( (module (Re)) (lib ( Users gar .opam 4.09.0 lib re)) )
( (module (Re__)) (lib ( Users gar .opam 4.09.0 lib re)) )
...)

The difference between re and bin_prot is that the former contains both re.ml (which contains user-written code) and re__.ml (generated by Dune, containing only aliases), whereas the latter contains only bin_prot.ml which is generated by Dune (so no need for bin_prot__.ml).

So I'm puzzled as to why bin_prot is not resolved - the directory seems to contain everything needed to resolve it:

-rw-r--r--  1 gar staff 288K May  8 08:17 bin_prot.a
-rw-r--r--  1 gar staff 625K May  8 08:17 bin_prot.cma
-rw-r--r--  1 gar staff 1.1K May  8 08:17 bin_prot.cmi
-rw-r--r--  1 gar staff 6.5K May  8 08:17 bin_prot.cmt
-rw-r--r--  1 gar staff  351 May  8 08:17 bin_prot.cmx
-rw-r--r--  1 gar staff  18K May  8 08:17 bin_prot.cmxa
-rwxr-xr-x  1 gar staff 310K May  8 08:17 bin_prot.cmxs*
-rw-r--r--  1 gar staff  871 May  8 08:17 bin_prot.ml
... then bin_prot__Foo files, subdirs, etc. ...

Octachron commented 3 years ago

It looks like the issue was due to silent failures in the library resolution code: codept was trying to follow module aliases inside the libraries, and failed for libraries that contained apparent cycles. The last commit on master might fix this issue. If you could check whether it solves your problem, that would be great!

(When feeding cmi files to codept, file groups probably don't matter: they are an experimental CLI interface to make it easier to define nested libraries on the command line without playing tricks with intermediary files. However, the OCaml compiler is far stricter than codept in terms of cmi names and locations, so the flexibility on codept's side doesn't matter for compiled files.)

mobileink commented 3 years ago

Just built it and ran it (once) and it seems to work, for bin_prot at least. I'll do some more testing. Thanks so much.

mobileink commented 3 years ago

FYI, it finds more deps for e.g. pkg re:

( (file lib_re/uri_legacy.ml) (deps
((Stringext) (String) (Re__Posix) (Re__Core) (Re__) (Re) (Printf) (List)
(Lazy) (Format) (Char) (Bytes) (Buffer) (Array))) )
...
(lib
...
( (module (Re)) (lib ( Users gar .opam 4.09.0 lib re)) )
( (module (Re__)) (lib ( Users gar .opam 4.09.0 lib re)) )
( (module (Re__Core)) (lib ( Users gar .opam 4.09.0 lib re)) )
( (module (Re__Posix)) (lib ( Users gar .opam 4.09.0 lib re)) )
...)

Previously it was only finding Re and Re__. OTOH, uri_legacy.ml directly references Re and Re.Posix, but not Re.Core (as far as I can tell). Does that mean that codept found an indirect reference to Re.Core?

To test, I added open Re to another file (one that does not use Re), and the result showed only (Re) in the list of file deps. Then I added open Re.Posix, and the list of file deps then included (Re) (Re__) (Re__Posix).

Does that make sense?

Octachron commented 3 years ago

That is the correct behavior: module aliases allow you to refer to other compilation units without depending on them. Only actually using a module alias should count as a dependency in those cases. In your example, we have in the Re library:

(* Re.mli *)
open Re__
module Posix = Posix
(* Re__.mli *)
module Posix = Re__Posix

Thus opening Re.Posix with

(* local file *)
open Re.Posix

triggers a chain of three aliases. And if you are wondering, the fact that all consumers of a wrapped library depend on the wrapper module is a known drawback compared to real namespaces, since any module addition or deletion in the library affects all users.
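The alias chain can be mimicked in a single file with nested modules. This is only a sketch of the name resolution: within one file there are no separate compilation units, so it does not reproduce codept's output, and the contents of Re__Posix (the description value) are a hypothetical stand-in.

```ocaml
(* Stand-in for the compilation unit Re__Posix; its contents are
   hypothetical and only serve to make the example observable. *)
module Re__Posix = struct
  let description = "posix"
end

(* Stand-in for the Dune-generated alias module Re__. *)
module Re__ = struct
  module Posix = Re__Posix
end

(* Stand-in for the wrapper module Re, which re-exports the alias. *)
module Re = struct
  open Re__
  module Posix = Posix
end

(* Local file: resolving Re.Posix goes through Re, then Re__, and
   finally reaches Re__Posix. *)
open Re.Posix

let () = print_endline description
```

Each of the three modules crossed while resolving the open corresponds to one of the entries (Re) (Re__) (Re__Posix) reported in the file's deps.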

Octachron commented 3 years ago

And concerning the dependency on Re__Core, it is an indirect dependency that comes from the inclusion:

include Core

in re.mli. After strengthening, this defines the module Group as

module Group = Re__Core.Group

Thus accessing Re.Group induces a dependency on Re__Core.
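A single-file sketch of the same effect, assuming a hypothetical Group module with made-up fields (the real Re__Core.Group is different): after include, Re.Group is the same module as Re__Core.Group, so anything reached through Re.Group really lives in Re__Core.

```ocaml
(* Stand-in for the compilation unit Re__Core; the record fields and
   the [length] function are hypothetical. *)
module Re__Core = struct
  module Group = struct
    type t = { start : int; stop : int }
    let length g = g.stop - g.start
  end
end

(* Stand-in for the wrapper: including Re__Core re-exports Group. *)
module Re = struct
  include Re__Core
end

(* A value built through Re.Group is accepted at type Re__Core.Group.t,
   because both paths name the same module. *)
let g : Re__Core.Group.t = { Re.Group.start = 2; stop = 5 }

let () = Printf.printf "%d\n" (Re.Group.length g)
```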

Octachron commented 3 years ago

The issue seems fixed (in master and in the released codept 0.11.0). Don't hesitate to reopen the issue if there are any remaining problems.