cisco / ChezScheme

Chez Scheme
Apache License 2.0
6.91k stars 983 forks source link

wpo library initialization error #386

Closed owaddell closed 5 years ago

owaddell commented 5 years ago

Summary

We have a problem where a clean build of the same source produces an application that may or may not fail in $invoke-library at startup. I've sketched changes to topological-sort and build-cluster* that appear to fix the problem. I could use help from @dybvig or @akeep finding the right solution.

Background

The application is a compiled program that uses two separately compiled compound libraries in addition to a few of its own libraries. We use whole-program optimization on the program and the compound libraries.

I finally reduced it to a small example (< 100 lines). Since it consists of multiple small files I've checked it in temporarily on my WIP branch for convenience.

My WIP branch adds a library-search-handler parameter (part of a pull request I'm working on), then adds the reduced example, then adds a few changes (just sketches) that together appear to fix the issue. The reduced example is in a DELETE_ME directory and includes test scripts.

In failed builds, it looks like build-cluster* puts several nodes of lib2 into the same cluster because there are no intervening binary nodes. However, some of those nodes depend on (different sets of) binary nodes from lib1, so it may not be safe to invoke the entire cluster as a unit. Fixing the sort and breaking up the clusters allows the compiled program to invoke the libraries in a safe, fine-grained manner.

FWIW, I suspect the variability from one compile to the next depends on what symbol-hash returns for various gensyms and how that affects the order of the input to topological-sort and its output.

owaddell commented 5 years ago

As another data point: we've seen this issue again on our build server, which is using the "overly pessimistic build-cluster*" change (i.e., not using any of the more experimental changes that followed).

akeep commented 5 years ago

It has taken me a little while to figure out what is going on, but I don't actually think it is problem with the sort algorithm, even though your change to the sort algorithm fixes the issue for these libraries.

I've added a bit of instrumentation to try to figure out what was going on, including printing a dot graph of the compile whole library and compile whole program output, as well as printing out when the libraries are invoked during a program run.

The script calls compile-whole-library twice and compile-whole-program once, with the following dependencies:

lib1.pdf lib2.pdf prog.pdf

Or, to put these in text, combined library 1 dependencies are:

(lib1 one) -> (lib1 two)
(lib1 combined)

; combined library 2 dependencies are:

(lib2 four) -> (lib1 two), (lib2 five), [ (lib1 one) -> (lib1 two) ]
(lib1 combined)
(lib2 combined)
(lib2 three)

; and the program helper library dependencies are:

(prog helper) -> [ (lib1 one) -> (lib1 two) ],
                 [ (lib2 four) -> (lib1 two), (lib2 five), [ (lib1 one) -> (lib1 two) ] ]
(lib1 combined)
(lib2 combined)
(lib2 three)

, where A by itself means A does not participate in any dependency relationships, A -> B means A depend upon B, A -> B, C means A depends upon B and C, and A -> [ B -> C ] means that A depends upon B which in turn depends upon C. (Probably the dot graphs are a bit more readable.)

Anyway, this means there are several sort orders that will satisfy these graphs. In the case where things go wrong, the whole library compiled combined lib2 first invoked (llb2 three) from the cluster of libraries.

It took me a bit of staring at code output and thinking about what was going on (and experimenting with the cluster stuff), but I think I've figure out what is going wrong, though not quite the best way to fix it yet. Basically, we are attempt to invoke (lib2 three), which has no dependencies. In order to get to that code in the combined library, we are evaluating (lib2 five) (which is fine since it has no dependencies) and (lib2 four) (which is is not fin since it has dependencies).

It used to be we could get away with this, because the code for (lib2 four) would have just included explicit calls to $invoke-library for its invoke dependencies, but now that we've extracted these into library/rt-info records, we've lost this. So, it could be that when we combine the libraries this way, we need to extend the dependencies on things like (lib2 three) with the dependencies for the libraries that come before it in the cluster. I believe this would fix this program, but I'm not sure if there are other problems lurking here or not.

I'm hoping I'll have some time to look at this a bit more tomorrow night and hopefully have a fix for it.

akeep commented 5 years ago

I've pushed a tentative fix on branch wpo-fix-issue-386, which updated the library-rt/info record's library requirements, so that each successive source library in the combined invoke code includes the requirements from the libraries that proceed it. This ensures that no matter which library in the group is first invoked, any external (binary) invoke requirements will also be invoked. @dyb and I are going to be code reviewing this tomorrow, and possibly refining this solution or coming up with a different one, which is why I say the fix is tentative. In writing this up, I suspect there are still some corner cases this fix does not address.

Just to capture a bit about what is going on here. I've captured some of this already in previous comments, but it is worth talking for a moment about why I don't think sorting alone can fix this.

The problem I've observed with the example program is that when the (lib2 combined) library is built (which depends upon the whole-library-compiled binary library containing (lib1 combined), (lib1 one), and (lib1 two)), the looseness of the dependency graph (see lib2.pdf), allows for some valid topological sorts that can lead the resulting library to have the potential for problems, depending on which library within the combined library is invoked first. If the libraries are sorted as: (lib2 three), (lib2 combined), (lib2 five), (lib2 four), then no matter which library we try to invoke first, it should work out okay, setting aside clustering, which I'll talk about in a moment. If the sort order ends up as (lib2 five), (lib2 four), (lib2 three), (lib2 combined) then we can still survive if (lib2 four) is invoked first, since its invoke requirements, (lib1 one) and (lib1 two), will be invoked before the combined invoke code is executed. However, invoking (lib2 three) first, will cause the (lib2 four) code, which proceeds it in the combined library, to be invoked without invoking its binary invoke requirements.

Clustering also complicates this, because it can mean that even if the library code for (lib2 three) appears before (lib2 four) in the combined invoke code, if they are in the same cluster, invoking (lib2 three) will also cause (lib2 four) to be invoked, because we always invoke the full cluster.

All of this said, this example is one in which sorting combined with disabling clustering will result in the library always being usable. However, I believe, we could construct an example where there is no valid sort order if the invoke requirements are not changed. Let's say we have two binary libraries (bin1) and (bin2) and two source libraries we want to compile into a single library (src1) and (src2). If (src1) depends upon (bin1) and (src2) depends upon (bin2) (and (src1) does not depend upon (src2)), then there is no order where the combined (src1) and (src2) code can be invoked without missing one of the two requirements.

So, what is clustering anyway? When we initially created compile-whole-program, we posited the idea that in some cases you might want to do a whole program compilation against a binary library you do not have source for. In working through how this might work, we imagined that it would even be possible to setup a dependency chain such that a binary library is between two libraries with wpo files available, so you have the dependency chain:

(wpo1)  -> (bin) -> (wpo2)

where (wpo1) depends on (bin) which depends on (wpo2). If you combined the code for (wpo1) and (wpo2) you end up with a situation where you cannot invoke (bin) without invoking the combined (wpo1) and (wpo2), since it depends on (wpo2) and you cannot invoke the combined (wpo1) and (wpo2) without invoking (bin) since the (wpo1) part fo the combined code depends on (bin). Thus, we separated this into two "clusters" one containing (wpo1) and one containing (wpo2). The (wpo1) and (wpo2) code is still combined, but if (wpo2) is invoked, it only invokes the section of the code containing the (wpo2) code and then stops, leaving the remainder of the code to be executed when (wpo1) is invoked. When (wpo1) is invoked, it will first invoke its requirement (bin). If (wpo1) is invoked first, (bin) will cause the (wpo2) section to be invoked first, and then by the time the (wpo1) code is to be invoked, it will already have been set to just invoke the remaining code.

It is worth noting that the topological sort we use is not related to anything around clustering. It is just a recognition that we do not need to use something like Tarjan's strongly connected component analysis because the dependency graph for libraries is acyclic and hence does not contain any strongly connected components. So, we just keep track of how many things depend on a library remain to be sorted, and do not allow a library to be put into the sort result until all of its dependencies have been satisfied.

owaddell commented 5 years ago

Thanks very much for investigating. The tentative fix looks interesting, but I still need to kick the tires a bit.

While you're reviewing the code, you'll want to fix the duplication of node*. I had fixed this in one of the commits on my WIP branch, but it may have been missed amid the other noise.

My suspicion of the topological sort was due to something I had seen in the instrumented output. I've added a self-contained example on my WIP branch in DELETE_ME/sort-test.ss that shows odd behavior in the original algorithm when library-entry* contains program-entry.

akeep commented 5 years ago

@owaddell, I don't think your sort-test.ss shows what you think it shows. In this example you complain because a is written out first in the topo sort we use in whole program compilation, but this is exactly what you should expect based on the node use counts you are computing as input to this function (I added a print out of these computed use counts):

% scheme DELETE_ME/sort-test.ss
Chez Scheme Version 9.5.1
Copyright 1984-2017 Cisco Systems, Inc.

trying #<procedure topological-sort at sort-test.ss:2114>
a has use count 0
b has use count 1
c has use count 3
d has use count 0
a = 0
c = 1
d = 2
b = 3
oops: a -> b, yet a = 0 and b = 3
oops: a -> c, yet a = 0 and c = 1

trying #<procedure simple-topological-sort at sort-test.ss:2840>
a has use count 0
b has use count 1
c has use count 3
d has use count 0
c = 0
b = 1
a = 2
d = 3

You'll see the computed use counts here have a with a use-count of 0, telling the topo sort that it has no dependencies.

I'm not saying we don't have a bug in the sort, I'm just saying I don't think this shows it.

akeep commented 5 years ago

I think I have a fix up now, @owaddell. If you can give it a try and tell me what you run into, I'd appreciate. It would be good to know if there is a still a bug lurking in here!

Kent and I discussed the fix yesterday, and after discussing it with him I went home to implement the change and ran into a wrinkle in the difference between compile-whole-library and compile-whole-program. We handle invoke requirements and clustering differently in these two situations.

In compile-whole-library, non-binary libraries that are sorted next to each other in the topo sort are put next to together in the same cluster, and the binary libraries are effectively ignored, since they will be invoked in the natural order by the invoke requirements listed in the library/rt-info record. The bug we ran into (I believe) is that we had a cluster with binary libraries that are effectively sorted to come before the start of a cluster, but not all of the elements in the cluster had its invoke requirements in it. The fix for this is to collect the invoke requirements for all libraries in a cluster and set this as the invoke requirements for the library/rt-info of all of the libraries in the cluster.

In compile-whole-program, no clustering takes place, and explicit $invoke-library expressions are put into the collected library and program invoke code in the order they are encountered in the topological sort.

I also took a look at your example (a -> b c), (b -> c), and (d -> c) and built a set of libraries that embody this, extended with a program that invokes (a), (b), (c), and (d) to test with compile-whole-program and (ad-lib) which re-export (a) and (d) so that we can (compile-whole-library) of this together. It isn't possible to compile (a) and (d) together otherwise, since compile-whole-library only allows us to specify one library wpo file as input and there is no relationship that includes (a) and (d) unless I construct one artificially.

Here were the results I got compile these:

[akeep@hawkeye s]% ../bin/scheme
Chez Scheme Version 9.5.1
Copyright 1984-2017 Cisco Systems, Inc.

> (current-directory "../../DELETE_ME")
> (generate-wpo-files #t)
> (compile-imported-libraries #t)
> (compile-program "abcd.ss")
compiling abcd.ss with output to abcd.so
compiling a.ss with output to a.so
compiling b.ss with output to b.so
compiling c.ss with output to c.so
compiling d.ss with output to d.so
((d) (c) (b) (a))
> (compile-whole-program "abcd.wpo" "abcd-whole.so" #t)
program-entry: #{program m7zj6rf09y313ldnbpmypw02c-0}
library-entries:
  (a) (#{a n8aqxpz0sxwi6ccdz5ekuxv1t-4}): use-count: 1
  (d) (#{d n8aqxpz0sxwi6ccdz5ekuxv1t-6}): use-count: 1
  (c) (#{c n8aqxpz0sxwi6ccdz5ekuxv1t-0}): use-count: 4
  (b) (#{b n8aqxpz0sxwi6ccdz5ekuxv1t-2}): use-count: 2
topo-sort results:
  (c) (#{c n8aqxpz0sxwi6ccdz5ekuxv1t-0})
  (b) (#{b n8aqxpz0sxwi6ccdz5ekuxv1t-2})
  (a) (#{a n8aqxpz0sxwi6ccdz5ekuxv1t-4})
  (d) (#{d n8aqxpz0sxwi6ccdz5ekuxv1t-6})
()
> (compile-library "ad-lib.ss")
compiling ad-lib.ss with output to ad-lib.so
> (compile-whole-library "ad-lib.wpo" "ad-lib-all.so")
program-entry: #f
library-entries:
  (a) (#{a n8aqxpz0sxwi6ccdz5ekuxv1t-4}): use-count: 0
  (ad) (#{ad m7zj6rf09y313ldnbpmypw02c-1}): use-count: 0
  (d) (#{d n8aqxpz0sxwi6ccdz5ekuxv1t-6}): use-count: 0
  (c) (#{c n8aqxpz0sxwi6ccdz5ekuxv1t-0}): use-count: 3
  (b) (#{b n8aqxpz0sxwi6ccdz5ekuxv1t-2}): use-count: 1
topo-sort results:
  (c) (#{c n8aqxpz0sxwi6ccdz5ekuxv1t-0})
  (b) (#{b n8aqxpz0sxwi6ccdz5ekuxv1t-2})
  (a) (#{a n8aqxpz0sxwi6ccdz5ekuxv1t-4})
  (ad) (#{ad m7zj6rf09y313ldnbpmypw02c-1})
  (d) (#{d n8aqxpz0sxwi6ccdz5ekuxv1t-6})
()
>

In both the compile-whole-program and compile-whole-library cases, (a) is sorted after (b) and (c), as expected. I'm not sure why the use-counts of 0 did not show this in your sort libraries test, but the topological sort sorts inside out, so lower numbers get laid down first with their dependencies added on as they go.

Here were the original program:

a.ss:

(library (a)
  (export a-x)
  (import (rnrs) (b) (c))
  (define a-x (cons* 'a b-x c-x)))

b.ss:

(library (b)
  (export b-x)
  (import (rnrs) (c))
  (define b-x (cons 'b c-x)))

c.ss:

(library (c)
  (export c-x)
  (import (rnrs))
  (define c-x (cons 'c 'c)))

d.ss:

(library (d)
  (export d-x)
  (import (rnrs) (c))
  (define d-x (cons 'd c-x)))

ad-lib.ss:

(library (ad)
  (export a-x d-x)
  (import (a) (d)))

abcd.ss:

(import (chezscheme) (a) (b) (c) (d))
(printf "a-x: ~s~%" a-x)
(printf "b-x: ~s~%" b-x)
(printf "c-x: ~s~%" c-x)
(printf "d-x: ~s~%" d-x)

I instrumented the topological-sort function as follows:

(define topological-sort
  (lambda (program-entry library-entry*)
    (printf "program-entry: ~s~%" (and program-entry (program-info-uid (program-node-pinfo program-entry))))
    (printf "library-entries:~%")
    (for-each
      (lambda (node)
        (printf "  ~s (~s): use-count: ~s~%" (library-node-path node)
          (library-node-uid node) (node-use-count node)))
      library-entry*)
    (let ()
      (define topological-sort
        (lambda (dep* node*)
          (if (null? dep*)
              node*
              (let* ([dep (car dep*)] [use-count (node-use-count dep)])
                (node-use-count-set! dep (fx- use-count 1))
                (if (fx= use-count 1)
                    (topological-sort (cdr dep*) (topological-sort (node-depend* dep) (cons dep node*)))
                    (topological-sort (cdr dep*) node*))))))
      (let ([results (fold-right
                       (lambda (entry node*) (topological-sort (node-depend* entry) (cons entry node*)))
                       (if program-entry (topological-sort (node-depend* program-entry) '()) '())
                       (filter (lambda (node) (fx= (node-use-count node) 0)) library-entry*))])
        (printf "topo-sort results:~%")
        (for-each
          (lambda (node)
            (if (library-node? node)
                (printf "  ~s (~s)~%" (library-node-path node) (library-node-uid node))
                (printf "  ~s~%" (program-info-uid (program-node-pinfo program-entry)))))
          results)
        results))))

If you'd like to give it a try and tell me if you run into problems in your testing. We could also add an assert in here for testing, to make multiple test runs easier to check out.

Just for completeness, here is how I generated the dot for the compile-whole-library and compile-whole-program entries. I added the function print-dot to compile.ss:

(define (print-dot who lib*)
  (let ([ht (make-eq-hashtable)]
        [n 0]
        [lib* (sort
                (lambda (lib1 lib2)
                  (let loop ([lib1 (library-node-path lib1)] [lib2 (library-node-path lib2)])
                    (cond
                      [(null? lib1) #t]
                      [(null? lib2) #f]
                      [(eq? (car lib1) (car lib2)) (loop (cdr lib1) (cdr lib2))]
                      [(string<? (symbol->string (car lib1)) (symbol->string (car lib2)))]
                      [else #f])))
                lib*)])
    (define (next-name)
      (let ([x (format "n~s" n)])
        (set! n (fx+ n 1))
        x))
    (printf "~%digraph \"~s\" {~%" who)
    (for-each
      (lambda (lib)
        (let ([n (next-name)])
          (eq-hashtable-set! ht lib n)
          (printf "  ~a [label=\"~s\", bgcolor=\"~a\"];~%"
                  n (library-node-path lib)
                  (if (library-node-binary? lib) "#CCCCCC" "#FFFFFF"))))
      lib*)
    (for-each
      (lambda (lib)
        (let ([n (eq-hashtable-ref ht lib #f)])
          (for-each
            (lambda (dep)
              (printf "  ~a -> ~a [color=\"black\"];~%" n (eq-hashtable-ref ht dep #f)))
            (node-depend* lib))))
      lib*)
    (printf "}~%")))

And then called it in compile-whole-program and compile-whole-library:

  ;; TODO: Add automatic recompliation ala scheme import/load-library
  (set-who! compile-whole-program
    (rec compile-whole-program
      (case-lambda
        [(ifn ofn) (compile-whole-program ifn ofn #f)]
        [(ifn ofn libs-visible?)
         (unless (string? ifn) ($oops who "~s is not a string" ifn))
         (unless (string? ofn) ($oops who "~s is not a string" ofn))
         (let*-values ([(hash-bang-line ir*) (read-input-file who ifn)]
                       [(program-entry lib* invisible* no-wpo*) (build-graph who ir* ifn #t #f libs-visible?)])
           (print-dot who (append lib* invisible*)) ;; !!! HERE print the graph we got
           (safe-assert program-entry)
           (safe-assert (null? no-wpo*))
           (let ([node* (topological-sort program-entry lib*)])
             (printf "compile-whole-program nodes: ~{~s, ~}~%" (map library-node-path node*))
             (finish-compile who "whole program" ifn ofn hash-bang-line
               (build-program-body program-entry node* lib* invisible*))
             (build-required-library-list node* lib*)))])))

  (set-who! compile-whole-library
    (lambda (ifn ofn)
      (unless (string? ifn) ($oops who "~s is not a string" ifn))
      (unless (string? ofn) ($oops who "~s is not a string" ofn))
      (let*-values ([(hash-bang-line ir*) (read-input-file who ifn)]
                    [(no-program lib* invisible* wpo*) (build-graph who ir* ifn #f (generate-wpo-files) #t)])
        (print-dot who lib*) ;; !!! HERE print the graph we got
        (safe-assert (not no-program))
        (safe-assert (null? invisible*))
        (safe-assert (or (not (generate-wpo-files)) (not (null? wpo*))))
        (when (null? lib*) ($oops "did not find libraries in input file ~s" ifn))
        (let ([node* (topological-sort #f lib*)])
          (printf "compile-whole-library nodes: ~{~s, ~}~%" (map library-node-path node*))
          (write-wpo-file who ofn wpo*)
          (finish-compile who "whole library" ifn ofn hash-bang-line
            (build-library-body node* lib*))
          (build-required-library-list node* lib*))))))

Okay, off to my day job---I'll check back in on this, this evening.

owaddell commented 5 years ago

Thank you very much for the promising fix and the detailed write-ups along the way. It's good to have some explanation of what clustering is all about. Sorry for the sorting distractions.

I'm still running tests and will update you this weekend.

owaddell commented 5 years ago

I've run into a snag in my testing. I pushed a branch that adds a stab at a test-case generator. The test generator surely needs more debugging itself, but the following test case seems to cause a hiccup. I haven't looked into the cause, but added a printf where compile.ss appears unhappy.

$ ./configure --workarea=my-workarea
$ (cd my-workarea; make)
$ cd DELETE_ME
$ scheme gen.ss
> (use "../my-workarea")
> (run-trial 10 17592293441180 76)   ;; run a particular test
...
--- build 1
compile-library G
compile-whole-library G
visit G.library
compile-library D
compile-whole-library D
visit D.library
visit G.library
compile-library C
compile-whole-library C
FAIL no node in libs for G.i91420dyw1zww1z45f2cc8o5a-0
Exception: invalid memory reference.  Some debugging context lost

Here is the dot graph of library dependencies that this particular example attempts to construct, again the test generator may be flawed. Blue nodes are binary-only during wpo optimization.

akeep commented 5 years ago

I'm having a look at it---it looks like I screwed something up in my "fix," I'm having trouble getting your test to work, which I could have sworn worked before. It is failing when it tries to compile the program for me, with the same error you are running into.

Sorry about that, hopefully I'll have a fix for you soon.

akeep commented 5 years ago

Okay, I think I know what is going on. When I extended the invoke requirements, I did not extend the import requirements, however, we expect the import requirements to be a superset of the invoke requirements during the compile-whole-program and compile-whole-library.

I should have a fix for things posted shortly.

I'm testing with your gen.ss script. I needed to add some 'replace arguments to the with-output-to-file calls, other than that, things worked out well as long as I kept n under 25. At 30 it started generating file names that broke dot and scheme was unhappy loading them (names like: ].ss). I did try to do a bit of random testing with a lower value for n though.

akeep commented 5 years ago

Fix posted in my wpo-fix-issue-386 branch. Sorry for the churn on this, hopefully this is working now.

I'd be interested in trying to put together mats testing for this issue, and I'm open to suggestions. I can pull the freq test in, but it would be nice to get a consistent failure if we could generate one.

Let me know if you continue to see problems, if not I'll clean up my log messages, add some sort of mat and merge this into master.

owaddell commented 5 years ago

Thanks for the quick fix!

It took the better part of the day, but I did run 500 iterations of your f0b4dd453fc1f745660f2b977dd9c0e8d30a8411 fix building the original versions of our internal projects that had run into trouble originally. Your fix worked fine and I also verified that I could reproduce the issue if I went back to a revision without your fixes.

I also updated my gen.ss tester to add an entry point that lets us specify a specific graph to try. In particular, I wanted to try the obvious case of a graph like the one from the freq test. That turned up a couple issues in my test generator. First, it was letting things boil away when the generated library had no exports required by other libraries, so I added $init to the list. Second, it appears that the length of identifier names matters, so I changed the (node i) function. With these changes, it successfully generates cases that fail in the code from before your fixes. I left some experiments in after the #!eof.

I'll build the latest and run it a bit and see if it turns up anything.

owaddell commented 5 years ago

I fixed a silly error in the still-dubious test generator, so I'm not sure what to make of the (generated) tests run yesterday. A fair sampling of the space of 5-library programs with different combinations of binary libraries didn't turn up any problems, but that version gen.ss wasn't visiting / loading binary libraries in the right order.

On the positive side, it seems that we need just one iteration to show issues in the original code (before any of your fixes). Perhaps a mat is within reach.

$ scheme gen.ss
> (use "../a6le-orig-bug")
> *repeat-compile*
1
> (run-trial 5 5 10)   ;; expected vs. actual differ in old system
> (run-trial 5 973 10) ;; expected vs. actual differ in old system

On the confusing (more likely confused) side, I have seen two types of strangeness with the latest 11d1030d3b34c2054848d0c1ba7f3e0a3b30f713 code.

Here is a reduced example where the program compiles and runs under a particular wpo regimen, but produces results that differ from the expected output. The $ function now echos the command before calling system, so transcripts are a bit more detailed.

$ scheme gen.ss
> (use "../akeep-11d1030-dot")
> (run-trial 10 22001860757166 216) ;; output differs
> (define g
    '((ice -> jicama) (eel -> ice) (harissa ->) (coconut -> fennel)
      (banana -> coconut) (apple -> ice) (apple -> eel)
      (apple -> date) (apple -> coconut)))
> (run-graph g 216)
[...]
$ ${SCHEME} --program main.ss > expected.output
--- build 1
$ ${SCHEME} --script ../build-wpo.ss library fennel
compile-library fennel
compile-whole-library fennel
$ ${SCHEME} --script ../build-wpo.ss library jicama fennel
visit fennel.library
compile-library jicama
compile-whole-library jicama
$ ${SCHEME} --script ../build-wpo.ss library date fennel jicama
visit fennel.library
visit jicama.library
compile-library date
compile-whole-library date
$ ${SCHEME} --script ../build-wpo.ss library apple fennel jicama date
visit fennel.library
visit jicama.library
visit date.library
compile-library apple
compile-whole-library apple
$ ${SCHEME} --script ../build-wpo.ss program main fennel jicama date apple
visit fennel.library
visit jicama.library
visit date.library
visit apple.library
compile-program main
compile-whole-program main
$ cat fennel.library jicama.library date.library apple.library main.program > full.program
$ ${SCHEME} --import-notify --program full.program > actual.output
$ diff -q expected.output actual.output
Files expected.output and actual.output differ

Here is a reduced example that runs into trouble while trying to compile the program against wpo libraries.

$ scheme gen.ss
> (use "../akeep-11d1030-dot")
> (define g
    '((gorgonzola -> harissa)
      (eel -> harissa)
      (coconut -> gorgonzola)
      (banana -> eel)
      (banana -> date)
      (banana -> coconut)))
> (run-graph g 6)
[...]
$ ${SCHEME} --program main.ss > expected.output
--- build 1
$ ${SCHEME} --script ../build-wpo.ss library coconut
compile-library coconut
compile-whole-library coconut
$ ${SCHEME} --script ../build-wpo.ss library banana coconut
visit coconut.library
compile-library banana
compile-whole-library banana
$ ${SCHEME} --script ../build-wpo.ss program main coconut banana
visit coconut.library
visit banana.library
compile-program main
Exception: cyclic dependency involving import of library (coconut)

Again, the test generator is still suspect, so I'm not sure how to interpret the results, but will put this out there in case there is anything to it. Granted, some libraries may be duplicated by whole-program optimization; that's what the varying choice of binary library is meant to exercise. We have to think carefully about these choices in real programs, but we should be okay in the generated test programs, I think, though at the moment to "think" seems a quite unattainable goal.

owaddell commented 5 years ago

The internal project also complains about "cyclic dependency involving import" with the 11d1030d3b34c2054848d0c1ba7f3e0a3b30f713 change.

akeep commented 5 years ago

Okay, I'll take a look at the gen.ss test graph that was leading to a cyclic dependency being generated in the combined code. The intention of the clustering is to avoid exactly this, while still combining libraries as much as we can, but ever since we started looking at this, I've been a bit concerned that we are not treating the combined binary libraries as a single library, instead treating them as though they were separate parts in downstream compile-whole-program and compile-whole-library calls.

akeep commented 5 years ago

So, in the reduced (run-graph g 6) example above, the coconut.library is built with libraries harissa.wpo, gorganzola.wpo, and coconut.wpo. When it is building banana.library, it is getting two copies of harissa.wpo because it finds both the harissa.wpo in and the binary one in coconut.library, which is leading to it doing some strange things.

I'm not sure if that is what is happening with your internal project, but this is what is leading to the problem in the program and libraries generated by (run-graph g 6).

Essentially, what is going on here is that

  1. coconut.library is created with (harissa), (gorgonzola), and (coconut) combined into a single unit, which no longer has any invoke requirements, since all of the requirements are satisfied by the grouping, and the import requirements of all three libraries are left alone.
  2. banana.library is created with (harissa) (again), (eel), (date), and (banana) combined into a single unit, with the invoke requirement (coconut). Unfortunately, the compile-whole-library no longer knows that (harissa) must come before (coconut), because when we combined (harissa), (gorgonzola), and (coconut), we lost that information, and the import requirements of all three libraries are extended with (coconut)

This means that when we try to load either of these two combined libraries, we end up tied in knots because we end up registering (harissa) in both libraries, once with backwards import and invoke dependencies, so (coconut) tries to load (harissa) as part of its import requirements and then (harissa) tries to load (coconut) as part of its import dependencies, and we get the error.

The root problem is that we saw (harissa) twice, once from harissa.wpo and once from the coconut.library. So, one work around is to always take the binary library when you see the library twice (effectively overwrite the non-binary) under the assumption that the binary version must be part of some combined library.

This works for this case, but is a bit unsatisfactory. One might argue that if build-graph encounters the same library twice it is an error---especially since I think it is possible to create a wpo file from the combined library.

Another option is to keep the original invoke requirements, in addition to the the collected invoke requirements from the cluster. This avoids the cyclic dependency, and in this particular example works, however, it also means that you'll end up with two copies of the (harissa) code, one in the coconut.library, and one in the banana.library.

This is also pretty unsatisfactory. In cases where the duplicated library has some stateful elements, these elements will be re-initialized when the second cluster runs, wiping out any state that was collected up to that point, and leading to different results in the combined code.

My inclination is to change the code to raise an error when the second copy of a library is loaded, reporting the source of the original library as well as where the second one was found, so that the user can resolve the duplication, possibly through changing the search path, or using something like your library-search-handler to exclude libraries that would lead to the duplication.

One way to test this out would be to change the code in record-ct-lib! and record-rt-lib! so that instead of:

(begin (unless (library-node-ctinfo node) (library-node-ctinfo-set! node linfo/ct)) #f)

and

(begin (unless (library-node-rtinfo node) (library-node-rtinfo-set! node linfo/rt)) #f)

Have them do:

(if (library-node-ctinfo node)
    ($oops who "encountered library ~s in ~a, but had already encountered it"
       (library-info-path linfo/ct) ifn)
    (begin (library-node-ctinfo-set! node linfo/ct) #f))

and

(if (library-node-rtinfo node)
    ($oops who "encountered library ~s in ~a, but had already encountered it"
       (library-info-path linfo/rt) ifn)
    (begin (library-node-rtinfo-set! node linfo/rt) #f))

This will raise the error and report the second instance encountered. We don't currently store the input file name in the library-node record, but we could add the filename to this record to report where it was originally encountered.

akeep commented 5 years ago

With help off line from @owaddell, commit 09bba00 addresses this bug.