Inference errors or fails for 105 of the 142 UAI 2014 Inference Competition problems

mroavi commented 1 year ago

The UAI 2014 benchmark consists of the following problem sets:

Alchemy
CSP
DBN
Grids
linkage
ObjectDetection
Pedigree
Promedus
relational
Segmentation

In total, there are 142 problems. Of the 142 tests, only 37 are passing. In summary, the Promedus and Segmentation tests all pass, the Grids tests pass partially, and the tests for the rest of the problem sets fail. See the test results below for more details.

Test Summary:                           | Pass  Fail  Error  Total     Time
inference                               |   37     6     99    142  3m44.4s
  gradient-based tensor network solvers |   37     6     99    142  3m44.2s
    UAI 2014 problems                   |   37     6     99    142  3m44.2s
      Alchemy benchmark                 |          1             1    31.6s
        Alchemy_11                      |          1             1    31.1s
      CSP benchmark                     |                 3      3     1.6s
        CSP_11                          |                 1      1     1.3s
        CSP_12                          |                 1      1     0.2s
        CSP_13                          |                 1      1     0.2s
      DBN benchmark                     |                 6      6  1m27.5s
        DBN_11                          |                 1      1    20.5s
        DBN_12                          |                 1      1     5.7s
        DBN_13                          |                 1      1    19.8s
        DBN_14                          |                 1      1    20.8s
        DBN_15                          |                 1      1     3.9s
        DBN_16                          |                 1      1    16.8s
      Grids benchmark                   |    3     5             8    27.1s
        Grids_11                        |    1                   1     5.5s
        Grids_12                        |    1                   1     2.3s
        Grids_13                        |          1             1     0.5s
        Grids_14                        |          1             1     0.5s
        Grids_15                        |    1                   1    11.1s
        Grids_16                        |          1             1     2.4s
        Grids_17                        |          1             1     2.4s
        Grids_18                        |          1             1     2.4s
      linkage benchmark                 |                17     17     9.0s
        linkage_11                      |                 1      1     1.1s
        linkage_12                      |                 1      1     0.8s
        linkage_13                      |                 1      1     0.6s
        linkage_14                      |                 1      1     0.2s
        linkage_15                      |                 1      1     0.6s
        linkage_16                      |                 1      1     0.4s
        linkage_17                      |                 1      1     0.5s
        linkage_18                      |                 1      1     0.5s
        linkage_19                      |                 1      1     0.3s
        linkage_20                      |                 1      1     0.6s
        linkage_21                      |                 1      1     0.2s
        linkage_22                      |                 1      1     0.3s
        linkage_23                      |                 1      1     0.5s
        linkage_24                      |                 1      1     0.7s
        linkage_25                      |                 1      1     0.5s
        linkage_26                      |                 1      1     0.7s
        linkage_27                      |                 1      1     0.4s
      ObjectDetection benchmark         |                65     65    11.4s
        ObjectDetection_11              |                 1      1     0.1s
        ObjectDetection_12              |                 1      1     0.2s
        ObjectDetection_13              |                 1      1     0.2s
        ObjectDetection_14              |                 1      1     0.2s
        ObjectDetection_15              |                 1      1     0.2s
        ObjectDetection_16              |                 1      1     0.2s
        ObjectDetection_17              |                 1      1     0.2s
        ObjectDetection_18              |                 1      1     0.2s                                                                                                                               [18/8168]
        ObjectDetection_19              |                 1      1     0.2s
        ObjectDetection_20              |                 1      1     0.3s
        ObjectDetection_21              |                 1      1     0.2s
        ObjectDetection_22              |                 1      1     0.1s
        ObjectDetection_23              |                 1      1     0.1s
        ObjectDetection_24              |                 1      1     0.1s
        ObjectDetection_25              |                 1      1     0.2s
        ObjectDetection_26              |                 1      1     0.1s
        ObjectDetection_27              |                 1      1     0.1s
        ObjectDetection_28              |                 1      1     0.1s
        ObjectDetection_29              |                 1      1     0.2s
        ObjectDetection_30              |                 1      1     0.1s
        ObjectDetection_31              |                 1      1     0.1s
        ObjectDetection_32              |                 1      1     0.1s
        ObjectDetection_33              |                 1      1     0.1s
        ObjectDetection_34              |                 1      1     0.1s
        ObjectDetection_35              |                 1      1     0.1s
        ObjectDetection_36              |                 1      1     0.1s
        ObjectDetection_37              |                 1      1     0.1s
        ObjectDetection_38              |                 1      1     0.1s
        ObjectDetection_39              |                 1      1     0.1s
        ObjectDetection_40              |                 1      1     0.1s
        ObjectDetection_41              |                 1      1     0.1s
        ObjectDetection_42              |                 1      1     0.2s
        ObjectDetection_43              |                 1      1     0.1s
        ObjectDetection_44              |                 1      1     0.2s
        ObjectDetection_45              |                 1      1     0.3s
        ObjectDetection_46              |                 1      1     0.2s
        ObjectDetection_47              |                 1      1     0.3s
        ObjectDetection_48              |                 1      1     0.3s
        ObjectDetection_49              |                 1      1     0.2s
        ObjectDetection_50              |                 1      1     0.2s
        ObjectDetection_51              |                 1      1     0.2s
        ObjectDetection_52              |                 1      1     0.2s
        ObjectDetection_53              |                 1      1     0.2s
        ObjectDetection_54              |                 1      1     0.2s
        ObjectDetection_55              |                 1      1     0.2s
        ObjectDetection_56              |                 1      1     0.2s
        ObjectDetection_57              |                 1      1     0.2s
        ObjectDetection_58              |                 1      1     0.2s
        ObjectDetection_59              |                 1      1     0.3s
        ObjectDetection_60              |                 1      1     0.3s
        ObjectDetection_61              |                 1      1     0.3s
        ObjectDetection_62              |                 1      1     0.2s
        ObjectDetection_63              |                 1      1     0.1s
        ObjectDetection_64              |                 1      1     0.2s
        ObjectDetection_65              |                 1      1     0.3s
        ObjectDetection_66              |                 1      1     0.2s
        ObjectDetection_67              |                 1      1     0.1s
        ObjectDetection_68              |                 1      1     0.2s
        ObjectDetection_69              |                 1      1     0.2s
        ObjectDetection_70              |                 1      1     0.2s
        ObjectDetection_71              |                 1      1     0.1s
        ObjectDetection_72              |                 1      1     0.2s
        ObjectDetection_73              |                 1      1     0.2s
        ObjectDetection_74              |                 1      1     0.1s
        ObjectDetection_75              |                 1      1     0.1s
      Pedigree benchmark                |                 3      3     0.7s
        Pedigree_11                     |                 1      1     0.2s
        Pedigree_12                     |                 1      1     0.2s
        Pedigree_13                     |                 1      1     0.2s
      Promedus benchmark                |   28                  28    42.8s
      relational benchmark              |                 5      5     2.5s
        relational_1                    |                 1      1     1.0s
        relational_2                    |                 1      1     0.3s
        relational_3                    |                 1      1     0.2s
        relational_4                    |                 1      1     0.3s
        relational_5                    |                 1      1     0.7s
      Segmentation benchmark            |    6                   6    10.1s

GiggleLiu commented 1 year ago

There is indead a bug about tensor shape in the program, however fixing it only fixes some of the errors.
For some instances, the probabilities are not normalized, the double precision number overflows. Do you have any idea about how to fix the error?
The treewidths of DBN, Pedigree and relational are too large, the memory overflows. Or is it possible the contraction order optimizer in OMEinsum does not find a treewidth that close to the optimum? Wondering if there are some reference treewidth.

mroavi commented 1 year ago

With respect to 2:

The double precision is underflowing since we are multiplying many small numbers together. We could try renormalizing the factor after each operation (instead of doing it one time in the end like we are doing it now). Another solution is to perform the computation in log-space, replacing multiplications with additions. The problem with this is that the marginalization step cannot be performed in log-space. This means that we would need to exponentiate each entry in the factor, perform the marginalization, and then take the log of the result, which could be very costly. So I would go for the first option.

With respect to 3:

Let's compare the resulting treewidths obtained with OMEinsum and Tamaki's algorithm for each of the 144 problems.

mroavi commented 1 year ago

Here is the tree decomposition info as generated by Tamaki's tool.


using JunctionTrees: get_td_soln
using Artifacts, PrettyTables

td_filepaths = filter(x -> endswith(x, ".td"), readdir(artifact"uai2014"; sort=false, join=true))
td_soln_args = get_td_soln.(td_filepaths)

# Get the problem names
rexp = Regex(".*/(.*)\\.tamaki.td\$") 
problem_names = td_filepaths |> 
  x -> map(y -> match(rexp, y), x) |> # apply regex
  x -> filter(!isnothing, x) |> # filter out `nothing` values
  x -> map(first, x) # get the first capture of each element

table = hcat(problem_names, reduce(hcat, td_soln_args)')

pretty_table(
             table,
             header = ["Problem name", "Number of bags", "Largest bag size", "Number of variables"],
             crop = :none,
            )

┌────────────────────┬────────────────┬──────────────────┬─────────────────────┐
│       Problem name │ Number of bags │ Largest bag size │ Number of variables │
├────────────────────┼────────────────┼──────────────────┼─────────────────────┤
│         Alchemy_11 │            421 │               20 │                 440 │
│             CSP_11 │             51 │               15 │                  82 │
│             CSP_12 │             38 │               12 │                  67 │
│             CSP_13 │             47 │               20 │                 100 │
│             DBN_11 │             21 │               21 │                  40 │
│             DBN_12 │             22 │               22 │                  42 │
│             DBN_13 │             23 │               23 │                  44 │
│             DBN_14 │             21 │               21 │                  40 │
│             DBN_15 │             22 │               22 │                  42 │
│             DBN_16 │             23 │               23 │                  44 │
│           Grids_11 │             77 │               20 │                 100 │
│           Grids_12 │             89 │               11 │                 100 │
│           Grids_13 │             77 │               20 │                 100 │
│           Grids_14 │             77 │               20 │                 100 │
│           Grids_15 │            346 │               21 │                 400 │
│           Grids_16 │            346 │               21 │                 400 │
│           Grids_17 │            346 │               21 │                 400 │
│           Grids_18 │            346 │               21 │                 400 │
│ ObjectDetection_11 │             38 │                7 │                  60 │
│ ObjectDetection_12 │             42 │                6 │                  60 │
│ ObjectDetection_13 │             38 │                7 │                  60 │
│ ObjectDetection_14 │             42 │                6 │                  60 │
│ ObjectDetection_15 │             42 │                6 │                  60 │
│ ObjectDetection_16 │             38 │                7 │                  60 │
│ ObjectDetection_17 │             42 │                6 │                  60 │
│ ObjectDetection_18 │             42 │                6 │                  60 │
│ ObjectDetection_19 │             38 │                7 │                  60 │
│ ObjectDetection_20 │             42 │                6 │                  60 │
│ ObjectDetection_21 │             42 │                6 │                  60 │
│ ObjectDetection_22 │             38 │                7 │                  60 │
│ ObjectDetection_23 │             38 │                7 │                  60 │
│ ObjectDetection_24 │             42 │                6 │                  60 │
│ ObjectDetection_25 │             42 │                6 │                  60 │
│ ObjectDetection_26 │             38 │                7 │                  60 │
│ ObjectDetection_27 │             42 │                6 │                  60 │
│ ObjectDetection_28 │             46 │                8 │                  60 │
│ ObjectDetection_29 │             46 │                7 │                  60 │
│ ObjectDetection_30 │             42 │                8 │                  60 │
│ ObjectDetection_31 │             44 │                7 │                  60 │
│ ObjectDetection_32 │             47 │                6 │                  60 │
│ ObjectDetection_33 │             47 │                6 │                  60 │
│ ObjectDetection_34 │             44 │                7 │                  60 │
│ ObjectDetection_35 │             42 │                8 │                  60 │
│ ObjectDetection_36 │             44 │                7 │                  60 │
│ ObjectDetection_37 │             47 │                6 │                  60 │
│ ObjectDetection_38 │             42 │                8 │                  60 │
│ ObjectDetection_39 │             44 │                7 │                  60 │
│ ObjectDetection_40 │             47 │                6 │                  60 │
│ ObjectDetection_41 │             47 │                6 │                  60 │
│ ObjectDetection_42 │             42 │                8 │                  60 │
│ ObjectDetection_43 │             44 │                7 │                  60 │
│ ObjectDetection_44 │             47 │                6 │                  60 │
│ ObjectDetection_45 │             47 │                6 │                  60 │
│ ObjectDetection_46 │             44 │                7 │                  60 │
│ ObjectDetection_47 │             42 │                8 │                  60 │
│ ObjectDetection_48 │             44 │                7 │                  60 │
│ ObjectDetection_49 │             47 │                6 │                  60 │
│ ObjectDetection_50 │             47 │                6 │                  60 │
│ ObjectDetection_51 │             44 │                7 │                  60 │
│ ObjectDetection_52 │             42 │                8 │                  60 │
│ ObjectDetection_53 │             44 │                7 │                  60 │
│ ObjectDetection_54 │             42 │                8 │                  60 │
│ ObjectDetection_55 │             44 │                7 │                  60 │
│ ObjectDetection_56 │             47 │                6 │                  60 │
│ ObjectDetection_57 │             47 │                6 │                  60 │
│ ObjectDetection_58 │             42 │                8 │                  60 │
│ ObjectDetection_59 │             44 │                7 │                  60 │
│ ObjectDetection_60 │             47 │                6 │                  60 │
│ ObjectDetection_61 │             44 │                7 │                  60 │
│ ObjectDetection_62 │             42 │                8 │                  60 │
│ ObjectDetection_63 │             44 │                7 │                  60 │
│ ObjectDetection_64 │             47 │                6 │                  60 │
│ ObjectDetection_65 │             47 │                6 │                  60 │
│ ObjectDetection_66 │             42 │                8 │                  60 │
│ ObjectDetection_67 │             44 │                7 │                  60 │
│ ObjectDetection_68 │             47 │                6 │                  60 │
│ ObjectDetection_69 │             47 │                6 │                  60 │
│ ObjectDetection_70 │             44 │                7 │                  60 │
│ ObjectDetection_71 │             42 │                8 │                  60 │
│ ObjectDetection_72 │             44 │                7 │                  60 │
│ ObjectDetection_73 │             47 │                6 │                  60 │
│ ObjectDetection_74 │             44 │                7 │                  60 │
│ ObjectDetection_75 │             42 │                8 │                  60 │
│        Pedigree_11 │            302 │               17 │                 385 │
│        Pedigree_12 │            303 │               17 │                 385 │
│        Pedigree_13 │            300 │               17 │                 385 │
│        Promedus_11 │            398 │               14 │                 461 │
│        Promedus_12 │            437 │               19 │                 534 │
│        Promedus_13 │            810 │               10 │                 894 │
│        Promedus_14 │            347 │               22 │                 414 │
│        Promedus_15 │            319 │               11 │                 385 │
│        Promedus_16 │            673 │               12 │                 715 │
│        Promedus_17 │            804 │               20 │                 916 │
│        Promedus_18 │            318 │               22 │                 374 │
│        Promedus_19 │            535 │               21 │                 624 │
│        Promedus_20 │            466 │               18 │                 546 │
│        Promedus_21 │            396 │               10 │                 473 │
│        Promedus_22 │            344 │               10 │                 400 │
│        Promedus_23 │            584 │               19 │                 674 │
│        Promedus_24 │            148 │                5 │                 200 │
│        Promedus_25 │            896 │               21 │                1005 │
│        Promedus_26 │            313 │                4 │                 614 │
│        Promedus_27 │            349 │               16 │                 410 │
│        Promedus_28 │            398 │               15 │                 463 │
│        Promedus_29 │            368 │                5 │                 434 │
│        Promedus_30 │            238 │                7 │                 306 │
│        Promedus_31 │            390 │               10 │                 466 │
│        Promedus_32 │            457 │                9 │                 511 │
│        Promedus_33 │            322 │                6 │                 378 │
│        Promedus_34 │            353 │               15 │                 415 │
│        Promedus_35 │            390 │               10 │                 467 │
│        Promedus_36 │            390 │               10 │                 467 │
│        Promedus_37 │            948 │               20 │                1039 │
│        Promedus_38 │            554 │               19 │                 668 │
│    Segmentation_11 │            190 │               14 │                 228 │
│    Segmentation_12 │            180 │               14 │                 229 │
│    Segmentation_13 │            192 │               15 │                 235 │
│    Segmentation_14 │            187 │               14 │                 226 │
│    Segmentation_15 │            205 │               14 │                 232 │
│    Segmentation_16 │            185 │               14 │                 231 │
│         linkage_11 │            815 │               25 │                1077 │
│         linkage_12 │            901 │               23 │                1183 │
│         linkage_13 │            808 │               23 │                1062 │
│         linkage_14 │            337 │               20 │                 448 │
│         linkage_15 │            932 │               30 │                1152 │
│         linkage_16 │            317 │               19 │                 402 │
│         linkage_17 │            863 │               21 │                1068 │
│         linkage_18 │            885 │               19 │                1118 │
│         linkage_19 │            614 │               17 │                 793 │
│         linkage_20 │            860 │               22 │                1160 │
│         linkage_21 │            347 │               18 │                 437 │
│         linkage_22 │            704 │               19 │                 798 │
│         linkage_23 │            881 │               21 │                1032 │
│         linkage_24 │           1017 │               19 │                1289 │
│         linkage_25 │            810 │               20 │                1030 │
│         linkage_26 │           1049 │               23 │                1289 │
│         linkage_27 │            657 │               23 │                 811 │
│       relational_1 │            251 │              251 │                 500 │
│       relational_2 │          12500 │                3 │               13000 │
│       relational_3 │            300 │                8 │                1000 │
│       relational_4 │          10101 │              101 │               10200 │
│       relational_5 │           5500 │               11 │               10000 │
└────────────────────┴────────────────┴──────────────────┴─────────────────────┘

Artifacts.toml:

[uai2014]
git-tree-sha1 = "b542e3def878031ac244a97a539b502dd755b4b1"

    [[uai2014.download]]
    sha256 = "0bdc2cfd72395d21f80914eb9b061e5494635950316a031179b8e8afcc613668"
    url = "https://gist.github.com/mroavi/50993627d6bf699e02f595a646e96675/raw/b542e3def878031ac244a97a539b502dd755b4b1.tar.gz"

Next week I will try to extract the width of the tree generated by the OMEinsum contraction order optimizer for each of the problems.

GiggleLiu commented 1 year ago

Thanks for looking into it. I have fixed most of the instances in PR #16 . Some notes:

[ ] We need an automatic way to rescale tensors, as you suggested.
[ ] We need to fix the two failed instances.
[ ] We do not have much to do with the relational dataset, the problem scale is too large.

GiggleLiu commented 1 year ago

Tests not including relational dataset.

Test Summary:                         | Pass  Fail  Total      Time
gradient-based tensor network solvers |  135     2    137  66m48.1s
  UAI 2014 problem set                |  135     2    137  66m48.1s
    Alchemy benchmark                 |    1            1      5.0s
    CSP benchmark                     |    3            3     21.3s
    DBN benchmark                     |    6            6     14.1s
    Grids benchmark                   |    8            8     31.0s
    linkage benchmark                 |   16     1     17  61m18.0s
      linkage_11                      |    1            1   1m10.0s
      linkage_12                      |    1            1     57.3s
      linkage_13                      |    1            1     54.9s
      linkage_14                      |    1            1     18.8s
      linkage_15                      |          1      1  13m28.9s
      linkage_16                      |    1            1     16.5s
      linkage_17                      |    1            1   1m12.5s
      linkage_18                      |    1            1     53.6s
      linkage_19                      |    1            1     34.7s
      linkage_20                      |    1            1     54.3s
      linkage_21                      |    1            1     16.5s
      linkage_22                      |    1            1     41.8s
      linkage_23                      |    1            1  36m02.2s
      linkage_24                      |    1            1   1m04.5s
      linkage_25                      |    1            1     51.6s
      linkage_26                      |    1            1   1m02.1s
      linkage_27                      |    1            1     37.8s
    ObjectDetection benchmark         |   64     1     65   2m00.3s
      ObjectDetection_11              |    1            1      1.4s
      ObjectDetection_12              |    1            1      1.4s
      ObjectDetection_13              |    1            1      4.3s
      ObjectDetection_14              |    1            1      1.5s
      ObjectDetection_15              |    1            1      0.8s
      ObjectDetection_16              |    1            1      4.2s
      ObjectDetection_17              |    1            1      1.4s
      ObjectDetection_18              |    1            1      0.9s
      ObjectDetection_19              |    1            1      4.5s
      ObjectDetection_20              |    1            1      2.0s
      ObjectDetection_21              |    1            1      0.8s
      ObjectDetection_22              |    1            1      1.1s
      ObjectDetection_23              |    1            1      4.3s
      ObjectDetection_24              |    1            1      1.4s
      ObjectDetection_25              |    1            1      0.8s
      ObjectDetection_26              |    1            1      4.2s
      ObjectDetection_27              |    1            1      1.4s
      ObjectDetection_28              |    1            1      2.0s
      ObjectDetection_29              |    1            1      3.5s
      ObjectDetection_30              |    1            1      1.2s
      ObjectDetection_31              |    1            1      3.8s
      ObjectDetection_32              |    1            1      0.9s
      ObjectDetection_33              |    1            1      1.7s
      ObjectDetection_34              |    1            1      1.0s
      ObjectDetection_35              |          1      1      1.3s
      ObjectDetection_36              |    1            1      2.7s
      ObjectDetection_37              |    1            1      1.5s
      ObjectDetection_38              |    1            1      1.1s
      ObjectDetection_39              |    1            1      3.5s
      ObjectDetection_40              |    1            1      1.7s
      ObjectDetection_41              |    1            1      1.7s
      ObjectDetection_42              |    1            1      1.4s
      ObjectDetection_43              |    1            1      3.1s
      ObjectDetection_44              |    1            1      0.9s
      ObjectDetection_45              |    1            1      1.2s
      ObjectDetection_46              |    1            1      0.9s
      ObjectDetection_47              |    1            1      1.2s
      ObjectDetection_48              |    1            1      3.4s
      ObjectDetection_49              |    1            1      0.9s
      ObjectDetection_50              |    1            1      1.3s
      ObjectDetection_51              |    1            1      0.9s
      ObjectDetection_52              |    1            1      1.2s
      ObjectDetection_53              |    1            1      3.4s
      ObjectDetection_54              |    1            1      1.1s
      ObjectDetection_55              |    1            1      3.4s
      ObjectDetection_56              |    1            1      0.9s
      ObjectDetection_57              |    1            1      1.3s
      ObjectDetection_58              |    1            1      1.5s
      ObjectDetection_59              |    1            1      3.1s
      ObjectDetection_60              |    1            1      1.5s
      ObjectDetection_61              |    1            1      0.9s
      ObjectDetection_62              |    1            1      1.8s
      ObjectDetection_63              |    1            1      2.6s
      ObjectDetection_64              |    1            1      0.9s
      ObjectDetection_65              |    1            1      1.5s
      ObjectDetection_66              |    1            1      1.9s
      ObjectDetection_67              |    1            1      3.2s
      ObjectDetection_68              |    1            1      0.9s
      ObjectDetection_69              |    1            1      1.2s
      ObjectDetection_70              |    1            1      0.9s
      ObjectDetection_71              |    1            1      1.2s
      ObjectDetection_72              |    1            1      3.5s
      ObjectDetection_73              |    1            1      1.2s
      ObjectDetection_74              |    1            1      0.9s
      ObjectDetection_75              |    1            1      1.2s
    Pedigree benchmark                |    3            3      8.7s
    Promedus benchmark                |   28           28   1m45.8s
    Segmentation benchmark            |    6            6     23.9s
ERROR: Some tests did not pass: 135 passed, 2 failed, 0 errored, 0 broken.

mroavi commented 1 year ago

With respect to 2:

The double precision is underflowing since we are multiplying many small numbers together.

I realized that this is not entirely true. It holds for Bayesian Networks (where the factors represent conditional probability distributions and hence all the elements lie between 0 and 1 inclusive). However, for Markov Random Fields, the factor elements can take any real number. Therefore, the precision might overflow or underflow. One approach that comes to mind is to add a preprocessing step that normalizes all factor elements to the range between 0 and 1.

mroavi commented 1 year ago

Fixed in https://github.com/mroavi/TensorInference.jl/pull/16

TensorBFS / TensorInference.jl

Inference errors or fails for 105 of the 142 UAI 2014 Inference Competition problems #15