NeuralNetworkVerification / Marabou

Other
253 stars 88 forks source link

Dozens of similar queries result in Marabou getting stuck #783

Closed SuryaB1 closed 5 months ago

SuryaB1 commented 6 months ago

Hi,

I make the query (given in the test script below) to Marabou along with thousands of other queries in a separate, larger program that I have. Whenever I run the script below, which runs the enclosed query individually, Marabou returns an appropriate result. However, when this query is run as part of my separate, larger program that runs thousands of queries, Marabou gets stuck on this query and does not terminate (can only stop the process by closing the terminal, rather than terminating via ctrl+C).

This problem only occurs after the first run of my program; in other words, Marabou does not get stuck on this query during the first run of my program.

As noted in the Issue title, I've noticed that my program passes highly similar queries to Marabou multiple times before the program gets to this query that Marabou gets stuck on; by "similar", I mean that the coefficients and constants of the constraint equations are off on orders of magnitude of ±1e-12. Could these repeated similar queries be causing the issue? For context, I have attached a text file containing the runtime output logging the similar queries I am referring to in the order they were queried, excluding the thousands of other irrelevant queries (which I've replaced with ...), ending with the query that got Marabou stuck. The filename is issue.txt. issue.txt

Another suspicion is caching-related issues since Marabou only started getting stuck on the below query every time my separate program was run after the first run.

I tried creating a separate test script similar to the one below, where I added a for-loop around the below query (so a query per for-loop). I also added ±1e-12, 2e-12, 3e-12 to the coefficients and constants of the equation constraints for some of the for-loop iterations/queries. Still, all of the queries succeeded throughout multiple runs, so I was, unfortunately, not able to replicate the issue that way.

Any insights would be greatly appreciated. Please let me know if any further information is required.

#############################################################################
### To summarize, the query constraints are as follows (where x0 and x1 refer to the network's
#   two input variables and output_var refers to the network's output variable):
#
# Input space constraints:
# x0 <= 100
# x0 >= -100
# x1 <= 100
# x1 >= -100
#
# Disjunctions:
# x0 <= 9.99
# x0 >= 10.01
# x1 <= 9.99
# x1 >= 10.01
#
# Inequalities:
# 1.0*x0 + 1.0*x1 <= 20.68676191034973
# 0.13542994985903067*x0 + 1.0*x1 <= 11.943228800010417
# 137.12153670362702*x0 + 1.0*x1 <= 1401.0531025403711
# 1.0*x0 + 1.0*x1 >= 19.986117936461756
# -2.1425049404379113*x0 + 1.0*x1 <= -10.535579455125532
# -1.2074084055251617*x0 + 1.0*x1 >= -2.222228741939958
#
# output_var == 0
############

from maraboupy import Marabou
from maraboupy import MarabouCore
from maraboupy import MarabouUtils

TF_NN_FILENAME = "saved_models/diagonal-split_classif_nnet"

### Read network
network = Marabou.read_tf(filename = TF_NN_FILENAME, modelType="savedModel_v2")
inputVars = network.inputVars[0][0]
outputVars = network.outputVars[0].flatten()

### Set bounds for both input variables
network.setLowerBound(inputVars[0], -100)
network.setUpperBound(inputVars[0], 100)
network.setLowerBound(inputVars[1], -100)
network.setUpperBound(inputVars[1], 100)

### Set bounds for the output variable
eq = MarabouUtils.Equation()
eq.addAddend(1, outputVars[0])
eq.setScalar(0.0)
network.addEquation(eq, isProperty=True)

### Add disjunction constraint
eq1 = MarabouUtils.Equation(MarabouCore.Equation.LE)
eq1.addAddend(1.0, inputVars[0])
eq1.setScalar(9.99)

eq2 = MarabouUtils.Equation(MarabouCore.Equation.GE)
eq2.addAddend(1.0, inputVars[0])
eq2.setScalar(10.01)

eq3 = MarabouUtils.Equation(MarabouCore.Equation.LE)
eq3.addAddend(1.0, inputVars[1])
eq3.setScalar(9.99)

eq4 = MarabouUtils.Equation(MarabouCore.Equation.GE)
eq4.addAddend(1.0, inputVars[1])
eq4.setScalar(10.01)

network.addDisjunctionConstraint( [[eq1], [eq2], [eq3], [eq4]] )

### Add inequality constraints
eq5 = MarabouUtils.Equation(MarabouCore.Equation.LE)
eq5.addAddend(1.0, inputVars[0])
eq5.addAddend(1.0, inputVars[1])
eq5.setScalar(20.68676191034973)
network.addEquation(eq5, isProperty=True)

eq6 = MarabouUtils.Equation(MarabouCore.Equation.LE)
eq6.addAddend(0.13542994985903067, inputVars[0])
eq6.addAddend(1.0, inputVars[1])
eq6.setScalar(11.943228800010417)
network.addEquation(eq6, isProperty=True)

eq7 = MarabouUtils.Equation(MarabouCore.Equation.LE)
eq7.addAddend(137.12153670362702, inputVars[0])
eq7.addAddend(1.0, inputVars[1])
eq7.setScalar(1401.0531025403711)
network.addEquation(eq7, isProperty=True)

eq8 = MarabouUtils.Equation(MarabouCore.Equation.GE)
eq8.addAddend(1.0, inputVars[0])
eq8.addAddend(1.0, inputVars[1])
eq8.setScalar(19.986117936461756)
network.addEquation(eq8, isProperty=True)

eq9 = MarabouUtils.Equation(MarabouCore.Equation.LE)
eq9.addAddend(-2.1425049404379113, inputVars[0])
eq9.addAddend(1.0, inputVars[1])
eq9.setScalar(-10.535579455125532)
network.addEquation(eq9, isProperty=True)

eq10 = MarabouUtils.Equation(MarabouCore.Equation.GE)
eq10.addAddend(-1.2074084055251617, inputVars[0])
eq10.addAddend(1.0, inputVars[1])
eq10.setScalar(-2.222228741939958)
network.addEquation(eq10, isProperty=True)

### Query Marabou
options = Marabou.createOptions(verbosity = 0)
exitCode, vals, stats = network.solve(verbose=False, options=options)
print("vals:", vals)
wu-haoze commented 6 months ago

@SuryaB1 , by "this problem only occurs after the first run of my program", do you mean you ran the very same program, and it got stuck in one run but did not get stuck in another run? This would be odd since there is no unseeded randomness in Marabou.

Also, are you certain that it got stuck, instead of the printing to the console is not real time? Could you try to run Python with the -u option and let me know if it's still stuck? python -u [your_script.py]

SuryaB1 commented 6 months ago

By that, I mean that the program was only ever able to be fully successfully executed the first time I ever ran it. On all subsequent runs, Marabou got stuck on a query as I discuss above. I ran python -u ..., and the issue persists. As I briefly mention above, could it be that something is cached or set by Marabou during the first run causing it to get stuck on this query in subsequent runs?

I am certain Marabou got stuck based on my program's logs, which are printed without any differences using -u and indicate that my program got stuck on MarabouNetwork.solve(...). Also, I am unable to terminate the process via ctrl+C when I should be able to, which also happened in a different Issue (#666) where Marabou got stuck.

wu-haoze commented 6 months ago

@SuryaB1 . If you are creating a fresh MarabouNetwork object through the read_tf function each time you invoke the solve method, there should not be cache-related issues, because the solving runs would be independent.

Can you send me the problematic verification query so I can look into it on my end? You could save the query by calling network.saveQuery(filename="test.ipq") right before where you called network.solve().

SuryaB1 commented 5 months ago

I see what you mean.

As I discussed above, the non-termination issue does not occur on an individual query but, rather, after a sequence of queries.

I have saved and attached the sequence of queries in .ipq files numbered based on the chronological order of the queries in a zipped folder called queries. The query with the highest number is the one that Marabou is not terminating on. I used the network.saveQuery(filename="test.ipq") as you suggested.

queries.zip

I also want to note that, after recently updating my local Marabou build, my program tends to avoid this non-terminating issue more than it did before, but still faces the non-terminating query most of the time it is run.

Please let me know if there's anything else you'll need to help with the investigation. Thank you.

wu-haoze commented 5 months ago

@SuryaB1 , thanks a lot for the feedback! The queries you sent were helpful. I was able to reproduce the issue you described. After identifying the first problematic query, I did the following test:

for i in range(100):                                                                                                                                                                                               
    ipq = MarabouCore.loadQuery("queries/query_695.ipq")                                                                                                                                                     
    options = Marabou.createOptions(verbosity=0)                                                                                                                                                                   
    result, _, _ = Marabou.solve_query(ipq, options=options)                                                                                                                                                       
    print(result)   

Marabou returns SAT on the first iteration, but hangs on the second iteration. From the log, it's not that the Marabou process froze, but that the second iteration is taking much much longer than the first iteration. The reason is that the branching order (during the branch-and-bound like search) is different. The reason of this difference is that there are some seeded randomness in Marabou (in its default version). And invoking the same query twice in a same run, somehow does not "reset" the random seed", and will result in very different solver traces. In contrast, if we invoke the following lines on the command line, Marabou's execution is the same for each iterations:

for i in {1..100}; do ./build/Marabou --input-query queries/query_695.ipq ; done

Apology that I was not accurate when saying previous runs won't affect later runs..

The reason that pulling from the latest Marabou reduces the chances of getting stuck, has to do with the fact that the latest Marabou version is on average faster.

There are two solutions.

Solution 1: set a timeout for the solver run, and expect the result to be "unsat", "sat", and "TIMEOUT":

options = Marabou.createOptions(verbosity=0, timeoutInSeconds=2)
result, _, _ = network.solve(options=options) 

And if Marabou returns "TIMEOUT", call solve again (perhaps with a longer (like 2x) timeout). Basically, a sequential portfolio.

Solution 2: build Marabou with Gurobi (if you are able to obtain a free academic license), which can be faster on average. Details can be found here: https://github.com/NeuralNetworkVerification/Marabou#compile-marabou-with-the-gurobi-optimizer-optional

SuryaB1 commented 5 months ago

Thank you for finding solutions for this, @wu-haoze!

So, it seems that Marabou is expected to take a very long time to execute certain queries due to the randomness. Thus, would having some way for Marabou to reset the seed not be that useful? You mentioned that the seed somehow does not reset when the same query is run twice, which sounds like it's supposed to be reset. Please correct me if my interpretation is incorrect.

Just wanted to understand if it's possible for there to be a more certain fix. Thanks for clarifying.

wu-haoze commented 5 months ago

would having some way for Marabou to reset the seed not be that useful?

That’s right. I don’t think resetting the seed will improve performance on average. For any random seed, there will be some problem that becomes easier and some problem that becomes harder. As you correctly interpreted, the current behavior is undesirable only from a user experience perspective.

在 2024年4月9日,19:32,Surya Bala @.***> 写道:



Thank you for finding solutions for this, @wu-haozehttps://github.com/wu-haoze!

So, it seems that Marabou is expected to take a very long time to execute certain queries due to the randomness. Thus, would having some way for Marabou to reset the seed not be that useful? You mentioned that the seed somehow does not reset when the same query is run twice, which sounds like it's supposed to be reset. Please correct me if my interpretation is incorrect.

Just wanted to understand if it's possible for there to be a more certain fix. Thanks for clarifying.

— Reply to this email directly, view it on GitHubhttps://github.com/NeuralNetworkVerification/Marabou/issues/783#issuecomment-2046369826, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ACRZF2ZJEP3RSX3QSOOO3NTY4SQFPAVCNFSM6AAAAABFFKOJOWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANBWGM3DSOBSGY. You are receiving this because you were mentioned.Message ID: @.***>