Verified-Intelligence / alpha-beta-CROWN

alpha-beta-CROWN: An Efficient, Scalable and GPU Accelerated Neural Network Verifier (winner of VNN-COMP 2021, 2022, 2023, and 2024)

RuntimeError: CUDA error: device-side assert triggered #38

Closed shuyilinn closed 5 months ago

shuyilinn commented 1 year ago

Describe the bug

When I use alpha-beta-CROWN for evaluation, a CUDA error occurs. Here is the log:

Traceback (most recent call last):
  File "/scratch/shuyilin/alpha-beta-CROWN/complete_verifier/abcrown.py", line 612, in <module>
    abcrown.main()
  File "/scratch/shuyilin/alpha-beta-CROWN/complete_verifier/abcrown.py", line 591, in main
    verified_status = self.complete_verifier(
  File "/scratch/shuyilin/alpha-beta-CROWN/complete_verifier/abcrown.py", line 416, in complete_verifier
    l, nodes, ret = self.bab(
  File "/scratch/shuyilin/alpha-beta-CROWN/complete_verifier/abcrown.py", line 235, in bab
    result = input_bab_parallel(
  File "/scratch/shuyilin/alpha-beta-CROWN/complete_verifier/input_split/batch_branch_and_bound.py", line 182, in input_bab_parallel
    global_lb, ret = net.build(
  File "/scratch/shuyilin/alpha-beta-CROWN/complete_verifier/beta_CROWN_solver.py", line 455, in build
    lb, ub, aux_reference_bounds = self.net.init_alpha(
  File "/scratch/shuyilin/alpha-beta-CROWN/complete_verifier/auto_LiRPA/optimized_bounds.py", line 766, in init_alpha
    l, u = self.compute_bounds(
  File "/scratch/shuyilin/alpha-beta-CROWN/complete_verifier/auto_LiRPA/bound_general.py", line 1206, in compute_bounds
    return self._compute_bounds_main(C=C,
  File "/scratch/shuyilin/alpha-beta-CROWN/complete_verifier/auto_LiRPA/bound_general.py", line 1303, in _compute_bounds_main
    self.check_prior_bounds(final)
  File "/scratch/shuyilin/alpha-beta-CROWN/complete_verifier/auto_LiRPA/bound_general.py", line 800, in check_prior_bounds
    self.check_prior_bounds(n)
  File "/scratch/shuyilin/alpha-beta-CROWN/complete_verifier/auto_LiRPA/bound_general.py", line 800, in check_prior_bounds
    self.check_prior_bounds(n)
  File "/scratch/shuyilin/alpha-beta-CROWN/complete_verifier/auto_LiRPA/bound_general.py", line 800, in check_prior_bounds
    self.check_prior_bounds(n)
  [Previous line repeated 1 more time]
  File "/scratch/shuyilin/alpha-beta-CROWN/complete_verifier/auto_LiRPA/bound_general.py", line 804, in check_prior_bounds
    self.compute_intermediate_bounds(
  File "/scratch/shuyilin/alpha-beta-CROWN/complete_verifier/auto_LiRPA/bound_general.py", line 910, in compute_intermediate_bounds
    node.lower, node.upper = self.backward_general(
  File "/scratch/shuyilin/alpha-beta-CROWN/complete_verifier/auto_LiRPA/backward_bound.py", line 256, in backward_general
    A, lower_b, upper_b = l.bound_backward(
  File "/scratch/shuyilin/alpha-beta-CROWN/complete_verifier/auto_LiRPA/operators/nonlinear.py", line 512, in bound_backward
    As, lbias, ubias = super().bound_backward(
  File "/scratch/shuyilin/alpha-beta-CROWN/complete_verifier/auto_LiRPA/operators/activation_base.py", line 248, in bound_backward
    As, lbias, ubias = super().bound_backward(
  File "/scratch/shuyilin/alpha-beta-CROWN/complete_verifier/auto_LiRPA/operators/activation_base.py", line 66, in bound_backward
    self.bound_relax(x, init=True)
  File "/scratch/shuyilin/alpha-beta-CROWN/complete_verifier/auto_LiRPA/operators/nonlinear.py", line 231, in bound_relax
    self.bound_relax_impl_sigmoid(lb, ub, self.act_func, self.d_act_func)
  File "/scratch/shuyilin/alpha-beta-CROWN/complete_verifier/auto_LiRPA/operators/nonlinear.py", line 174, in bound_relax_impl_sigmoid
    self.add_linear_relaxation(
  File "/scratch/shuyilin/alpha-beta-CROWN/complete_verifier/auto_LiRPA/operators/activation_base.py", line 46, in add_linear_relaxation
    w_out[..., mask] = (k[..., mask].to(w_out) if isinstance(k, Tensor)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

I have several ONNX files for one model; they differ only in their weights and biases. Some of these files trigger the error and others do not.
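The log above notes that CUDA kernel errors may be reported asynchronously, so the frame it blames is not necessarily the one that failed, and it suggests passing CUDA_LAUNCH_BLOCKING=1. As a minimal sketch of how one might rerun the failing case that way: the helper names `blocking_env` and `rerun_script` are hypothetical, and the script arguments shown in the comment are placeholders rather than the reporter's actual command.

```python
import os
import subprocess
import sys


def blocking_env(base_env=None):
    """Return a copy of the environment with CUDA_LAUNCH_BLOCKING=1 set.

    With this variable set, CUDA kernel launches run synchronously, so an
    error is raised at the call that caused it.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def rerun_script(script_args, base_env=None):
    """Run `python <script_args...>` under blocking_env and return the exit code."""
    cmd = [sys.executable, *script_args]
    return subprocess.run(cmd, env=blocking_env(base_env)).returncode


# Hypothetical usage (placeholder config path, not taken from the report):
#   rerun_script(["abcrown.py", "--config", "path/to/config.yaml"])
```

Running the same command for one ONNX file that fails and one that passes, then comparing the two tracebacks, may help narrow down which operation actually triggers the assert.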

To Reproduce

System configuration:

Thanks in advance for any ideas and suggestions.

reproduce.zip

huanzhang12 commented 1 year ago

Hi @lydialin1212, thank you for reporting this to us! It looks like an issue in how the sigmoid activation function is handled. @shizhouxing @C-lister, can you take a look using the examples from @lydialin1212?

shizhouxing commented 5 months ago

This issue has been fixed in the latest release.