ahmedmalaa / Symbolic-Metamodeling

Codebase for "Demystifying Black-box Models with Symbolic Metamodels", NeurIPS 2019.

ValueError not explaining what's happening in metamodel.fit() #4

Closed davideferrari92 closed 3 years ago

davideferrari92 commented 3 years ago

Hi! I ran into this problem while trying to build the metamodel on my dataset. The dataset is complete (no NaN values) and every column is float64, as shown here.

>>> X_train[features].dtypes.unique()
array([dtype('float64')], dtype=object)

Below is the full traceback. The ValueError is raised without a message, so it does not explain what is going wrong.

What do you think is happening?

>>> metamodel = symbolic_metamodel(model, X_train[features])
>>> metamodel.fit(num_iter=10, batch_size=X_train[features].shape[0], learning_rate=.01)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-38-5a4271687174> in <module>
      1 metamodel = symbolic_metamodel(model, X_train)
      2 
----> 3 metamodel.fit(num_iter=10, batch_size=X_train.shape[0], learning_rate=.01)

/mnt/data/workspace_davide/Symbolic-Metamodeling/pysymbolic/algorithms/symbolic_metamodeling.py in fit(self, num_iter, batch_size, learning_rate)
    239         for u in self.tqdm_mode(range(self.X.shape[1])):
    240 
--> 241             self.params[u, :] = tune_single_dim(lr=0.1, n_iter=500, x=self.X_new[:, u], y=self.Y_r)
    242 
    243         self.set_equation(reset_init_model=True)

/mnt/data/workspace_davide/Symbolic-Metamodeling/pysymbolic/algorithms/symbolic_metamodeling.py in tune_single_dim(lr, n_iter, x, y, verbosity)
    118 
    119         new_grads   = basis_grad(a, b, c, x[batch_index])
--> 120         func_true   = basis(a, b, c, x[batch_index])
    121 
    122         loss        =  np.mean((func_true - y[batch_index])**2)

/mnt/data/workspace_davide/Symbolic-Metamodeling/pysymbolic/algorithms/symbolic_metamodeling.py in basis(a, b, c, x, hyper_order)
     58     func_   = MeijerG(theta=[a, a, a, b, c], order=hyper_order, approximation_order=3)
     59 
---> 60     return func_.evaluate(x + epsilon)
     61 
     62 def basis_expression(a, b, c, hyper_order=[1, 2, 2, 2]):

/mnt/data/workspace_davide/Symbolic-Metamodeling/pysymbolic/models/special_functions.py in evaluate(self, X)
    139         elif self.evaluation_mode in ['numpy','cython','theano']:
    140 
--> 141             evaluators_ = {'numpy': lambdify([x], self.approx_expression(), modules=['math']),
    142                            'cython': lambdify([x], self.approx_expression(), modules=['math']), #ufuncify([x], self.approx_expression()),
    143                            'theano': lambdify([x], self.approx_expression(), modules=['math'])} #theano_function([x], [self.approx_expression()])}

/mnt/data/workspace_davide/Symbolic-Metamodeling/pysymbolic/models/special_functions.py in approx_expression(self, midpoint)
    112         x                 = Symbol('x', real=True)
    113 
--> 114         self.Taylor_poly_ = taylor(self.math_expr, midpoint, self.approximation_order)
    115         self.coeffp       = self.Taylor_poly_[::-1]
    116 

/opt/anaconda3/lib/python3.7/site-packages/mpmath/calculus/differentiation.py in taylor(ctx, f, x, n, **options)
    574     gen = enumerate(ctx.diffs(f, x, n, **options))
    575     if options.get("chop", True):
--> 576         return [ctx.chop(d)/ctx.factorial(i) for i, d in gen]
    577     else:
    578         return [d/ctx.factorial(i) for i, d in gen]

/opt/anaconda3/lib/python3.7/site-packages/mpmath/calculus/differentiation.py in <listcomp>(.0)
    574     gen = enumerate(ctx.diffs(f, x, n, **options))
    575     if options.get("chop", True):
--> 576         return [ctx.chop(d)/ctx.factorial(i) for i, d in gen]
    577     else:
    578         return [d/ctx.factorial(i) for i, d in gen]

/opt/anaconda3/lib/python3.7/site-packages/mpmath/calculus/differentiation.py in diffs(ctx, f, x, n, **options)
    273         yield ctx.diff(f, x, 0, singular=True)
    274     else:
--> 275         yield f(ctx.convert(x))
    276     if n < 1:
    277         return

/mnt/data/workspace_davide/Symbolic-Metamodeling/pysymbolic/models/special_functions.py in math_expr(self, x)
    103         b_q_ = [list(self.b_q[k]) for k in range(len(self.b_q))]
    104 
--> 105         return mp.meijerg(a_p_, b_q_, self._const * x)
    106 
    107     def approx_expression(self, midpoint=0.5):

/opt/anaconda3/lib/python3.7/site-packages/mpmath/functions/hypergeometric.py in meijerg(ctx, a_s, b_s, z, r, series, **kwargs)
   1056                 terms.append((bases, expts, gn, gd, hn, hd, hz))
   1057             return terms
-> 1058     return ctx.hypercomb(h, a+b, **kwargs)
   1059 
   1060 @defun_wrapped

/opt/anaconda3/lib/python3.7/site-packages/mpmath/functions/hypergeometric.py in hypercomb(ctx, function, params, discard_known_zeros, **kwargs)
    125                 v = ctx.fprod([ctx.hyper(a_s, b_s, z, **kwargs)] + \
    126                     [ctx.gamma(a) for a in alpha_s] + \
--> 127                     [ctx.rgamma(b) for b in beta_s] + \
    128                     [ctx.power(w,c) for (w,c) in zip(w_s,c_s)])
    129                 if verbose:

/opt/anaconda3/lib/python3.7/site-packages/mpmath/functions/hypergeometric.py in hyper(ctx, a_s, b_s, z, **kwargs)
    224         elif q == 0: return ctx._hyp1f0(a_s[0][0], z)
    225     elif p == 2:
--> 226         if   q == 1: return ctx._hyp2f1(a_s, b_s, z, **kwargs)
    227         elif q == 2: return ctx._hyp2f2(a_s, b_s, z, **kwargs)
    228         elif q == 3: return ctx._hyp2f3(a_s, b_s, z, **kwargs)

/opt/anaconda3/lib/python3.7/site-packages/mpmath/functions/hypergeometric.py in _hyp2f1(ctx, a_s, b_s, z, **kwargs)
    454                 T2 = ([-z],[-b], [c,ab],[a,c-b], [b,t+b],[ctx.mpq_1-ab],  rz)
    455                 return T1, T2
--> 456             v = ctx.hypercomb(h, [a,b], **kwargs)
    457 
    458         # Use 1-z transformation

/opt/anaconda3/lib/python3.7/site-packages/mpmath/functions/hypergeometric.py in hypercomb(ctx, function, params, discard_known_zeros, **kwargs)
    125                 v = ctx.fprod([ctx.hyper(a_s, b_s, z, **kwargs)] + \
    126                     [ctx.gamma(a) for a in alpha_s] + \
--> 127                     [ctx.rgamma(b) for b in beta_s] + \
    128                     [ctx.power(w,c) for (w,c) in zip(w_s,c_s)])
    129                 if verbose:

/opt/anaconda3/lib/python3.7/site-packages/mpmath/functions/hypergeometric.py in hyper(ctx, a_s, b_s, z, **kwargs)
    224         elif q == 0: return ctx._hyp1f0(a_s[0][0], z)
    225     elif p == 2:
--> 226         if   q == 1: return ctx._hyp2f1(a_s, b_s, z, **kwargs)
    227         elif q == 2: return ctx._hyp2f2(a_s, b_s, z, **kwargs)
    228         elif q == 3: return ctx._hyp2f3(a_s, b_s, z, **kwargs)

/opt/anaconda3/lib/python3.7/site-packages/mpmath/functions/hypergeometric.py in _hyp2f1(ctx, a_s, b_s, z, **kwargs)
    441     if absz <= 0.8 or (ctx.isint(a) and a <= 0 and a >= -1000) or \
    442                       (ctx.isint(b) and b <= 0 and b >= -1000):
--> 443         return ctx.hypsum(2, 1, (atype, btype, ctype), [a, b, c], z, **kwargs)
    444 
    445     orig = ctx.prec

/opt/anaconda3/lib/python3.7/site-packages/mpmath/ctx_mp.py in hypsum(ctx, p, q, flags, coeffs, z, accurate_small, **kwargs)
    672             v = z._mpc_
    673         if key not in ctx.hyp_summators:
--> 674             ctx.hyp_summators[key] = libmp.make_hyp_summator(key)[1]
    675         summator = ctx.hyp_summators[key]
    676         prec = ctx.prec

/opt/anaconda3/lib/python3.7/site-packages/mpmath/libmp/libhyper.py in make_hyp_summator(key)
    157             add("    %sCIM_%i = ym >> (-offset)" % (W, i))
    158         else:
--> 159             raise ValueError
    160 
    161     l_areal = len(areal)

ValueError: 

Thank you very much!

Davide

ahmedmalaa commented 3 years ago

Hi Davide - never seen this error before. Can you please send the dimension of the dataset and the range of values for X? Thanks.

davideferrari92 commented 3 years ago

Hi! Thank you for the very fast answer.

My dataset has 963 rows x 121 columns. Across the whole dataset, the minimum and maximum values are -1 and 726964.

Almost all values are integers from 1 to 10, and a few columns hold larger numbers. All of them are stored as float64.

Davide


ahmedmalaa commented 3 years ago

I am not sure that's the source of the error, but 726964 is pretty big and may be causing numerical problems. I recommend normalizing all columns to [0, 1]; if that still gives an error, try fitting a subset of the columns first, e.g. 10 features only. Also note that this method does not scale well to very high dimensions: with 121 features you will get very long equations that are no longer really interpretable.
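
For reference, the normalization and feature-subset suggestion could look roughly like the sketch below. It is only an illustration: `min_max_scale`, `X_demo`, `X_scaled` and `X_subset` are hypothetical names, the toy data merely imitates the ranges reported in this thread, and whether scaling actually removes the ValueError has not been confirmed here.

```python
import numpy as np

def min_max_scale(X, eps=1e-12):
    """Rescale every column of X to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Guard against division by zero for columns with a single repeated value.
    safe_range = np.where(col_range > eps, col_range, 1.0)
    return (X - col_min) / safe_range

# Toy stand-in for the real data (assumption): eleven columns of small
# integer codes plus one column reaching the large values mentioned above.
rng = np.random.default_rng(0)
X_demo = np.column_stack([
    rng.integers(1, 11, size=(50, 11)).astype(float),
    rng.uniform(-1, 726964, size=50),
])

X_scaled = min_max_scale(X_demo)   # all columns now lie in [0, 1]
X_subset = X_scaled[:, :10]        # first 10 features only
```

With the real data, the scaled array (for example built from `X_train[features].to_numpy()`) would be passed to `symbolic_metamodel` in place of the raw features. Keep in mind that the black-box `model` would then also need to accept inputs on the scaled range.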

davideferrari92 commented 3 years ago

That sounds good. I'll try!

Thank you!

I'll get back to you if things still do not go well.

Have a nice day!!

Davide
