dreamquark-ai / tabnet

PyTorch implementation of TabNet paper : https://arxiv.org/pdf/1908.07442.pdf
https://dreamquark-ai.github.io/tabnet/
MIT License

Evaluation Metric Modification #441

Closed: shenoynikhil closed this issue 1 year ago

shenoynikhil commented 1 year ago

I would like to use the following evaluation metric: the Pearson correlation coefficient (PCC) between y_true and y_score for a multi-output regression task. Since I fit the model on a low-dimensional (PCA-reduced) output, I would like to use the PCA components to map back to the original dimensions and then compute the coefficient.
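Written out, and assuming each row of y_true and y_score is one sample's low-dimensional vector ($\mathbf{z}_i$ and $\hat{\mathbf{z}}_i$), with $W$ standing for `pca_y.components_` (shape $k \times d$), the metric described above is the average per-sample Pearson correlation in the original $d$-dimensional space:

```latex
\mathrm{PCC} = \frac{1}{N} \sum_{i=1}^{N} \rho\!\left(\mathbf{z}_i W,\ \hat{\mathbf{z}}_i W\right),
\qquad
\rho(\mathbf{a}, \mathbf{b}) =
\frac{\sum_{j=1}^{d} (a_j - \bar{a})(b_j - \bar{b})}
     {\sqrt{\sum_{j=1}^{d} (a_j - \bar{a})^2}\,\sqrt{\sum_{j=1}^{d} (b_j - \bar{b})^2}}
```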

In the code snippet below, I use a variable that is available in the surrounding function, but I would like to know how to pass additional arguments (such as the PCA object) to the metric class.

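# pca_y is a variable from the enclosing function; defining the class locally to capture it doesn't feel like the right way to do it
import numpy as np
from pytorch_tabnet.metrics import Metric

class PCC(Metric):
    def __init__(self):
        self._name = "pcc"
        self._maximize = True

    def __call__(self, y_true, y_score):
        # Map both arrays back to the original output space with the PCA components
        y_true, y_score = y_true @ pca_y.components_, y_score @ pca_y.components_
        corrsum = 0
        # Average the per-sample Pearson correlation over all rows
        for i in range(len(y_true)):
            corrsum += np.corrcoef(y_true[i], y_score[i])[1, 0]
        return corrsum / y_true.shape[0]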

I will pass this to model.fit(..., eval_metric=[PCC]). Since eval_metric takes classes rather than instances, I don't see how to give the metric class extra inputs such as the PCA. This may be a naive question, but I would appreciate your input.
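One common Python workaround, sketched here under assumptions rather than taken from the TabNet documentation, is a factory function that builds the metric class around the fitted components. The returned class keeps the no-argument constructor that `eval_metric` expects, while the components are captured in a closure. `make_pcc_metric` is a hypothetical helper, and the fallback `Metric` class only exists so the sketch runs without TabNet installed.

```python
import numpy as np

try:
    from pytorch_tabnet.metrics import Metric
except ImportError:
    # Stand-in so this sketch runs without TabNet; the real base class
    # lives in pytorch_tabnet.metrics.
    class Metric:
        pass


def make_pcc_metric(components, name="pcc"):
    """Hypothetical helper: return a Metric subclass bound to fixed PCA components.

    `components` has shape (n_components, n_features), e.g. pca_y.components_.
    """
    components = np.asarray(components)

    class PCC(Metric):
        def __init__(self):
            # No constructor arguments: the class is instantiated by the library.
            self._name = name
            self._maximize = True

        def __call__(self, y_true, y_score):
            # Map low-dimensional targets and predictions back to the original space.
            y_true = np.asarray(y_true) @ components
            y_score = np.asarray(y_score) @ components
            # Average the per-sample Pearson correlation across rows.
            corrs = [np.corrcoef(t, s)[1, 0] for t, s in zip(y_true, y_score)]
            return float(np.mean(corrs))

    return PCC
```

Usage would look like `PCCMetric = make_pcc_metric(pca_y.components_)` followed by `model.fit(..., eval_metric=[PCCMetric])`. If, as the library's metrics.py appears to do, metrics are resolved by name among the subclasses of `Metric`, then calling the factory more than once in a session should use a distinct `name` each time so an older class is not picked up instead.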

Optimox commented 1 year ago

Is pca_y a fixed input? Is it just a mapping from an n-dimensional vector to an m-dimensional vector?

What is @ doing?
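For reference, `@` is Python's matrix-multiplication operator (PEP 465), which NumPy arrays implement as `np.matmul`. Assuming `pca_y` is a fitted scikit-learn `PCA`, `components_` has shape `(n_components, n_features)`, so `y_low @ components_` maps `(n_samples, n_components)` back to `(n_samples, n_features)`. Scikit-learn's `inverse_transform` (with the default `whiten=False`) additionally adds back `mean_`, which the plain product omits. The sketch uses a random orthonormal basis and mean as stand-ins for a fitted PCA, so it runs with NumPy alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_components = 5, 4, 2

# Stand-in for pca_y.components_: orthonormal rows, shape (n_components, n_features).
q, _ = np.linalg.qr(rng.normal(size=(n_features, n_components)))
components = q.T
mean = rng.normal(size=n_features)  # stand-in for pca_y.mean_

y_low = rng.normal(size=(n_samples, n_components))

# `@` is np.matmul for ndarrays: (5, 2) @ (2, 4) -> (5, 4)
y_back = y_low @ components

# What PCA.inverse_transform computes when whiten=False: the product plus the mean.
y_inverse = y_low @ components + mean
```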

Why can't you do something like this:

import numpy as np
from pytorch_tabnet.metrics import Metric

class PCC(Metric):
    def __init__(self):
        self._name = "pcc"
        self._maximize = True
        # Placeholder: build or load the fixed PCA mapping here
        self.pca_y = define_your_mapping_here()

    def __call__(self, y_true, y_score):
        y_true, y_score = y_true @ self.pca_y.components_, y_score @ self.pca_y.components_
        corrsum = 0
        for i in range(len(y_true)):
            corrsum += np.corrcoef(y_true[i], y_score[i])[1, 0]
        return corrsum / y_true.shape[0]

shenoynikhil commented 1 year ago

So,

Optimox commented 1 year ago

You won't easily be able to add more inputs to the MetricContainer: https://github.com/dreamquark-ai/tabnet/blob/4fa545da50796f0d16f49d0cb476d5a30c2a27c1/pytorch_tabnet/metrics.py#L122

But since the PCA transform is fixed, I think defining it when the class is initialized would do the job.
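A sketch of the same idea without defining the class inside a function: keep the no-argument `__init__` (the library creates the metric object itself) and attach the fixed components to the class before training. `PCCFixed` and its `components` class attribute are illustrative names, not part of pytorch_tabnet. The per-row Pearson correlation is computed in vectorised form instead of with `np.corrcoef` in a loop; for non-constant rows it gives the same values. The "not set" check lives in `__call__` rather than `__init__` because, as far as we can tell, TabNet instantiates every `Metric` subclass when looking metrics up by name, so raising in the constructor could break unrelated training runs.

```python
import numpy as np

try:
    from pytorch_tabnet.metrics import Metric
except ImportError:
    class Metric:  # stand-in so the sketch runs without TabNet installed
        pass


class PCCFixed(Metric):
    # Hypothetical class attribute holding fixed PCA components,
    # shape (n_components, n_features). Set it once before model.fit.
    components = None

    def __init__(self):
        # No constructor arguments, since the library instantiates the class itself.
        self._name = "pcc_fixed"
        self._maximize = True

    def __call__(self, y_true, y_score):
        if self.components is None:
            raise ValueError("set PCCFixed.components before training")
        a = np.asarray(y_true) @ self.components
        b = np.asarray(y_score) @ self.components
        # Vectorised per-row Pearson correlation: center each row, then normalise.
        a = a - a.mean(axis=1, keepdims=True)
        b = b - b.mean(axis=1, keepdims=True)
        num = (a * b).sum(axis=1)
        den = np.sqrt((a * a).sum(axis=1) * (b * b).sum(axis=1))
        return float(np.mean(num / den))
```

Usage would be `PCCFixed.components = pca_y.components_` followed by `model.fit(..., eval_metric=[PCCFixed])`.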

I'm sorry but I don't think there is an easier way to do what you want.

shenoynikhil commented 1 year ago

Thanks a lot!

shenoynikhil commented 1 year ago

@Optimox Let me know if this could be added as a feature (I can open a PR for it).

Optimox commented 1 year ago

I think this is beyond the scope of the library and would probably add a lot of complexity for a very rare use case.