Evaluation Metric Modification

shenoynikhil commented 1 year ago

I would like to use a custom evaluation metric that computes the Pearson correlation coefficient (PCC) between y_true and y_score for a multi-output regression task. Since I fit the model on a low-dimensional output, I would like to use the PCA components to map back to the original dimensions and then compute the coefficient.

In the code snippet below, I rely on a variable passed into the enclosing function, but I would like to know how to pass additional arguments (like the PCA) to the metric class.

# pca_y is a variable passed into the enclosing function, and this class is
# defined locally inside it, which doesn't seem like the right way to do it.
import numpy as np
from pytorch_tabnet.metrics import Metric

class PCC(Metric):
    def __init__(self):
        self._name = "pcc"
        self._maximize = True

    def __call__(self, y_true, y_score):
        # Map both arrays back to the original output dimension.
        y_true, y_score = y_true @ pca_y.components_, y_score @ pca_y.components_
        # Average the per-sample Pearson correlation.
        corrsum = 0
        for i in range(len(y_true)):
            corrsum += np.corrcoef(y_true[i], y_score[i])[1, 0]
        return corrsum / y_true.shape[0]

I will be passing this as model.fit(..., eval_metric=[PCC]). Since eval_metric takes classes rather than instances, I am finding it difficult to see how to give the PCA as an input to the metric class. It may be a naive question, but I would like your input on it.
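For reference, the per-row loop over np.corrcoef in the snippet above can also be written without a Python loop. This is only a sketch of the same calculation; the helper name rowwise_pearson is made up for illustration and is not part of pytorch-tabnet.

```python
import numpy as np


def rowwise_pearson(a, b):
    """Pearson correlation between matching rows of two 2-D arrays.

    Hypothetical helper for illustration; returns one coefficient per row.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Centre every row on its own mean.
    a_c = a - a.mean(axis=1, keepdims=True)
    b_c = b - b.mean(axis=1, keepdims=True)
    # Covariance term divided by the product of the row norms.
    num = (a_c * b_c).sum(axis=1)
    den = np.sqrt((a_c ** 2).sum(axis=1) * (b_c ** 2).sum(axis=1))
    return num / den
```

The metric value would then be rowwise_pearson(y_true, y_score).mean(). As with np.corrcoef, a row with zero variance gives nan.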

Optimox commented 1 year ago

Is pca_y a fixed input? Is it just a mapping from an n-dimensional vector to an m-dimensional vector?

What is @ doing?

Why can't you do something like this:

class PCC(Metric):
    def __init__(self):
        self._name = "pcc"
        self._maximize = True
        self.pca_y = define_your_mapping_here()

    def __call__(self, y_true, y_score):
        y_true, y_score = y_true @ self.pca_y.components_, y_score @ self.pca_y.components_
        corrsum = 0
        for i in range(len(y_true)):
            corrsum += np.corrcoef(y_true[i], y_score[i])[1, 0]        
        return corrsum / y_true.shape[0]

shenoynikhil commented 1 year ago

So,

pca is the matrix transformation (learned by fitting on the train labels) that was applied to all the labels y initially.
@ is just matrix multiplication.
This pca is also used outside the metric, to convert test predictions back to their original dimension, so initializing it again inside the metric does not seem ideal.
Is there any way to pass it to the metric?

Optimox commented 1 year ago

You won't be able to easily add more inputs to the MetricContainer https://github.com/dreamquark-ai/tabnet/blob/4fa545da50796f0d16f49d0cb476d5a30c2a27c1/pytorch_tabnet/metrics.py#L122

But since the PCA transform is fixed, I think defining it during initialization of the class would do the job.

I'm sorry but I don't think there is an easier way to do what you want.
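One way to define the mapping at initialization without refitting it is to build the metric class from a small factory that closes over the already-fitted transform. The sketch below assumes, as the thread implies, that the library instantiates the metric class with no arguments. make_pcc_metric and to_original are hypothetical names, and the stand-in Metric base class is there only so the example runs without pytorch-tabnet installed.

```python
import numpy as np

try:
    from pytorch_tabnet.metrics import Metric
except ImportError:
    class Metric:  # stand-in base class, only for running this sketch
        pass


def make_pcc_metric(to_original):
    """Return a Metric subclass bound to a fixed mapping (hypothetical helper).

    to_original: callable taking a (n_samples, n_components) array and
    returning a (n_samples, n_features) array, for example
    lambda y: y @ pca_y.components_ as used in this thread.
    """

    class PCC(Metric):
        def __init__(self):
            self._name = "pcc"
            self._maximize = True

        def __call__(self, y_true, y_score):
            # Map both arrays back to the original output dimension.
            y_true = to_original(y_true)
            y_score = to_original(y_score)
            # Average the per-sample Pearson correlation.
            corrs = [np.corrcoef(t, s)[1, 0] for t, s in zip(y_true, y_score)]
            return float(np.mean(corrs))

    return PCC
```

Usage would look like PCC = make_pcc_metric(lambda y: y @ pca_y.components_), followed by model.fit(..., eval_metric=[PCC]). The same pca_y object stays available outside the metric for converting test predictions. Note that scikit-learn's PCA.inverse_transform also adds back the mean, so passing pca_y.inverse_transform instead gives a slightly different metric from the one in this thread.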

shenoynikhil commented 1 year ago

Thanks a lot!

shenoynikhil commented 1 year ago

@Optimox Let me know if this could be implemented (I can create a PR for this).

Optimox commented 1 year ago

I think this is beyond the scope of the library and would probably create a lot of complexity for a very rare use case.

dreamquark-ai / tabnet

Evaluation Metric Modification #441