decurtoydiaz / learning_with_signatures

Learning with Signatures

Solved #1

Closed · Promisery closed this 2 years ago

Promisery commented 2 years ago

Thank you for sharing your work! I'm still learning from your papers, and there is a question that I am not sure about:

When calculating test accuracy, you used rmse_c[c] = mean_squared_error(globals()['supermeanl_' + str(c2)] * supermeanA[c], globals()['featuresAA_' + str(c2)][z], squared=False), where c2 is the true label of the test data. This raises a concern: when testing, c2 should not be available and thus should not be used anywhere. Using globals()['featuresAA_' + str(c2)][z] is fine, as it only loads the test data. However, using globals()['supermeanl_' + str(c2)] may cause leakage: when using supermeanl_i, the true label, i.e. c2, should not be available. Therefore, I believe iterating through all of supermeanl_0 through supermeanl_(N-1) is the correct way to do this.
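
For concreteness, a minimal sketch of what that could look like (variable names follow the notebook; test_signature is a hypothetical stand-in for the signature of one unlabeled test image, so this is an illustration rather than code from the repository):

import numpy as np
from sklearn.metrics import mean_squared_error

# Try every (scale factor, representative) pair; the true label of the test
# image is never consulted anywhere.
rmse = np.empty((categories, categories))
for i in range(categories):      # candidate scale factor supermeanl_i
    for c in range(categories):  # candidate representative supermeanA[c]
        rmse[i, c] = mean_squared_error(
            globals()['supermeanl_' + str(i)] * supermeanA[c],
            test_signature,
            squared=False)
predicted_class = int(np.argmin(rmse.min(axis=0)))  # best match over all pairs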

decurtoydiaz commented 2 years ago

Let's look at it.

If you unroll the loop you'll see that we just compare, one by one, all representatives (each with its multiplicative factor) against each test instance, and then choose as the predicted class the comparison that gives the minimum RMSE between signatures. Here the scale factors are computed optimally, assuming you can resolve the ambiguity at test time, for example using the same criteria used to derive the weights (which is not explained in the code). De Curtò y DíAz.

Promisery commented 2 years ago

I've figured it out. Thank you for your patience!

nlml commented 2 years ago

Either I'm still not getting this, or the original concern raised in this issue is valid.

Think about it like this: how would we predict a new, unseen image for which we don't know the label?

We could calculate the signature/features for this image and put it in a variable signature.

Then your test code could look like this:

rmse_c = np.empty(categories, dtype='object')
for c in range(0,categories):
  rmse_c[c] = mean_squared_error(
    globals()['supermeanl_' + str(c2)] * supermeanA[c],
    signature,
    squared=False
  )
min_rmse = np.argmin(rmse_c)  # this is our predicted class

However, we don't have the variable c2 here, since we do not know the label.

We could loop over c2 as you do in your code, but that would just cause our prediction to be overwritten 10 times, with a different result each time. Maybe we are supposed to average over these predictions for all c2?

To clarify further, I think it would be useful if you could write a code snippet that would show how to make a prediction on an unlabeled image. Until then, it appears to me you are indeed using the test label in making your prediction.

bicsi commented 2 years ago

I agree that the original issue should be re-opened. @Promisery could you please explain how you changed your mind? To me it seems that your concern is valid. @decurtoydiaz could you please detail why the multiplicative factors are constant (and, by extension, what you mean by 'constant')?

nslay commented 2 years ago

I reimplemented this with CIFAR10 data directly downloaded from the toronto.edu website. I got the same results. That's spooky! If there's data leakage, it's not obvious where. Still thinking about it!

Attached is my self-contained reimplementation (for CIFAR10 only).

I originally built signatures on the first 4 file batches and used data_batch_5.bin for validation. If you do it this way, there's some class imbalance and the RMSE errors are really bad across the board. So I changed it to be more like the author's setup by organizing the images into a class-indexed array of lists of images. My folder[c] is not an array of paths; it's an array of training images with class label c.

The author only considers 10 training images from each class for calculating supermeanA. And only 100 validation examples from each class. Really spooky stuff that it somehow generalizes to 10000 test images! And this is a separate implementation too. Please point out any errors in my code. I'm still scratching my head!
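
For anyone reproducing that organization, a rough sketch of what building the class-indexed lists could look like (this assumes the Python-pickle version of the CIFAR10 batches rather than the .bin files, and every name here is illustrative, not taken from the attached zip):

import pickle
import numpy as np

def load_batch(path):
    # Each CIFAR10 python batch is a pickled dict with b'data' (N x 3072 uint8)
    # and b'labels' (a list of N integers in 0..9).
    with open(path, 'rb') as f:
        d = pickle.load(f, encoding='bytes')
    return d[b'data'].reshape(-1, 3, 32, 32), np.array(d[b'labels'])

# folder[c] becomes a list of training images with class label c.
folder = [[] for _ in range(10)]
for name in ['data_batch_%d' % b for b in range(1, 6)]:
    images, labels = load_batch(name)
    for image, label in zip(images, labels):
        folder[label].append(image)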

Aside from how you build supermeanA/supermeanl, changing n_signatures can also cause misclassification errors for some class labels. For example, n_signatures = 20 gives:

RMSE 0
# of errors: 0
Accuracy: 1.0
RMSE 1
# of errors: 0
Accuracy: 1.0
RMSE 2
# of errors: 0
Accuracy: 1.0
RMSE 3
# of errors: 0
Accuracy: 1.0
RMSE 4
# of errors: 0
Accuracy: 1.0
RMSE 5
# of errors: 0
Accuracy: 1.0
RMSE 6
# of errors: 0
Accuracy: 1.0
RMSE 7
# of errors: 0
Accuracy: 1.0
RMSE 8
# of errors: 13
Accuracy: 0.987
RMSE 9
# of errors: 0
Accuracy: 1.0

run_iisig.zip

JossWhittle commented 2 years ago

@nlml I came to the same conclusion going through the code. Test samples of class c2 are only ever compared to things multiplied by supermeanl_c2, implying that we already know the test sample is in class c2.

nslay commented 2 years ago

In the test loop, the author has lists of images organized by class label. Instead of iterating over (image, label) pairs, the author iterates over images with class label c2. That's completely fine.

So instead of calculating the misclassification error with something like this:

for image, label in imageLabelPairs:
    for c in range(categories): # Calculate score for each class label
        rmse_c[c] = mean_squared_error(...) 
    rmse_min = np.argmin(rmse_c) # The label with minimum value is the predicted class
    if rmse_min != label: # Does it match ground truth?
        count[label] += 1
    ...

The author has it organized this way:

for c2 in range(categories):
    for z in range(len(imagesWithLabel[c2])): # z is index to z'th image with ground truth label c2
        for c in range(categories): # Calculate score for each class label
            rmse_c[c] = mean_squared_error(...) 
        rmse_min = np.argmin(rmse_c) # The label with minimum value is the predicted class
        if rmse_min != c2: # Does it match ground truth?
            count[c2] += 1
        ...

I don't see anything wrong with this.

JossWhittle commented 2 years ago

@nslay Your reimplementation has the same data leakage on line 212.

I think if we were talking about choosing argmin(c) RMSE(supermeanA[c], featuresAA[c2][z]), that makes total sense. You're essentially doing K-means clustering, where the K cluster centres are the average over 10 samples from each class's train set, and then you're assigning test samples to the closest cluster by RMSE distance.

As soon as you start comparing the test sample to anything multiplied by something specific to the class it is actually from (which you aren't meant to know yet), then it's dependent on leaked information.
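
For concreteness, a minimal sketch of the fair, lambda-free comparison described above (supermeanA and the test signature follow the thread's naming; nothing here comes from the repository):

import numpy as np
from sklearn.metrics import mean_squared_error

def predict_nearest_representative(test_signature, supermeanA):
    # Assign the test signature to the closest unscaled class representative;
    # with no per-class lambda, no information about the true label can leak in.
    rmse = [mean_squared_error(supermeanA[c], test_signature, squared=False)
            for c in range(len(supermeanA))]
    return int(np.argmin(rmse))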

nslay commented 2 years ago

@nslay Your reimplementation has the same data leakage on line 212.

I think if we were talking about choosing argmin(c) RMSE(supermeanA[c], featuresAA[c2][z]), that makes total sense. You're essentially doing K-means clustering, where the K cluster centres are the average over 10 samples from each class's train set, and then you're assigning test samples to the closest cluster by RMSE distance.

As soon as you start comparing the test sample to anything multiplied by something specific to the class it is actually from (which you aren't meant to know yet), then it's dependent on leaked information.

You're absolutely right.

Oh, I think I see your and @nlml's point:

            rmse_c[c] = mean_squared_error(supermeanl[c2] * supermeanA[c], featuresAA[c2][z], squared=False)

supermeanl[c2] and featuresAA[c2] are the leakage. If done with an image/label pair iteration, it becomes more obvious:

for image, label in imageLabelPairs:
    ...
    rmse_c[c] = mean_squared_error(supermeanl[label] * supermeanA[c], featuresAA[label][z], squared=False)

At least, it's more obvious to me that way.

decurtoydiaz commented 2 years ago

The weights, that is, the optimal scale factors, are computed according to Definition 4. What we show here is that we can achieve a perfect score with no overfitting, given that you can choose the right scale factors in validation and resolve the ambiguity of which one to use at test time. The code in the repository is very preliminary and the paper is still not accepted; we will release more code soon. Thanks for the comments. De Curtò y DíAz.

nslay commented 2 years ago

This seems to be something like what was intended. Check all lambdas over categories in testing since you don't know the ground truth category.

for z in range(xtest.shape[0]): # Over all test examples!
    label = int(ytest[z])
    image = xtest[z, ...].astype(np.uint8)
    image = image.transpose(1,2,0)
    #image = image.transpose(2,1,0)
    image  = np.reshape(image, (image.shape[0], image.shape[1] * image.shape[2]))
    image = iisignature.sig(image, N_truncated)

    rmse_c = np.empty((categories, categories), dtype='object')
    for c2 in range(categories): # Scan over supermeanl
        for c in range(categories): # Scan over categories
            rmse_c[c2,c] = mean_squared_error(supermeanl[c2] * supermeanA[c], image, squared=False)

    rmse_c = rmse_c.min(axis=0) # Consider the minimum rmse_c over c2

    min_rmse = np.argmin(rmse_c) # Then calculate the predicted class label
    if min_rmse != label:
        count[label] += 1

The performance is not good anymore. That's depressing... it would be really cool if the author really had got perfect test performance!

RMSE 0
# of errors: 930
Accuracy: 0.06999999999999995
RMSE 1
# of errors: 832
Accuracy: 0.16800000000000004
RMSE 2
# of errors: 872
Accuracy: 0.128
RMSE 3
# of errors: 963
Accuracy: 0.03700000000000003
RMSE 4
# of errors: 926
Accuracy: 0.07399999999999995
RMSE 5
# of errors: 866
Accuracy: 0.134
RMSE 6
# of errors: 412
Accuracy: 0.5880000000000001
RMSE 7
# of errors: 921
Accuracy: 0.07899999999999996
RMSE 8
# of errors: 782
Accuracy: 0.21799999999999997
RMSE 9
# of errors: 781
Accuracy: 0.21899999999999997

I'm sorry, author. Clever data organization and coding can get the best of us. Reddit is/was also confused.

JossWhittle commented 2 years ago

Weights, that is, optimal scale factors are computed according to Definition 4. What we show here is that we can achieve perfect score and no overfitting given that you can choose the right scale factors in validation. You are able to determine adequate lambda in test if you use same criteria. Code in the repository is very preliminary and paper still not accepted, will release more code soon. Thanks for the comments. De Curtò.

It's not really validation if you are updating your weights based on it. It's just splitting your train set for each class into two groups to use for different parts of your fitting process.

What we show here is that we can achieve perfect score and no overfitting given that you can choose the right scale factors in validation.

The issue is exactly this. For test samples from class c2, you compare them to supermeanA[c] scaled only by the scale factor supermeanl[c2] for class c2. You are already stating the test sample is in c2 when you do this. Information has been leaked.

If you pooled over supermeanl[c2] * supermeanA[c] for all combinations of c, c2 in cartesian_product(C, C) and reduced that down, then that could be fair.

If you considered only supermeanA[c] in the comparison, that would be fair; it's close to K-means, where the iterated-integral signature is a feature extractor on the raw images.

But to only compare to things polluted by knowledge of the true class is not fair.


Here is a unique hand drawn 28x28 pixel image of a character that I have just made.

[image: digit]

Please provide a minimal code cell (that would work dropped directly into the end of your notebook after running it entirely as you have provided it for MNIST) that would perform inference on this single image and tell us which class it belongs to.

decurtoydiaz commented 2 years ago

Thanks for the comments. In this particular example, we assume we can correctly resolve, at test time, the ambiguity of which (probably good) optimal lambda to use, for instance using the same criteria we used to derive the weights. If you do so, there is no overfitting and we get 100% accuracy on all tasks. For example, in AFHQ we can find analytically the n-dimensional lambda that correctly classifies the samples at test time; the only ambiguity is resolving which of the 3 n-dimensional scale factors to use (which is not explained in the code). De Curtò y DíAz.

decurtoydiaz commented 2 years ago

This seems to be something like what was intended. Check all lambdas over categories in testing since you don't know the ground truth category.

[...]

The scale factors are computed optimally, assuming you can resolve the ambiguity at test time, for example using the same criteria used to derive the weights. So the lambdas shouldn't be changed; you have found probably good optimal solutions in validation. The only thing that is not explained here is how you choose among those optimal lambdas at test time. De Curtò y DíAz.

decurtoydiaz commented 2 years ago

It's not really validation if you are updating your weights based on it. It's just splitting your train set for each class into two groups to use for different parts of your fitting process.

[...]

Again, the scale factors are computed optimally, assuming you can resolve the ambiguity at test time, for example using the same criteria used to derive the weights. So the lambdas shouldn't be changed; you have found probably good optimal solutions in validation. The only thing that is not explained here is how you choose among those optimal lambdas at test time. De Curtò y DíAz.

decurtoydiaz commented 2 years ago

I agree that the original issue should be re-opened. @Promisery could you please explain how you changed your mind? To me it seems that your concern is valid. @decurtoydiaz could you please detail why the multiplicative factors are constant (and, by extension, what you mean by 'constant')?

Here the scale factors are computed optimally, assuming you can resolve the ambiguity at test time, for example using the same criteria used to derive the weights (which is not explained in the code). De Curtò y DíAz.

decurtoydiaz commented 2 years ago

Please be considerate and respectful in your discourse.

The code is only a generalisation of our previous work: https://github.com/decurtoydiaz/signatures. Please check it out.

Here we show that there exist some optimal weights, tuned on validation, that can be used to classify without overfitting at test time if you are able to resolve the ambiguity of which one to use. The ambiguity is a geometric constraint, so it can be resolved (in several ways). But you can forget about that definition and try to find the weights yourself using some other method. They can also be constant factors, or be found using grid search, Bayesian analysis, or k-fold cross-validation. The method is indeed general.

De Curtò y DíAz.
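
As a purely illustrative sketch of the grid-search alternative mentioned above (a single shared scalar lambda, a hand-picked grid, and the val_signatures / val_labels names are assumptions here, not Definition 4): the scale factor is tuned on validation data only and then frozen before the test set is touched.

import numpy as np
from sklearn.metrics import mean_squared_error

def accuracy(lam, supermeanA, signatures, labels):
    # Labels are used only to score the predictions, never to make them.
    correct = 0
    for sig, label in zip(signatures, labels):
        rmse = [mean_squared_error(lam * supermeanA[c], sig, squared=False)
                for c in range(len(supermeanA))]
        correct += int(np.argmin(rmse) == label)
    return correct / len(labels)

# Tune on the validation split only, then keep best_lambda fixed for testing.
grid = np.linspace(0.5, 2.0, 16)
best_lambda = max(grid, key=lambda lam: accuracy(lam, supermeanA, val_signatures, val_labels))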

decurtoydiaz commented 2 years ago

Please check the new updated code. The ambiguity at test time is resolved using one-vs-all, fixing the proper lambda. You get n classifiers, one per class. Accuracy 100%.

De Curtò y DíAz.

JossWhittle commented 2 years ago

Please check the new updated code. The ambiguity at test time is resolved using one-vs-all, fixing the proper lambda. You get n classifiers, one per class. Accuracy 100%.

De Curtò y DíAz.

You have added three new code cells for evaluating cat, dog, and wild from AFHQ. In each of the three cells you have unrolled the c2 loop and hard-coded what was globals()['supermeanl_' + str(c2)] to be supermeanl_0, supermeanl_1, and supermeanl_2.

In each of the three cases they are 100% accurate for class c2 and 0% accurate for the other two classes, because your model always guesses whichever class's supermeanl_ is hard-coded.

I'm sorry but this does not work. You cannot perform inference on a sample you don't already have a label for.

To quote the previous request:

Here is a unique hand drawn 28x28 pixel image of a character that I have just made.

[image: digit]

Please provide a minimal code cell (that would work dropped directly into the end of your notebook after running it entirely as you have provided it for MNIST) that would perform inference on this single image and tell us which class it belongs to.

To summarize in this thread what you changed in your code:

100% on all classes by leaking c2 (the test label) into the predictions.

count = np.zeros(categories, dtype='object')

for c2 in range(0,categories):
  a = os.listdir(folder[c2])
  for z in range(0,len(a)):
    rmse_c = np.empty(categories, dtype='object')
    for c in range(0,categories):
      rmse_c[c] = mean_squared_error(globals()['supermeanl_' + str(c2)] * supermeanA[c], signature_cyz(folder[c2], a[z]), squared=False)
    min_rmse = np.argmin(rmse_c)
    if(min_rmse != c2): 
      count[c2] += 1

---

RMSE cat
# of errors: 0
Accuracy: 1.0

RMSE dog
# of errors: 0
Accuracy: 1.0

RMSE wild
# of errors: 0
Accuracy: 1.0

100% on the cat class by hard-coding the prediction to be cat; 0% accuracy on dog and wild because the model just says cat for everything.

count = np.zeros(categories, dtype='object')

for c2 in range(0,categories):
  a = os.listdir(folder[c2])
  for z in range(0,len(a)):
    rmse_c = np.empty(categories, dtype='object')
    for c in range(0,categories):
      rmse_c[c] = mean_squared_error(supermeanl_0 * supermeanA[c], signature_cyz(folder[c2], a[z]), squared=False)
    min_rmse = np.argmin(rmse_c)
    if(min_rmse != c2): 
      count[c2] += 1

---

RMSE cat
# of errors: 0
Accuracy: 1.0

RMSE dog
# of errors: 500
Accuracy: 0.0

RMSE wild
# of errors: 500
Accuracy: 0.0

100% on the dog class by hard-coding the prediction to be dog; 0% accuracy on cat and wild because the model just says dog for everything.

count = np.zeros(categories, dtype='object')

for c2 in range(0,categories):
  a = os.listdir(folder[c2])
  for z in range(0,len(a)):
    rmse_c = np.empty(categories, dtype='object')
    for c in range(0,categories):
      rmse_c[c] = mean_squared_error(supermeanl_1 * supermeanA[c], signature_cyz(folder[c2], a[z]), squared=False)
    min_rmse = np.argmin(rmse_c)
    if(min_rmse != c2): 
      count[c2] += 1

---

RMSE cat
# of errors: 500
Accuracy: 0.0

RMSE dog
# of errors: 0
Accuracy: 1.0

RMSE wild
# of errors: 500
Accuracy: 0.0

100% on the wild class by hard-coding the prediction to be wild; 0% accuracy on cat and dog because the model just says wild for everything.

count = np.zeros(categories, dtype='object')

for c2 in range(0,categories):
  a = os.listdir(folder[c2])
  for z in range(0,len(a)):
    rmse_c = np.empty(categories, dtype='object')
    for c in range(0,categories):
      rmse_c[c] = mean_squared_error(supermeanl_2 * supermeanA[c], signature_cyz(folder[c2], a[z]), squared=False)
    min_rmse = np.argmin(rmse_c)
    if(min_rmse != c2): 
      count[c2] += 1

---

RMSE cat
# of errors: 500
Accuracy: 0.0

RMSE dog
# of errors: 500
Accuracy: 0.0

RMSE wild
# of errors: 0
Accuracy: 1.0

decurtoydiaz commented 2 years ago

All of them. You should use all of them. One vs all. Look at Wikipedia, hell. Use all the classifiers. This was the de facto approach before Deep Learning. Please revise your notes on data science.

De Curtò y DíAz.

JossWhittle commented 2 years ago

If you know which of the three classifiers to use, then you already know what the class label is.

If you don't know which of the three to use, then you have an ensemble of three models all disagreeing with one another equally.
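
For the record, this is roughly what a label-free one-vs-all decision would have to look like with the quantities in this thread (a sketch only, assuming classifier k scores a sample with its own fixed supermeanl[k]; it is not the code in the repository, and nothing suggests it reaches 100%):

import numpy as np
from sklearn.metrics import mean_squared_error

def one_vs_all_predict(test_signature, supermeanA, supermeanl):
    # Classifier k answers "is this sample class k?" by scoring the sample
    # against class k's representative scaled by class k's own fixed lambda.
    scores = [mean_squared_error(supermeanl[k] * supermeanA[k], test_signature, squared=False)
              for k in range(len(supermeanA))]
    # Predict the class whose classifier claims the sample most strongly.
    return int(np.argmin(scores))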

decurtoydiaz commented 2 years ago

If you know which of the three classifiers to use, then you already know what the class label is.

If you don't know which of the three to use, then you have an ensemble of three models all disagreeing with one another equally.

Again, all of them. You should use all of them. One vs all. Look at Wikipedia, hell. Use all the classifiers. This was the de facto approach before Deep Learning. Please revise your notes on data science.

You have a perfect classifier of cats, another of dogs, and another of wild. It's binary. Think of it like that. When you try your wild classifier on a cat it will say no, no on dogs, and yes on wild. This was the traditional vision approach before Deep Learning, and it is commonly used in robotics.

De Curtò y DíAz.

JossWhittle commented 2 years ago

When you try your wild classifier on a cat it will say no, no on dogs, and yes on wild.

If you try the wild model on cat your model will say wild because it is hardcoded to say wild.

If you try the wild model on dog your model will say wild because it is hardcoded to say wild.

If you try the wild model on wild your model will say wild because it is hardcoded to say wild.

decurtoydiaz commented 2 years ago

Please try the code, for god's sake. You have three classifiers, each with a fixed lambda. If you look at the code, we go through ALL the test data, and only the corresponding class gets classified correctly.

De Curtò y DíAz.

JossWhittle commented 2 years ago

Here is a unique hand drawn 28x28 pixel image of a character that I have just made.

[image: digit]

Please provide a minimal code cell (that would work dropped directly into the end of your notebook after running it entirely as you have provided it for MNIST) that would perform inference on this single image and tell us which class it belongs to.

I have tried your code. That's why I and the others here are sure you have leaked information from the test labels.

If you don't need access to the test labels to make a prediction, then you will be able to perform inference on this image and classify it.

If you do need access to a test label for this image in order to be able to classify it, then you have to concede that you have leaked information from the test set labels when you computed your accuracy scores.

decurtoydiaz commented 2 years ago

Please check the updated example. You DON'T need any information from the labels. You try all the classifiers on the given input. There is no leakage. One vs all. Please revise your notes on data science.

De Curtò y DíAz.

rasbt commented 2 years ago

You DON'T need any information from the labels. There is no leakage.

How do you explain the fact that if you swap two class labels in the test set (e.g., rename 1 to 6, and 6 to 1) you still get 100% test accuracy? Unless you swap them in the training set too, this should be impossible unless there is data leakage.
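
A minimal sketch of that check (ytest and the downstream evaluation are placeholders for whatever the notebook uses; only the test labels are touched):

import numpy as np

# Swap classes 1 and 6 in the test labels only; training and validation stay as-is.
ytest = np.asarray(ytest)
ytest_swapped = ytest.copy()
ytest_swapped[ytest == 1] = 6
ytest_swapped[ytest == 6] = 1

# Re-run the evaluation against ytest_swapped. If predictions are made without
# looking at the labels, per-class accuracy for classes 1 and 6 should collapse;
# if it stays at 100%, the labels are leaking into the predictions.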

decurtoydiaz commented 2 years ago

Hell, go back to your first course in programming. I'm not here to answer those questions. It's late in Hong Kong. Renaming one variable to another doesn't change anything; computers don't understand variable names. Please think before asking a question.

All of them. You should use all of them. One vs all. Look at Wikipedia, hell. Use all the classifiers. This was the de facto approach before Deep Learning. Please revise your notes on data science.

You have a perfect classifier of cats, another of dogs, and another of wild. It's binary. Think of it like that. When you try your wild classifier on a cat it will say no, no on dogs, and yes on wild. This was the traditional vision approach before Deep Learning, and it is commonly used in robotics.

De Curtò y DíAz.

decurtoydiaz commented 2 years ago

Please be considerate and respectful in your discourse.

The code is in the repository. You can try it yourself. When you do one-vs-all, you fix lambda and go through all the test data, and you get correct only the instances from the corresponding class. For example, in AFHQ this means you have a perfect classifier of cats, dogs, and wild. And as in traditional vision and robotics, when you do one-vs-all you have to use all the classifiers if you are given an unlabeled test instance. There is no leakage. 100% on all tasks. The code is correct. Learning with Signatures rocks.

De Curtò y DíAz.

-- https://www.decurto.tw/

GillesVandewiele commented 2 years ago

Oh wow, I forgot to look at this thread; it seems like we are actually just repeating ourselves. Linking #3 to keep it efficient.

monney commented 2 years ago

Seems like all the bases have really been covered here, but I feel it's worth noting that there are well-known errors in all of these test sets, so we don't expect 100% (https://arxiv.org/pdf/2103.14749.pdf). Unless a method is outlined for choosing the scale factor a priori per image, the model is not useful (as others have stated). At the very least, if the idea is that one could, in principle, determine which scale factor to use without labels, the accuracy of the current best way of doing so should be reported, not the theoretical accuracy "if one could perfectly find the scale factor". That problem appears no easier than the original classification problem to me, though.

decurtoydiaz commented 2 years ago

Seems like all the bases have been really covered here. But, I feel like it's worth noting that there are well known errors in all of these test sets, so we don't expect 100% (https://arxiv.org/pdf/2103.14749.pdf). Unless a method is outlined for choosing the scale factor a priori per image, the model is not useful (as others have stated). At the very least, if the idea is that one, in principle, could determine the optimal scale factor without labels, the accuracy of the current best way of doing so should be reported, not the theoretical accuracy "if one could perfectly find the scale factor". That problem appears no easier than the original classification problem to me though.

The code is correct. The issue is already solved and thoroughly discussed in https://github.com/decurtoydiaz/learning_with_signatures/issues/3.

Again, one-vs-all. You should use all the classifiers, each with a fixed lambda. There is no leakage. The ambiguity is resolved at test time, as explained in the notebook, by doing one-vs-all, which indeed was the de facto way to do things in many domains such as robotics before Deep Learning emerged. Please revise your notes on data science.

When you do one-vs-all, you fix lambda and go through ALL the test data, and you get correct only the instances from the corresponding class. For example, in AFHQ this means you have a perfect classifier of cats, dogs, and wild. And as in traditional vision and robotics, when you do one-vs-all you have to use all the classifiers if you are given an unlabeled test instance. There is no leakage. 100% on all tasks.

What's more, the weights (videlicet, the optimal scale factors) are tuned on VALIDATION (indeed, with very few samples; check the code; it's the range between begin_validate and end_validate, 100 or 500 depending on the task) and then achieve perfect generalisation on the test set. The most beautiful example of this is Four Shapes, where, using only 10 train samples per class (4 classes, 40 in total) to compute the representatives and 100 validation samples per class (4 classes, 400 in total) to tune the optimal scale factors, we achieve perfect accuracy on around 14,000 samples. This dataset is also particularly interesting because it is a good test of the properties of the signature transform, which captures the area and order of the input paths.

Please, no more active participation in this thread is allowed.

De Curtò y DíAz.

-- https://www.decurto.tw/