decurtoydiaz / learning_with_signatures

Learning with Signatures

Questions about the results? #4

Closed. 781458112 closed this issue 2 years ago

781458112 commented 2 years ago

When I changed the location of the images, I put part of them into a separate folder and printed only the algorithm's predictions. I noticed something odd: I only took data for three CIFAR-10 labels, yet the predicted labels still range over all 10 classes. If that is the case, how can I be sure that a random input image will be given the right label? My modified code is attached below:

from sklearn.metrics import mean_squared_error

count = np.zeros(categories, dtype='object')
for c2 in range(0, categories):
    a = os.listdir('CIFAR-10-dataset-main/val')
    for z in range(0, len(a)):
        rmse_c = np.empty(categories, dtype='object')
        for c in range(0, categories):
            # RMSE between the scaled class representative and the signature of image a[z]
            rmse_c[c] = mean_squared_error(globals()['supermeanl' + str(c2)] * supermeanA[c],
                                           my_signature_cyz(a[z]), squared=False)
        min_rmse = np.argmin(rmse_c)

        print('jpg', a[z])
        print('predicted_label', labels[min_rmse])
        print('min_rmse', min_rmse)

decurtoydiaz commented 2 years ago

Labels are encoded via the image folders. Please check the code and adapt your data to fit that layout. You need to specify training, validation and test sets. Training and validation normally live in the same folder (except in Four Shapes), with validation taken from the range between begin_validate and end_validate. Test should be in another folder. Representatives are computed using training data, and weights are tuned on validation. Test images are used to compute the RMSE against the Signature. In every set there should be one folder per class, and you also need to specify the names of the classes in the variable labels.
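For illustration only (the dataset name, folder names and ranges below are made up, not the exact ones from the notebook), the expected layout looks roughly like this:

datasets/
    afhq/
        train/          # training and validation normally share this folder
            cat/  dog/  wild/
        test/
            cat/  dog/  wild/

labels = ['cat', 'dog', 'wild']          # one entry per class folder
begin_validate, end_validate = 0, 100    # validation slice inside the training folder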

De Curtò y DíAz.

suhao16 commented 2 years ago

@decurtoydiaz In the test set, you process the labels of each class, and this prior knowledge causes information leakage: the test samples have in effect already been classified.

decurtoydiaz commented 2 years ago

Again, one vs all. You should use all classifiers, each with a fixed lambda. This issue has already been solved in #3.

When you do one vs all, you fix lambda and go through ALL the test data, and only the instances from the corresponding class are classified correctly. For example, in AFHQ this means you have a perfect classifier of cats, dogs and wild. And as in traditional vision and robotics, when you do one vs all you have to use all classifiers if you are given an unlabeled test instance. There is no leakage. 100% on all tasks. Code is correct. Please revise your notes on data science.
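A minimal sketch of that, assuming the class representatives and the test signatures are already computed as numpy arrays (the names rmse, binary_classifier and supermean below are illustrative placeholders, not the notebook's own):

import numpy as np

def rmse(a, b):
    # Root mean squared error between two signature vectors.
    return np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2))

def binary_classifier(sig_x, c, lambda_c, supermean):
    # One-vs-all classifier for class c: with its own fixed lambda_c it
    # answers 'yes' only when class c yields the smallest RMSE against
    # the scaled representatives.
    rmse_all = [rmse(lambda_c * supermean[k], sig_x) for k in range(len(supermean))]
    return int(np.argmin(rmse_all)) == c

Run over ALL the test data, this classifier can only be credited with the samples whose ground truth is class c; every other class gets its own classifier with its own fixed lambda.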

De Curtò y DíAz.

-- https://www.decurto.tw/

decurtoydiaz commented 2 years ago

Code is correct. The issue has already been solved and thoroughly discussed in #1.

Again, one vs all. You should use all classifiers, each with a fixed lambda. There is no leakage. Ambiguity is resolved at test time, as explained in the notebook, by doing one-vs-all, which was indeed the de facto way of doing things in many domains such as robotics before Deep Learning emerged. Please revise your notes on data science.

When you do one vs all, you fix lambda and go through ALL the test data, and only the instances from the corresponding class are classified correctly. For example, in AFHQ this means you have a perfect classifier of cats, dogs and wild. And as in traditional vision and robotics, when you do one vs all you have to use all classifiers if you are given an unlabeled test instance. There is no leakage. 100% on all tasks.

What's more, weights (videlicet, the optimal scale factors) are tuned on VALIDATION (indeed, with very few samples; check the code; it's the range between begin_validate and end_validate, 100 or 500 depending on the task) and then achieve perfect generalisation on the test set. The most beautiful example of this is Four Shapes, where, using only 10 training samples per class (4 classes, 40 in total) to compute the representatives and 100 validation samples per class (4 classes, 400 in total) to tune the optimal scale factors, we achieve perfect accuracy on around 14,000 test samples. This dataset is also particularly interesting because it is a good test of the properties of the signature transform, which captures the area and order of the input paths.
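A rough sketch of that tuning step, under the assumption that each lambda is chosen by a simple grid search over the validation slice (candidate values, names and the binary accuracy criterion below are illustrative, not the notebook's exact procedure):

import numpy as np

def tune_lambda(c, candidates, val_sigs, val_labels, representatives):
    # Choose the scale factor for class c that gives the best one-vs-all
    # accuracy on the validation samples (signatures are precomputed).
    def predict(sig, lam):
        rmse_all = [np.sqrt(np.mean((lam * r - sig) ** 2)) for r in representatives]
        return int(np.argmin(rmse_all))
    best_lam, best_acc = candidates[0], -1.0
    for lam in candidates:
        acc = np.mean([(predict(s, lam) == c) == (y == c)
                       for s, y in zip(val_sigs, val_labels)])
        if acc > best_acc:
            best_lam, best_acc = lam, acc
    return best_lam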

Please, no more active participation in this thread is allowed.

De Curtò y DíAz.

-- https://www.decurto.tw/

781458112 commented 2 years ago

@decurtoydiaz So how do I obtain the label of a sample without already knowing it? As far as I can tell, the program assumes the label of each sample is known by default and uses it. That is exactly what my code above is asking: why does the output still range from 0 to 9 when I input sample images with no labels, given that I only took samples from the first three labels and provided no labels at all?

decurtoydiaz commented 2 years ago

Please take a look at the code. You can use the original data configuration from the experiments by creating a shortcut in your Drive to the folder with the datasets in my Drive (provided in the notebook: https://drive.google.com/drive/folders/1jjG5xc0Sj2WoyBM81issdc58zNxNHrNg?usp=sharing). Go step by step and read the comments. You can reproduce the experiments. Labels are encoded in the folders, with one folder per class. Lambdas are tuned on validation and then fixed on test in a one-vs-all fashion (you get n binary classifiers, one per class; given a new unseen sample, you should use all of them, and you'll get something like [no, no, yes] for a ground-truth wild sample, for example).
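For instance, a minimal self-contained sketch of that last step (tuned_lambdas and representatives are placeholder names; signatures are assumed precomputed):

import numpy as np

def one_vs_all(sig_x, tuned_lambdas, representatives):
    # Run every binary classifier on the same unlabeled sample; classifier c
    # says 'yes' when, under its own fixed lambda, class c gives the
    # smallest RMSE against the scaled representatives.
    answers = []
    for c, lam in enumerate(tuned_lambdas):
        rmse_all = [np.sqrt(np.mean((lam * r - sig_x) ** 2)) for r in representatives]
        answers.append(int(np.argmin(rmse_all)) == c)
    return answers  # e.g. [False, False, True] for a ground-truth wild sample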

De Curtò y DíAz.

-- https://www.decurto.tw/

suhao16 commented 2 years ago

@decurtoydiaz "Labels are encoded in the folders with one folder per each class" ——————Can't do this on the test set

decurtoydiaz commented 2 years ago

@decurtoydiaz "Labels are encoded in the folders with one folder per each class" ——————Can't do this on the test set

Please take a look at the code. You can use the original data configuration from the experiments by creating a shortcut in your Drive to the folder with the datasets in my Drive (provided in the notebook: https://drive.google.com/drive/folders/1jjG5xc0Sj2WoyBM81issdc58zNxNHrNg?usp=sharing). Go step by step and read the comments. You can reproduce the experiments. Labels are encoded in the folders, with one folder per class. Lambdas are tuned on validation and then fixed on test in a one-vs-all fashion (you get n binary classifiers, one per class; given a new unseen sample, you should use all of them, and you'll get something like [no, no, yes] for a ground-truth wild sample, for example).

In the test set, we assume we can determine the correct lambda, not the label. When you do one vs all, lambda is fixed and we go through all the test data. You then have n binary classifiers, each with a fixed lambda. There is no leakage, and no assumption about the given label. Check the code.

Please, no more active participation in this thread is allowed as all this has been extensively discussed in #1 and #3.

De Curtò y DíAz.

-- https://www.decurto.tw/

decurtoydiaz commented 2 years ago

@decurtoydiaz In the test set, you process the labels of each class, and this prior knowledge causes information leakage: the test samples have in effect already been classified.

Please take a look at the code. You can use the original data configuration from the experiments by creating a shortcut in your Drive to the folder with the datasets in my Drive (provided in the notebook: https://drive.google.com/drive/folders/1jjG5xc0Sj2WoyBM81issdc58zNxNHrNg?usp=sharing). Go step by step and read the comments. You can reproduce the experiments. Labels are encoded in the folders, with one folder per class. Lambdas are tuned on validation and then fixed on test in a one-vs-all fashion (you get n binary classifiers, one per class; given a new unseen sample, you should use all of them, and you'll get something like [no, no, yes] for a ground-truth wild sample, for example).

In the test set, we assume we can determine the correct lambda, not the label. When you do one vs all, lambda is fixed and we go through all the test data. You then have n binary classifiers, each with a fixed lambda. There is no leakage, and no assumption about the given label. Check the updated code.

Please, no more active participation in this thread is allowed as all this has been extensively discussed in https://github.com/decurtoydiaz/learning_with_signatures/issues/1 and https://github.com/decurtoydiaz/learning_with_signatures/issues/3.

De Curtò y DíAz.

-- https://www.decurto.tw/