Closed by qiminchen 3 years ago
hey @beijbom, I added the MLP classifier by adding one more argument: clf_type: str  # Name of the classifier to use
Tested on both local and Docker; it passed most test cases except this one. I don't think this failure is caused by adding the MLP. Do you have any ideas?
======================================================================
FAIL: test_img_classify_bad_url (spacer.tests.test_mailman.TestProcessJobErrorHandling)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/workspace/spacer/spacer/tests/test_mailman.py", line 62, in test_img_classify_bad_url
self.assertTrue('URLError' in return_msg.error_message)
AssertionError: False is not true
I set the default clf_type='MLP' and ran
python scripts/regression/efficientnet_extractor.py efficientnet_b0_ver1 294 10 MLP /path/to/features
to test both EfficientNet-b0 feature extraction and MLP classifier training; it worked pretty well. You could also test on your local machine. Changes are ready for review.
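For context, the clf_type dispatch described above could look roughly like this. This is a minimal sketch, not the actual pyspacer code; the function name build_classifier and the hyperparameters are assumptions for illustration:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier


def build_classifier(clf_type: str):
    """Return an untrained classifier for the given clf_type ('LR' or 'MLP').

    Hypothetical factory; hyperparameters here are placeholders.
    """
    if clf_type == 'LR':
        return LogisticRegression(max_iter=1000)
    elif clf_type == 'MLP':
        return MLPClassifier(hidden_layer_sizes=(100,), max_iter=200)
    raise ValueError(f'Unknown clf_type: {clf_type}')
```

Keeping the selection in one factory means callers only pass the string through, which matches adding a single clf_type argument to the train message.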
Hey. Looks like some other tests fail on Travis: https://travis-ci.org/github/beijbom/pyspacer/jobs/725454742
Re. the error. I'm not sure. Can you print the error message (return_msg.error_message
)? Perhaps the formatting changed slightly so it doesn't match what I check against?
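One way to make that check less brittle is to use assertIn instead of assertTrue('...' in msg), since assertIn prints the actual message on failure. A minimal sketch, where handle_job is a hypothetical stand-in for the mailman job runner, not the real pyspacer function:

```python
from urllib.error import URLError


def handle_job(url: str) -> str:
    """Hypothetical stand-in: runs a job and returns an error message
    string that embeds the exception class name, as the test expects."""
    try:
        raise URLError('bad url')  # simulate a failed download
    except Exception as err:
        return f'{type(err).__name__}: {err}'


# assertIn('URLError', msg) in a unittest.TestCase would report the
# full message on failure, which answers "can you print the error?"
```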
I know what the error is, working on it
Btw. I updated CI to use travis-ci.com. CI callbacks seem to be back up again.
@beijbom It should work well now
@qiminchen : I took a look -- this looks nice in general. I ran
% python scripts/regression/train_classifier.py train 1498 /data/tmp
-> Downloading data for source id: 1498.
-> Downloading 804 metadata and feature files...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 804/804 [03:08<00:00, 4.26it/s]
-> Assembling train and val data for source id: 1498
-> Training...
-> Re-trained SpatSurvey (1498). Old acc: 76.0, new acc: 72.5
which gave lower performance than the previous setup. Do you mind running a few sources and pasting the results here? I just like to confirm that this is an outlier.
Also: it seems we should be able to clean up the regression
folder a bit. For example, there are two private methods for caching local, one in efficientnet_extractor.py
and one in train_classifier.py
. I know they do slightly different things, but can you see if it can be moved to a shared utils.py
?
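A shared caching helper along those lines could be sketched as below. The name cached_download and the fetch callback are assumptions; the point is that each script can pass its own download logic while the cache bookkeeping lives in one place:

```python
from pathlib import Path
from typing import Callable


def cached_download(key: str, cache_dir: str,
                    fetch: Callable[[str], bytes]) -> Path:
    """Download `key` via `fetch` unless a cached copy already exists.

    Sketch of a helper that both regression scripts could share; `fetch`
    abstracts over whatever each script does to pull the file down.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    target = cache / key.replace('/', '_')
    if not target.exists():
        target.write_bytes(fetch(key))
    return target
```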
@beijbom I think I found the problem. train_classifier.py downloads the data from the spacer-trainingdata bucket, where the features were extracted by VGG16CaffeExtractor. I then trained on those features with LR and MLP respectively; here is the comparison.
LR
-> Downloading data for source id: 1498.
-> Downloading 804 metadata and feature files...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 804/804 [02:12<00:00, 6.09it/s]
-> Assembling train and val data for source id: 1498
-> Training...
-> Re-trained SpatSurvey (1498). Old acc: 76.0, new acc: 71.5
MLP
-> Downloading data for source id: 1498.
-> Downloading 804 metadata and feature files...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 804/804 [00:00<00:00, 195922.64it/s]
-> Assembling train and val data for source id: 1498
-> Training...
-> Re-trained SpatSurvey (1498). Old acc: 76.0, new acc: 72.6
So I think it's because the current backend s1498 classifier (with 76.0% acc) was trained on 169 images on 06/22/2019, while the one you tested was trained on 268 images. That means the extra images uploaded by the user didn't help improve the performance; otherwise the classifier training status should show on the chart.
efficientnet_extractor.py, on the other hand, extracts new features using EfficientNetExtractor and trains the classifier with either LR or MLP. Here are the training results using LR and MLP.
LR
python scripts/regression/efficientnet_extractor.py efficientnet_b0_ver1 1498 10 LR /data/tmp
-> Fetching 1498 image and annotation meta files...
-> Extracting features...
-> Downloading data for source id: 1498.
-> Downloading 268 metadata...
-> Assembling train and val data for source id: 1498
-> Training...
-> Re-trained SpatSurvey (1498). Old acc: 76.0, new acc: 78.9
MLP
python scripts/regression/efficientnet_extractor.py efficientnet_b0_ver1 1498 10 MLP /data/tmp
-> Fetching 1498 image and annotation meta files...
-> Extracting features...
-> Downloading data for source id: 1498.
-> Downloading 268 metadata...
-> Assembling train and val data for source id: 1498
-> Training...
-> Re-trained SpatSurvey (1498). Old acc: 76.0, new acc: 79.3
I will test on a few more sources that are not present in the 26-source test set and paste the results here.
Good idea on moving the shared caching code into a utils.py, working on it.
Thanks for looking into that @qiminchen. I agree this is probably because of the different number of images in the training data. The results on the new features + MLP look nice!
@qiminchen : is this ready for final review? I just added a minor comment -- let's try to get this merged this weekend.
Yes, other than the comment you just added it's ready for review. Let's get this one done.
Both
python train_classifier.py train 294 /data/tmp --clf_type LR
python train_classifier.py train 294 /data/tmp --clf_type MLP
fail with some error. @qiminchen: do you see the same error?
Hmm, this is weird, I did not see the error. What's the error on your side?
@qiminchen: I think my error was due to a feature file that was only partway downloaded. I cleared the cache and it works now. I took a final pass to merge the two regression scripts. I think it's a bit cleaner now -- lmk what you think. I'm running some final tests -- once complete this is good to merge.
One question: what is the recommended default number of epochs to train the MLP in your experiments?
the cleanup looks great!
10, the reason I use 10 is that the reference code of training the classifier you sent me a long time ago uses 10
Cool. That checks out.
Hmm, weird, I can't find the code you sent me before or the link to it, but I'm pretty sure it was 10 epochs. 5 epochs should work as well.
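An epoch-controlled MLP training loop along the lines discussed could be sketched with sklearn's partial_fit, which gives explicit control over the number of passes. This is an assumption-laden sketch, not the exact pyspacer training loop; the function name train_mlp and the hidden layer size are placeholders:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier


def train_mlp(x: np.ndarray, y: np.ndarray, epochs: int = 10) -> MLPClassifier:
    """Train an MLP for a fixed number of passes over the data.

    epochs=10 mirrors the default discussed above; partial_fit lets the
    caller control the number of passes explicitly.
    """
    clf = MLPClassifier(hidden_layer_sizes=(100,))
    classes = np.unique(y)  # partial_fit needs the full label set up front
    for _ in range(epochs):
        clf.partial_fit(x, y, classes=classes)
    return clf
```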
Uploaded new well-trained EfficientNet-b0 weights.