Open rangeardt opened 7 years ago
First, I think I found a mistake in the code. File: classify.py, line 128:
```python
mask = result == y
correct = np.count_nonzero(mask)
accuracy = (correct * 100.0 / result.size)
```
np.count_nonzero() expects an array of elements (see the docs), so we expect the mask to look like [True, False, False, ..., True], because we compare the "result" table returned by predict_all with the "y" table containing the true class ids extracted from the dataset. So we expect to compare something like this:
```python
y = [[1], [2], [1], [1], [2]]       # 5 images: 3 of class 1, 2 of class 2
result = [[1], [2], [1], [2], [2]]  # predict_all output: 3 of class 2, 2 of class 1
```

The mask should be [True, True, True, False, True].
But with mask = result == y we just compare whether table1 == table2 as a whole, which is False because y[3] != result[3]. So the input parameter to count_nonzero is wrong: we pass a single False instead of a table, so correct = 0 and accuracy = 0.
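As a sanity check, here is a minimal NumPy sketch (with made-up arrays) showing that an element-wise mask gives the right count, while feeding a bare False to count_nonzero yields 0:

```python
import numpy as np

y = np.array([[1], [2], [1], [1], [2]])
result = np.array([[1], [2], [1], [2], [2]])

mask = result == y                 # element-wise comparison of equal-shaped arrays
print(np.count_nonzero(mask))      # 4 correct predictions out of 5
print(np.count_nonzero(False))     # 0: a lone False counts as nothing
```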
Secondly, for the reshape, we use:
```python
classes = dataset.get_classes()
log.classes(classes)
log.classes_counts(dataset.get_classes_counts())
result_filename = filenames.result(k, des_name, svm_kernel)
test_count = len(dataset.get_test_set()[0])
result_matrix = np.reshape(result, (len(classes), test_count))
```
So we try to reshape the result array from the test into a matrix of shape [number of classes][number of elements in the first class]. In my case I do not have the same number of pictures in each class/test folder, so I get a reshape error. If you have the same issue, I recommend testing with the same number of elements in each test folder; I am currently testing that.
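To illustrate the failure mode (a toy sketch, not the project's actual numbers): np.reshape raises whenever the element count does not match the target shape:

```python
import numpy as np

result = np.arange(29)           # e.g. 29 predictions
try:
    np.reshape(result, (3, 10))  # 3 classes * 10 images = 30 != 29
except ValueError as e:
    print("reshape failed:", e)
```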
If you want the same number of pictures in each folder, you can run the script below. It assumes your classes are named with only alphabetic characters [a-zA-Z]; you can modify the grep -P option if that is not the case (I did it that way to avoid an error message on the . and .. folders). The script counts the number of files in each class/test folder and takes the minimum, then deletes pictures from every other class/test folder so that each class/test folder ends up with the same number.
```bash
#!/bin/bash
# Note: assumes filenames contain no spaces.

# Count the files in each class/test folder.
number=()
for elem in $(find . -maxdepth 1 -type d | grep -P "[a-zA-Z]")
do
    testN=$(find "$elem/test" -type f | wc -l)
    number+=("$testN")
done
# echo "${number[@]}"  # debug

# Find the minimum count across all classes.
min=0
for i in "${number[@]}"; do
    (( i < min || min == 0 )) && min=$i
done

# In each class/test folder, keep the first $min files and delete the rest.
for elem in $(find . -maxdepth 1 -type d | grep -P "[a-zA-Z]")
do
    listFiles=$(find "$elem/test" -type f | head -n "$min")
    allFiles=$(find "$elem/test" -type f)
    # Files appearing only once in the combined sorted list are not in the
    # keep list, so they are the ones to delete.
    diff=$(echo $listFiles $allFiles | tr ' ' '\n' | sort | uniq -u)
    for file in $diff
    do
        rm -f "$file"
    done
done
```
If you used my previous script to create the dataset from a structure like the Caltech one, you have to create this script in the "new" folder and then run it. Don't forget to remove it afterwards if you don't want an error during dataset generation (it will consider the script as a class and try to find the test and train folders in the path "{name of the script}/test"). I recommend backing up your dataset before using it.
I just finished debugging and testing the classification using ORB and SVM on the Caltech dataset. I got only 12% accuracy, which is not great; I will try the SIFT descriptor now. Here is my correction for mask = result == y, if someone can confirm:
```python
def compareResultWithTest(self, result, y):
    correct = []
    if result.size != y.size:
        return False
    for idx in range(len(result)):
        correct.append(result[idx] == y[idx])
    return correct

mask = self.compareResultWithTest(result, y)
```
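If it helps, here is a vectorized NumPy variant of the same check (a sketch; compare_result_with_test is my own name, not the project's) that avoids the explicit Python loop:

```python
import numpy as np

def compare_result_with_test(result, y):
    """Element-wise comparison; refuses arrays of different sizes."""
    result = np.asarray(result)
    y = np.asarray(y)
    if result.size != y.size:
        return False
    return (result.ravel() == y.ravel()).tolist()

print(compare_result_with_test([[1], [2], [1], [2], [2]],
                               [[1], [2], [1], [1], [2]]))
# [True, True, True, False, True]
```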
I got an issue with the reshape even after all the corrections. Looking at the source, I can see that descriptors can be null; such images are not added to the "result" table, but they are still part of your testing images. For example, suppose you have 102 (classes) * 15 (images) for the test, but one image has no descriptor: the "y" table will have a shape of 1530x1 while the "result" table has a shape of 1529x1, so the call np.reshape(result, (102, 15)) fails. To make sure that is not your problem, check your terminal for the message "Img with None descriptor: " and replace the image with another, or duplicate one to replace it (or delete 1 image from every other class).
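Alternatively (a sketch, assuming you can record the indices of the skipped images), you could drop the matching labels from y instead of touching the dataset:

```python
import numpy as np

# hypothetical: indices of test images that had a None descriptor
skipped = [3]

y = np.arange(10).reshape(10, 1)         # labels for all 10 test images
result = np.delete(y, skipped, axis=0)   # stand-in predictions: only 9 rows

y_aligned = np.delete(y, skipped, axis=0)  # drop the skipped labels too
mask = result == y_aligned
print(np.count_nonzero(mask))              # 9: shapes now agree
```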
I hope I am correct and that this helps.
I am getting errors when running. In descriptors.py, des = None (line 53) leads to:

```
dataset.set_class_count(class_number, len(des))
TypeError: object of type 'NoneType' has no len()
```

Can anything be done about this error?
Hi, I never saw this in my tests. Can you give more details?
main(is_interactive=True)
Can you check the "new_des" variable in the loop `for i in range(len(class_img_paths)):` by adding `print new_des` just before `if new_des is not None:`?
Thank you for the response. I tried adding the train and test folders inside each class in the dataset folder, and the code runs fine now. But I couldn't get the accuracies as output; instead I got a confusion matrix.
Yes, there is a mistake in the accuracy output. You can find it in the log file, I think, or add a print like I did.
Yeah, I'm looking into the code and will check it. Thank you for the response. I looked into the log file and got 89% accuracy. Actually, I'm working on a project on object classification in videos in C++, so I'm getting stuck on utilizing the features obtained from images for training the SVM; I don't know how to convert the features into a suitable format.
What does the confusion matrix indicate?
89% accuracy: are you using your own dataset? With SIFT or ORB?
http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html
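To illustrate what the linked page shows (a minimal sketch with made-up labels; the function name and data are mine): rows are true classes, columns are predicted classes, and the diagonal counts the correct predictions.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    # m[i][j] counts images of true class i predicted as class j
    m = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        m[t, p] += 1
    return m

y_true = [0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 1, 1]
print(confusion_matrix(y_true, y_pred, 2))
# [[2 1]
#  [0 2]]
```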
I am working on a project to recognize grocery products on shelves, but I have some issues with my classification and am currently looking for new ideas.
I'm using a random dataset with three classes so far from the Caltech101 dataset. I'm using ORB, as SIFT is not available in OpenCV version 3. Our projects do seem to be similar; where mine deals with generic objects, yours is specific to groceries (something like the warehouse robots project). Coming to the classification methods: so far I have read a few papers and techniques. Using a one-against-one SVM model results in a better real-time model; another option is a random decision forest classifier with a CRF for efficiency.
I used the Caltech101 dataset with all classes (except one) and got bad accuracy (12%). To access SIFT in OpenCV 3, you have to install it with the contrib modules.
For my classification I have over 1000 products (with only one image per product). I tried using a codebook with different cluster sizes, and I also tried one-by-one comparison; nothing good so far.
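For what it's worth, the codebook step I mean can be sketched with plain NumPy (random stand-in data; a real run would use ORB/SIFT descriptors and a proper k-means to build the codebook):

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.random((8, 32))       # 8 visual words, 32-dim descriptors
descriptors = rng.random((100, 32))  # descriptors from one image

# assign each descriptor to its nearest visual word
dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
words = dists.argmin(axis=1)

# bag-of-visual-words histogram: the feature vector fed to the SVM
hist = np.bincount(words, minlength=len(codebook)).astype(float)
hist /= hist.sum()
print(hist.shape)  # (8,)
```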
When I reach the test step I get this error, and I don't know why.