EdwardRaff / JSAT

Java Statistical Analysis Tool, a Java library for Machine Learning
GNU General Public License v3.0
789 stars 205 forks source link

Bug in the ARFFLoader #63

Closed kasium closed 7 years ago

kasium commented 7 years ago

While loading an ARFF file, there's a bug with the naming of numeric variables: Line 199: There is no need to use the helper variable k. Simple Example: The list "variableNames" contains "A", "B", "C". "A" is a categorical variable. Inside your loop, you check if the variable is numeric, so in the first iteration no name will be set, i will be 1 and k is 0. In the second iteration you use k to get name. Normally this should be "B", but because k is 0, you will get "A", which is wrong. So just remove k and use i and everything will be fine.

EdwardRaff commented 7 years ago

Do you have a simple test case that can reproduce this? I’m busy at the moment, but will look into this when I get a chance.

On Jul 18, 2017, at 6:55 AM, Kai Janis Müller notifications@github.com wrote:

While loading an ARFF file, there's a bug with the naming of numeric variables: Line 199: There is no need to use the helper variable k. Simple Example: The list "variableNames" contains "A", "B", "C". "A" is a categorical variable. Inside your loop, you check if the variable is numeric, so in the first iteration no name will be set, i will be 1 and k is 0. In the second iteration you use k to get name. Normally this should be "B", but because k is 0, you will get "A", which is wrong. So just remove k and use i and everything will be fine.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/EdwardRaff/JSAT/issues/63, or mute the thread https://github.com/notifications/unsubscribe-auth/AB6m0jE6SoTFMU4WYx4PB_82DMauEBn2ks5sPI8VgaJpZM4ObLDH.

kasium commented 7 years ago

This is a simple test case for you: ARFFTest.txt

Like you can see, the second numeric attribute has the name of the categorical attribute.

EdwardRaff commented 7 years ago

You should start from 0 index instead of 1 in your test case. But either way, thanks! Bug is fixed now. Sorry for the delay, I've been busy!