firefly-cpp / NiaARM

A minimalistic framework for Numerical Association Rule Mining
MIT License

Do not save "similar rules" #103

Closed firefly-cpp closed 4 months ago

firefly-cpp commented 7 months ago

The current version of NiaARM stores every distinct rule in the archive of identified rules. However, many numerical rules are effectively the same, since they differ only in the seventh or eighth decimal place.

To solve this "issue," I recommend that, when storing a new rule in the archive, we check whether a similar rule is already included.

firefly-cpp commented 7 months ago

@zStupan, what do you think?

firefly-cpp commented 7 months ago

@mlaky88, what is your opinion?

mlaky88 commented 7 months ago

This would definitely help. It could perhaps be implemented with a similarity threshold. For example, check each boundary of the numerical attributes in a generated rule and compare it to the rules already in the archive. If the lower and upper boundaries of all attributes are within the threshold of an existing rule, then reject the new rule.
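As a rough illustration of the threshold idea above, here is a minimal Python sketch. It is not NiaARM's code: the `Attr` class, the `(antecedent, consequent)` tuple layout, the function names, and the default threshold of `1e-6` are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Attr:
    """Hypothetical stand-in for a rule attribute (not NiaARM's own class)."""
    name: str
    lower: Optional[float] = None    # bounds are None for categorical attributes
    upper: Optional[float] = None
    category: Optional[str] = None   # set only for categorical attributes


# A rule is modelled here as (antecedent, consequent), each a tuple of Attr.
Rule = Tuple[Tuple[Attr, ...], Tuple[Attr, ...]]


def attrs_similar(a: Attr, b: Attr, threshold: float) -> bool:
    """Same attribute, and both bounds differ by at most `threshold`."""
    if a.name != b.name:
        return False
    if a.category is not None or b.category is not None:
        # Categorical attributes must match exactly.
        return a.category == b.category
    return (abs(a.lower - b.lower) <= threshold
            and abs(a.upper - b.upper) <= threshold)


def sides_similar(x: Tuple[Attr, ...], y: Tuple[Attr, ...], threshold: float) -> bool:
    """Compare one side of a rule attribute by attribute, in order."""
    return len(x) == len(y) and all(
        attrs_similar(a, b, threshold) for a, b in zip(x, y)
    )


def rules_similar(r1: Rule, r2: Rule, threshold: float) -> bool:
    """Two rules are similar if both antecedents and consequents are similar."""
    return (sides_similar(r1[0], r2[0], threshold)
            and sides_similar(r1[1], r2[1], threshold))


def add_if_new(archive: List[Rule], rule: Rule, threshold: float = 1e-6) -> bool:
    """Append `rule` unless a similar rule is already archived."""
    if any(rules_similar(rule, kept, threshold) for kept in archive):
        return False  # reject: a near-duplicate is already stored
    archive.append(rule)
    return True
```

Note that this sketch scans the whole archive for every candidate rule, so the cost grows linearly with the archive size. That is probably acceptable for small archives, but it is an assumption worth checking on larger runs.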

firefly-cpp commented 7 months ago

@zStupan, what do you think?

mlaky88 commented 5 months ago

Has there been any progress on this? It would be really beneficial and would raise the overall quality of the mined rules.

zStupan commented 5 months ago

I apologize, I've been very busy. I'll get to work on this ASAP.

zStupan commented 4 months ago

OK, in #109 I've changed the way attributes are compared. Now, if two numerical attributes' bounds match up to 6 decimal places, they're considered equal. There doesn't seem to be much of a difference in the number of rules generated, though.
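For readers who have not looked at #109, the comparison described above could look roughly like the following sketch. This is an assumption about the general approach, not the code from that pull request: the `NumericAttr` class, the `DECIMALS` constant, and the use of `round` are hypothetical choices for illustration.

```python
from dataclasses import dataclass

DECIMALS = 6  # precision mentioned in the comment above


@dataclass(frozen=True, eq=False)
class NumericAttr:
    """Hypothetical numerical attribute; equality ignores digits past DECIMALS."""
    name: str
    lower: float
    upper: float

    def _key(self):
        # Round both bounds so values that agree to 6 decimals compare equal.
        return (self.name,
                round(self.lower, DECIMALS),
                round(self.upper, DECIMALS))

    def __eq__(self, other):
        if not isinstance(other, NumericAttr):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Hash the same rounded key so equal attributes share a hash,
        # which lets a set or dict drop near-duplicates automatically.
        return hash(self._key())


a = NumericAttr("age", 18.12345678, 30.0)
b = NumericAttr("age", 18.12345681, 30.0)  # differs from `a` in the 8th decimal
c = NumericAttr("age", 18.1235, 30.0)      # differs from `a` in the 4th decimal
unique = {a, b, c}                         # `a` and `b` collapse into one entry
```

One property of rounding-based equality is that it only merges values that round to the same 6-decimal number. Two bounds that are very close but sit on opposite sides of a rounding boundary still compare as different, which is where it differs from a distance threshold.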

firefly-cpp commented 4 months ago

@mlaky88: @zStupan has already implemented this feature. Please update to the recent 0.3.7 release and try it out.