Waikato / moa

MOA is an open source framework for Big Data stream mining. It includes a collection of machine learning algorithms (classification, regression, clustering, outlier detection, concept drift detection and recommender systems) and tools for evaluation.
http://moa.cms.waikato.ac.nz/
GNU General Public License v3.0

Instance/InstancesHeader modifications for easier use of multi-target streams #257

Closed aosojnik closed 1 year ago

aosojnik commented 2 years ago

@abifet I've made some modifications to the core Instances class which allow for easier handling of multi-target streams. The specific changes are explained below. All changes should be backwards compatible and do not break any tests, i.e., the only tests that fail are the ones that already fail on the original branch.


ArffLoader is modified so that the header (i.e., the getHeader method) is loaded with predefined definitions of which input and target attributes to load, according to the format defined below. The old method is kept for backwards compatibility.

AttributesInformation is simplified, as its index information was not being used anywhere; this information is moved to InstancesInformation.

InstanceImpl is modified to work with the changes below. Nothing major, just slightly different implementations of some methods.

InstancesInformation now holds two arrays of input and target indices. The target indices take the place of the (here removed) range attribute and serve the same function, but with expanded options, e.g., allowing non-contiguous targets. Some methods are added for backwards compatibility.

Instances is modified to include additional constructors and backwards compatibility methods.

AttributeDefinitionUtil is a new class that allows for more expressive definitions of selected attributes, such as ranges ("5-7"), index combinations ("1,5,10"), exclusions ("1-10,!5" takes all attributes from 1 to 10, excluding 5), negative indexes ("-3" is the third attribute from the back), negative ranges ("1~-5" takes all attributes from 1 to the sixth from last) and any combination of the above.

MultiTargetArffLoader is simplified, as all of its functionality is now available in ArffLoader.

ArffFileStream is modified to allow users to specify the targets using the above format, as well as the input attributes. In practice, this allows for direct filtering of input and/or target attributes at the loading stage. This potentially deprecates some classes, e.g., moa.streams.filters.SelectAttributesFilter.

MultiTargetArffLoaderTest is modified such that it uses the new index array instead of the old range attribute on InstancesInformation.
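
To illustrate the attribute definition format described for AttributeDefinitionUtil, here is a minimal sketch of how such a string could be resolved into attribute indices. It is not the code from this PR: the class name AttributeSpecSketch and the methods resolve and parse are hypothetical, and it assumes 1-based indices and that "-k" means the k-th attribute from the end. Negative ranges with "~" are left out on purpose, since their exact end-point semantics are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; this is NOT MOA's AttributeDefinitionUtil.
class AttributeSpecSketch {

    // Resolves one index token to a 1-based index.
    // Assumption: "-k" refers to the k-th attribute counted from the end.
    static int resolve(String token, int numAttributes) {
        int value = Integer.parseInt(token.trim());
        int index = value < 0 ? numAttributes + value + 1 : value;
        if (index < 1 || index > numAttributes) {
            throw new IllegalArgumentException("Index out of bounds: " + token);
        }
        return index;
    }

    // Parses a comma-separated definition such as "1-10,!5" into sorted 1-based indices.
    static List<Integer> parse(String spec, int numAttributes) {
        TreeSet<Integer> included = new TreeSet<>();
        TreeSet<Integer> excluded = new TreeSet<>();
        for (String rawPart : spec.split(",")) {
            String part = rawPart.trim();
            if (part.isEmpty()) {
                continue;
            }
            // A leading "!" marks the part as an exclusion.
            boolean exclude = part.startsWith("!");
            if (exclude) {
                part = part.substring(1);
            }
            TreeSet<Integer> target = exclude ? excluded : included;
            // Search from position 1 so that a leading minus sign is not taken as a range separator.
            int dash = part.indexOf('-', 1);
            if (dash > 0) {
                int from = resolve(part.substring(0, dash), numAttributes);
                int to = resolve(part.substring(dash + 1), numAttributes);
                for (int i = Math.min(from, to); i <= Math.max(from, to); i++) {
                    target.add(i);
                }
            } else {
                target.add(resolve(part, numAttributes));
            }
        }
        // Exclusions are applied last, so their position in the definition does not matter.
        included.removeAll(excluded);
        return new ArrayList<>(included);
    }

    public static void main(String[] args) {
        // With 10 attributes, prints [1, 2, 3, 4, 6, 7, 8, 9, 10]
        System.out.println(parse("1-10,!5", 10));
    }
}
```

In this sketch exclusions are collected separately and removed at the end, so "1-10,!5" and "!5,1-10" give the same result. Whether the actual implementation is order-sensitive is not stated here.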

aosojnik commented 2 years ago

I also included the MLC via MTR class that we discussed a while ago, which should be used when a multi-target regressor is used to do multi-label classification. This will now set the predictions properly and should result in proper calculation of evaluation measures.