Closed by codaich 8 years ago
Result of further investigation:
DEFERRED SOLUTION TO PERHAPS IMPLEMENT IN THE LONG-TERM: One possible solution would be to update jsymbolic2.processing.MIDIFeatureProcessor and the code it calls so that it stores a value of null in the feature_values field of the ace.datatypes.DataSet object for each MEI-specific feature for each MIDI file processed. However, this would be tricky to implement properly and could introduce other problems, so we will avoid this solution for now.
ACTUAL SOLUTION TO IMPLEMENT: Have jSymbolic check, before extraction begins, whether any MEI-specific features are selected and, if so, whether any non-MEI files are scheduled to have features extracted from them. If both are true, throw an informative exception and end execution. This should check for MEI-specific features in general, not just the two that happen to be implemented currently, so that it continues to function properly after new MEI-specific features are added. It should also check whether any of the features selected for extraction have dependencies on MEI-specific features. This must work for execution initiated in any way, including with the GUI, command line and API. Try to implement a checking algorithm that runs quickly (i.e. without extracting all jsymbolic2.processing.MIDIIntermediateRepresentations for a file).
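For reference, the check described above could be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the code that ended up in jSymbolic: FeatureSpec, MeiPreflightCheck, requiresMei, isMeiFile and validate are hypothetical names, and treating a ".mei" file extension as the marker of an MEI file is an assumption made so that no file has to be parsed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical stand-in for a feature description; not jSymbolic's real feature class. */
final class FeatureSpec {
    final String name;
    final boolean meiSpecific;
    final List<FeatureSpec> dependencies;

    FeatureSpec(String name, boolean meiSpecific, List<FeatureSpec> dependencies) {
        this.name = name;
        this.meiSpecific = meiSpecific;
        this.dependencies = dependencies;
    }
}

/** Hypothetical pre-extraction check; inspects only metadata and file names. */
final class MeiPreflightCheck {

    /** True if the feature is MEI-specific or depends, directly or indirectly, on one that is. */
    static boolean requiresMei(FeatureSpec feature) {
        Deque<FeatureSpec> pending = new ArrayDeque<>();
        Set<FeatureSpec> visited = new HashSet<>();
        pending.push(feature);
        while (!pending.isEmpty()) {
            FeatureSpec current = pending.pop();
            if (!visited.add(current)) {
                continue; // already examined; also guards against dependency cycles
            }
            if (current.meiSpecific) {
                return true;
            }
            for (FeatureSpec dependency : current.dependencies) {
                pending.push(dependency);
            }
        }
        return false;
    }

    /** Assumption: MEI files are recognised by their extension alone. */
    static boolean isMeiFile(String path) {
        return path.toLowerCase(Locale.ROOT).endsWith(".mei");
    }

    /** Throws if MEI-dependent features are selected while non-MEI files are queued. */
    static void validate(List<FeatureSpec> selected, List<String> paths) {
        List<String> meiFeatures = new ArrayList<>();
        for (FeatureSpec feature : selected) {
            if (requiresMei(feature)) {
                meiFeatures.add(feature.name);
            }
        }
        if (meiFeatures.isEmpty()) {
            return; // nothing MEI-specific selected, so any mix of files is fine
        }
        List<String> nonMeiFiles = new ArrayList<>();
        for (String path : paths) {
            if (!isMeiFile(path)) {
                nonMeiFiles.add(path);
            }
        }
        if (!nonMeiFiles.isEmpty()) {
            throw new IllegalArgumentException(
                "MEI-specific features " + meiFeatures
                + " cannot be extracted from non-MEI files " + nonMeiFiles);
        }
    }

    public static void main(String[] args) {
        FeatureSpec grace = new FeatureSpec("Number of Grace Notes", true, List.of());
        try {
            validate(List.of(grace), List.of("a.mid", "b.mei"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because a check like this only looks at feature metadata and file names, it could run once at the start of a batch and be shared by the GUI, command line and API entry points.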
I'm assigning this back to you, @dinamix, since you're more familiar with this part of the code.
The ACE XML MEI feature extraction seems to be working here even if the MEI file is not first in the list. However, the ARFF and CSV outputs do not work in that case.
The issue is that the ACE XML Feature Definitions file is used to generate the feature column headings for ARFF and CSV files. The ACE XML Feature Values file will be fine regardless, since each instance has its own self-contained feature headings, but the ACE XML Feature Definitions file (and ARFF + CSV files) can only have one set of features, and this is based on the first instance encountered. This is why the ARFF and CSV files will have MEI-specific feature labels only if the first instance encountered is MEI.
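To make the mechanism concrete, here is a deliberately simplified model of it. It is not jSymbolic's actual ARFF/CSV writer: FirstInstanceHeaderDemo and toCsv are hypothetical names, and the assumption is only that the header is taken from the first instance while each row writes whatever values its own instance has.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a writer whose column headings come from the first instance only. */
final class FirstInstanceHeaderDemo {

    /** Returns CSV lines: one header line taken from the first instance, then one line per instance. */
    static List<String> toCsv(List<Map<String, Double>> instances) {
        List<String> lines = new ArrayList<>();
        if (instances.isEmpty()) {
            return lines;
        }
        // The header reflects only the features of the first instance encountered.
        lines.add(String.join(",", instances.get(0).keySet()));
        for (Map<String, Double> instance : instances) {
            List<String> cells = new ArrayList<>();
            // Each row writes its own values; no placeholder is added for absent features.
            for (Double value : instance.values()) {
                cells.add(String.valueOf(value));
            }
            lines.add(String.join(",", cells));
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Double> midi = new LinkedHashMap<>();
        midi.put("Range", 24.0);
        Map<String, Double> mei = new LinkedHashMap<>();
        mei.put("Range", 30.0);
        mei.put("Number of Grace Notes", 2.0);

        // MIDI first: the MEI-specific heading never appears in the header.
        toCsv(List.of(midi, mei)).forEach(System.out::println);
        // MEI first: the heading appears, but the MIDI row is one value short.
        toCsv(List.of(mei, midi)).forEach(System.out::println);
    }
}
```

In both orderings at least one row has a different number of values than the header, which is the kind of mismatch that makes the ARFF and CSV files invalid.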
Fixing this problem would be complex, and we want to avoid sparse feature sets (i.e. those where some features are only present for certain instances) anyway, so this is why I think it's best to pursue the "ACTUAL SOLUTION TO IMPLEMENT" I proposed in the comment above.
Good, "ACTUAL SOLUTION TO IMPLEMENT" implemented in Commit [0168da6]. May implement a more flexible solution in the future, but issue closed for now.
When MEI features (e.g. the number of grace notes) are set to be extracted and both MEI and MIDI files are included in the batch to be processed, no placeholders are placed in the ARFF or CSV files to indicate the missing feature values for the MIDI files, which renders the ARFF and CSV files invalid. The ACE XML files are correct, however.