Implement new MCRM algorithm based on k-means clustering

gogins commented 6 years ago

For example, using https://github.com/genbattle/dkm or OpenCV.

There is also https://www.npmjs.com/package/ml-kmeans.

Given a run of N samples of either the random or discrete IFS algorithm, produce a score of K notes consisting of the centroids computed by the algorithm.
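
As a rough illustration of the idea above, here is a minimal sketch of plain Lloyd-style k-means over 6-element samples. It is a stand-in written for this note, not the code of dkm or of any other library mentioned here; the names `Point`, `squared_distance`, and `lloyd_kmeans` are hypothetical, and seeding with the first k samples is a simplifying assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// One sample or one note: {instrument, time, duration, key, velocity, pan}.
// (Hypothetical alias for this sketch.)
using Point = std::array<double, 6>;

// Squared Euclidean distance between two points.
inline double squared_distance(const Point &a, const Point &b) {
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

// Plain Lloyd iteration: assign each sample to its nearest mean, then move
// each mean to the average of the samples assigned to it.
// Assumption: the first k samples are used as the initial means.
inline std::vector<Point> lloyd_kmeans(const std::vector<Point> &samples,
                                       std::size_t k,
                                       std::size_t iterations) {
    assert(k > 0 && k <= samples.size());
    std::vector<Point> means(samples.begin(),
                             samples.begin() + static_cast<std::ptrdiff_t>(k));
    for (std::size_t iter = 0; iter < iterations; ++iter) {
        std::vector<Point> sums(k, Point{});
        std::vector<std::size_t> counts(k, 0);
        for (const Point &sample : samples) {
            // Find the nearest current mean.
            std::size_t nearest = 0;
            double best = std::numeric_limits<double>::max();
            for (std::size_t m = 0; m < k; ++m) {
                const double dist = squared_distance(sample, means[m]);
                if (dist < best) {
                    best = dist;
                    nearest = m;
                }
            }
            for (std::size_t d = 0; d < sample.size(); ++d) {
                sums[nearest][d] += sample[d];
            }
            ++counts[nearest];
        }
        // Means with no assigned samples keep their previous position.
        for (std::size_t m = 0; m < k; ++m) {
            if (counts[m] == 0) {
                continue;
            }
            for (std::size_t d = 0; d < means[m].size(); ++d) {
                means[m][d] = sums[m][d] / static_cast<double>(counts[m]);
            }
        }
    }
    return means;
}
```

The returned means would then be read as the K notes of the score. Each iteration costs on the order of N × K distance computations, so compute time grows with both the sample count and the note count.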

The samples could occupy up to about 5 GB of RAM on my notebook. The MCRM algorithm would operate on std::array<float, 6> or std::array<double, 6>. The dimensions would be {instrument, time, duration, key, velocity, pan}, or 6 elements, for a size of 24 or 48 bytes per sample, giving at least 104,166,666 samples. For a 2,000-note score, that would be 52,083 samples per note. The algorithm is clearly feasible as far as storage goes. As for compute time, I don't know yet.
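
The storage estimate can be checked with a few lines of integer arithmetic. This is only a sketch; the helper names `samples_that_fit` and `samples_per_note` are hypothetical, and the budget is taken as 5,000,000,000 bytes (decimal gigabytes), which is what the figures above imply.

```cpp
#include <cassert>
#include <cstdint>

// Number of fixed-size samples that fit in a memory budget (integer division).
inline std::uint64_t samples_that_fit(std::uint64_t budget_bytes,
                                      std::uint64_t bytes_per_sample) {
    assert(bytes_per_sample > 0);
    return budget_bytes / bytes_per_sample;
}

// Average number of samples available to each note (mean) of the score.
inline std::uint64_t samples_per_note(std::uint64_t sample_count,
                                      std::uint64_t note_count) {
    assert(note_count > 0);
    return sample_count / note_count;
}
```

With 6 doubles per sample (48 bytes), `samples_that_fit(5000000000ULL, 48)` gives 104,166,666, and `samples_per_note(104166666ULL, 2000)` gives 52,083. With 6 floats (24 bytes) the sample count doubles to 208,333,333.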

This should be done both for score space (plain notes) and for fractal interpolation in chord spaces.

The csound::Event class will need a constructor that takes a std::array<>.

This approach should remove a good chunk of hackishness from my general approach.

Finally solved it!

gogins commented 6 years ago

Lo and behold, it works pretty much out of the box. To do:

Print the time to compute at various stages. Because the number of centroids (notes) needed is large, and finding optimal centroids is NP-hard in general, the k-means function is quite time-consuming. But perhaps not too time-consuming! I tested with 2,000,000 samples and 500 means. For thousands of means it might prove impractical; I need to measure the time.
Temper.
Tie overlapping notes.
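
For the first to-do item, one possible way to print the time spent in each stage is a small wrapper around std::chrono::steady_clock. This is a sketch only; `timed_stage` is a hypothetical helper, not an existing function in csound-extended or dkm, and the output format is just an example.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Runs one stage, prints its wall-clock duration, and returns it in seconds.
// (Hypothetical helper for this sketch.)
template <typename Stage>
double timed_stage(const char *label, Stage &&stage) {
    assert(label != nullptr);
    const auto start = std::chrono::steady_clock::now();
    stage();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    std::printf("%s: %9.3f seconds.\n", label, elapsed.count());
    return elapsed.count();
}
```

A stage would be wrapped in a lambda, for example `timed_stage("k-means", [&] { /* run the clustering here */ });`. steady_clock is used rather than system_clock because it is monotonic and so is not affected by clock adjustments during a long run.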

gogins commented 6 years ago

Time is going to be an issue, but not, I think, an insuperable one.

gogins commented 6 years ago

We have for 2,000 notes:

    KMeansMCRM::random_algorithm:     0.039 seconds.
    KMeansMCRM::means_to_notes...
    dkm::kmeans_lloyd...
    dkm::kmeans_lloyd:  2710.647 seconds.
    KMeansMCRM::means_to_notes:  2710.647 seconds.
    KMeansMCRM::generate:  2710.686 seconds.

which is about 45 minutes. That is much longer than I normally tolerate, but still better than the old days.

gogins commented 6 years ago

With this:

    mcrm.sample_count = 20000000;
    mcrm.means_count =       200;

We do not get the best possible approximation to the fractal, as shown by opening the generated MIDI file in a piano roll editor. The heavier density in some places seems to "pull" means towards those places. This is a pretty big number of samples, but the notes along the base of the Sierpinski triangle differ by a semitone in places.

[Image: screenshot from 2018-08-18 13-24-24]

This obviously is not good enough.
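
The "pull" effect can be reproduced in one dimension. Each mean is the average of the samples assigned to it, so where samples pile up unevenly the mean drifts toward the pile rather than sitting at the geometric center of its region. The sketch below is a toy illustration under that reading, not the KMeansMCRM code; `lloyd_1d` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One-dimensional Lloyd iteration starting from caller-supplied means.
// (Hypothetical helper for this sketch.)
inline std::vector<double> lloyd_1d(const std::vector<double> &samples,
                                    std::vector<double> means,
                                    std::size_t iterations) {
    assert(!means.empty());
    for (std::size_t iter = 0; iter < iterations; ++iter) {
        std::vector<double> sums(means.size(), 0.0);
        std::vector<std::size_t> counts(means.size(), 0);
        for (const double sample : samples) {
            // Assign the sample to its nearest mean.
            std::size_t nearest = 0;
            for (std::size_t m = 1; m < means.size(); ++m) {
                if (std::fabs(sample - means[m]) <
                    std::fabs(sample - means[nearest])) {
                    nearest = m;
                }
            }
            sums[nearest] += sample;
            ++counts[nearest];
        }
        // Move each non-empty mean to the average of its samples.
        for (std::size_t m = 0; m < means.size(); ++m) {
            if (counts[m] > 0) {
                means[m] = sums[m] / static_cast<double>(counts[m]);
            }
        }
    }
    return means;
}
```

For example, with eight samples at 0 and one sample each at 1, 2, and 3, starting from means at 0 and 3, the first mean settles at 1/9 (about 0.11) rather than at 0.5, the midpoint of the positions 0 and 1 that it covers. If the axis is pitch, a displacement of that kind could be enough to round a note to the neighboring key.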

gogins commented 6 years ago

It looks like the deterministic algorithm will be best after all, if it can be redone to take into account overlapping points and the need for numerical approximation. Right now I am just trying the KMeansMCRM using the deterministic algorithm to see if that behaves differently.

It does not, but the plain old MCRM works just fine.

[Image: screenshot from 2018-08-18 13-50-52]

This obviously works much better... I spent some time but I learned something.

gogins / csound-extended

Implement new MCRM algorithm based on k-means clustering #46