achabotl / pambox

Python auditory modeling toolbox.
http://pambox.org
BSD 3-Clause "New" or "Revised" License

Standardize the return values of the intelligibility models #9

Closed achabotl closed 10 years ago

achabotl commented 10 years ago

Each intelligibility model returns a different type of prediction value. Sometimes it is the intelligibility percentage directly, but more often than not it is some model-specific value that has to be transformed to intelligibility. A model can also return internal intermediate values, such as envelope powers, level spectra, etc. It would be great if the output of the models was standardized so that the models can be used interchangeably.

achabotl commented 10 years ago

We could limit the models to returning a single numerical value, but that feels too limiting. It reduces the potential for complex experimentation and makes debugging more difficult.

achabotl commented 10 years ago

We can have a single parameter, i.e. the output of the model, but that seems too little. What about intermediate steps? Parameters? Multiple outputs?

Let's say the model returns a dictionary, which would be analogous to having a function return JSON data. Should it be a single dictionary, with multiple return values in one of the pairs, or should it be a list of dictionaries? If the multiple values are bundled inside the model, then the user has the job of doing the splitting themselves, e.g. to add them to a dataframe.

If the output is a list of dictionaries, then it's easy for the user to loop over the dictionaries and simply append them to a dataframe. But then, it kinda sucks because there's duplicate information, and you don't know the name of the output until you've selected the N'th element from the list. Let's say the output is of the form:

{'out': {'snrenv': v,
         'lt_snrenv': ltv},
 'param1': param1,
 'param2': param2,
 ...
}

and so on. The user can access the results as res['out']['snrenv'], which is not super obvious. Then, to add them to the dataframe:

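# Assumes `import pandas as pd` and an existing DataFrame `df`.
d = {'snr': snr, ..., 'res': res}
for name, value in res['out'].items():
    d['outputname'] = name
    d['value'] = value
    # Add one row per output.
    df = pd.concat([df, pd.DataFrame([d])], ignore_index=True)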

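If there is a single output, then this would do:

res = model.predict(...)
# Assumes `import pandas as pd`; the keys of `res` become the column names.
df = pd.concat([df, pd.DataFrame([res])], ignore_index=True)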

but then the column names are defined in the model itself and there's no guarantee for "standards". Additionally, when querying the dataframe, one must know what the model output name was. So, let's start with the final dataframe then:

Date  Model     OutputName  Value  Units?  ...
2014  mr-sEPSM  SNRenv      45.0
2014  mr-sEPSM  lt_SNRenv   12.0
2014  STI       STI         0.5
2014  Jelfs     SRM         0.5    dB
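To make that target layout concrete, here is a minimal sketch of how a dictionary of named predictions could be flattened into one row per output. The helper name `to_long_rows` and its arguments are hypothetical, not part of pambox, and the numbers are just the placeholder values from the table above. It only uses the standard library; the resulting list of dictionaries is the kind of thing that can be handed to `pandas.DataFrame(rows)`.

```python
from datetime import date


def to_long_rows(model_name, preds, units=None, when=None):
    """Turn {output name: value} into one row per output (long format).

    Hypothetical helper: the column names follow the table sketched above.
    """
    units = units or {}
    when = when or date.today().year
    return [
        {'Date': when, 'Model': model_name, 'OutputName': name,
         'Value': value, 'Units': units.get(name)}
        for name, value in preds.items()
    ]


# Placeholder values, copied from the example table.
rows = to_long_rows('mr-sEPSM', {'SNRenv': 45.0, 'lt_SNRenv': 12.0}, when=2014)
rows += to_long_rows('Jelfs', {'SRM': 0.5}, units={'SRM': 'dB'}, when=2014)

for row in rows:
    print(row)
```

With this layout, the model only has to provide the output names and values; everything else in the row comes from the experiment.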

OK, so if we take one step back, the output of the model should have the output name(s) and the output value(s). Dictionaries are best suited for this. Units are probably not necessary and could be ignored. Then, the rest of the elements could be in another "structure". What if the output is a two-element tuple with the "output dictionary" first, and then another dictionary (namedtuple?) with the intermediate values?

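# Assumes `import pandas as pd` and an existing DataFrame `df`.
preds, internals = model.predict(...)
d = {'snr': snr, ..., 'internals': internals}
for name, pred in preds.items():
    d['outputname'] = name
    d['value'] = pred
    # Add one row per prediction.
    df = pd.concat([df, pd.DataFrame([d])], ignore_index=True)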

That's not too horrible. Now, what if the model has a single return value?

for k, v in preds.items():
    print(k, v)

or

preds['snrenv']

if the user knows the name of the output value.

So in the end, it's pretty much a choice between a single dictionary, with an "output" key, or a tuple with the predictions alone, and the internal values in the second position.

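# Option 1: tuple of (predictions, internals).
# Assumes `import pandas as pd` and an existing DataFrame `df`.
d = {'snr': snr}
preds, internals = model.predict(...)
preds['snrenv']  # direct access to one prediction
d['res'] = internals
for name, pred in preds.items():
    d['outputname'] = name
    d['value'] = pred
    df = pd.concat([df, pd.DataFrame([d])], ignore_index=True)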

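# Option 2: single dictionary with the predictions under one key.
# Assumes `import pandas as pd` and an existing DataFrame `df`.
d = {'snr': snr}
res = model.predict(...)
res['preds']['snrenv']  # direct access to one prediction
d['res'] = res
for name, value in res['preds'].items():
    d['outputname'] = name
    d['value'] = value
    df = pd.concat([df, pd.DataFrame([d])], ignore_index=True)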

Finally, I think a "single" output value is a better choice. It might need an extra level of querying, but all the predictions by the model stay together. It should also be easier to, one day, change the output to namedtuples, or to objects, if necessary.
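As a rough idea of what the later switch to namedtuples could look like, here is a small sketch. The `Result` type, its field names and the `as_result` helper are all assumptions made for illustration; nothing like this exists in pambox. The point is only that a namedtuple allows attribute access and still unpacks like the two-element tuple discussed above.

```python
from collections import namedtuple

# Hypothetical result type; the field names are assumptions.
Result = namedtuple('Result', ['preds', 'internals'])


def as_result(res):
    """Wrap a dictionary-style result {'preds': {...}, ...} in a namedtuple."""
    # Everything that is not a prediction is treated as an internal value.
    internals = {k: v for k, v in res.items() if k != 'preds'}
    return Result(preds=res['preds'], internals=internals)


# Placeholder values standing in for a real model output.
res = {'preds': {'snrenv': 45.0, 'lt_snrenv': 12.0}, 'param1': 1}
result = as_result(res)

print(result.preds['snrenv'])  # attribute access
preds, internals = result      # still unpacks like a two-element tuple
print(internals)
```

Code written against the dictionary only needs its lookups changed from res['preds'] to result.preds, so the migration would be fairly mechanical.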

achabotl commented 10 years ago

I settled on the single dictionary with the mandatory key p for now. All the other keys are optional.
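A minimal sketch of how that convention can be used from the outside, assuming that the value under 'p' maps output names to prediction values, like res['preds'] in the examples above. `FakeModel` and `collect` are hypothetical stand-ins, not pambox classes or functions, and the numbers are placeholders.

```python
class FakeModel:
    """Stand-in for an intelligibility model; not a pambox class."""

    def __init__(self, name, predictions, **extras):
        self.name = name
        self._predictions = predictions
        self._extras = extras

    def predict(self, *signals):
        # Mandatory key 'p' holds the predictions; all other keys are optional.
        res = {'p': dict(self._predictions)}
        res.update(self._extras)
        return res


def collect(models, *signals):
    """Run each model and flatten its 'p' entry into one row per output."""
    rows = []
    for model in models:
        res = model.predict(*signals)
        if 'p' not in res:
            raise KeyError("result is missing the mandatory 'p' key")
        for name, value in res['p'].items():
            rows.append({'model': model.name,
                         'outputname': name,
                         'value': value})
    return rows


models = [
    FakeModel('mr-sEPSM', {'snrenv': 45.0, 'lt_snrenv': 12.0}, param1=1),
    FakeModel('STI', {'sti': 0.5}),
]
rows = collect(models)
for row in rows:
    print(row)
```

Because every model exposes its predictions under the same key, the loop in `collect` does not need to know which model it is talking to, which is the interchangeability this issue was after.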