Shahabks / myprosody

A Python library for measuring the acoustic features of speech (simultaneous speech, high entropy) compared to those of native speech.
https://shahabks.github.io/myprosody/
MIT License
232 stars 63 forks

Percentile Calculation error and calculation logic of parameters #17

Closed shalabhsingh closed 3 years ago

shalabhsingh commented 3 years ago

While running the function mysp.myprosody(p,c) in the script testpro.py, I ran into several questions that I would like you to clarify. I see that the function definition is in myprosody.py, and I have a question about this section of code:

for i in range(25):
    sl0=dataframe[4:7:1,i+1]
    score = array[0,i]
    he=scipy.stats.percentileofscore(sl0, score, kind='strict')
    if he==0:
        he=25
        dfout = "%s:\t %f (%s)" %  (nsns[i],he,"% percentile ")
        print(dfout)
    elif he>=25 and he<=75:
        dfout = "%s:\t %f (%s)" % (nsns[i],he,"% percentile ")
        print(dfout)
    else:
        dfout = "%s:\t (%s)" % (nsns[i],":Out of Range")
        print(dfout)

Here you take the percentile for each parameter from stats.csv and report it using "strict" scoring. As a result, percentiles can only be 0, 25, 33.3, 66.6 or out of range. What is the reasoning behind using fixed percentile values? Also, if we have min and max values (the 0 and 100 percentiles respectively), why are some percentiles above 75 reported as out of range?
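To make the point about the limited set of outcomes concrete, here is a minimal sketch of what the loop quoted above does for a single parameter. It assumes the slice dataframe[4:7:1,i+1] yields exactly three reference values; the numbers in `reference` are made up for illustration and are not taken from stats.csv. The helper `strict_percentile` is a plain-Python stand-in for the "strict" definition (the percentage of reference values strictly below the score), not the library's own code.

```python
# Hypothetical reference values standing in for the three rows of one
# stats.csv column selected by dataframe[4:7:1, i+1] (not real data).
reference = [10.0, 20.0, 30.0]


def strict_percentile(values, score):
    """Percentage of values strictly below score ('strict' scoring)."""
    below = sum(1 for v in values if v < score)
    return 100.0 * below / len(values)


def report(score, values=reference):
    """Mirror the branching of the quoted loop for one parameter."""
    he = strict_percentile(values, score)
    if he == 0:
        he = 25  # a raw percentile of 0 is reported as 25
    if 25 <= he <= 75:
        return round(float(he), 1)
    return "out of range"


# One score below all references, one in each gap, one above all of them.
outcomes = {s: report(s) for s in (5.0, 15.0, 25.0, 35.0)}
for score, result in outcomes.items():
    print(score, "->", result)
```

Under these assumptions a score can only land on a raw percentile of 0, 33.3, 66.7 or 100, so the loop can only print 25, 33.3, 66.7 or "Out of Range", whatever the actual score is.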

Apart from this, I would like to know the definition and calculation procedure for some of the features-

Also, what are the accuracies of the models used to calculate these scores? Is this documented in some paper or report?

Shahabks commented 3 years ago

These articles might be of interest to you:

"Automatic scoring of non-native spontaneous speech in tests of spoken English", Speech Communication, Volume 51, Issue 10, October 2009, Pages 883-895

"A three-stage approach to the automated scoring of spontaneous spoken responses", Computer Speech & Language, Volume 25, Issue 2, April 2011, Pages 282-306

"Automated Scoring of Nonnative Speech Using the SpeechRater℠ v. 5.0 Engine", ETS research report, Volume 2018, Issue 1, December 2018, Pages 1-28