Question - input data format (raw data vs. frequencies)

csgillespie / poweRlaw

This package implements both the discrete and continuous maximum likelihood estimators for fitting the power-law distribution to data. Additionally, a goodness-of-fit based approach is used to estimate the lower cutoff for the scaling region.

Hello,

I would like to ask which kind of data I should use as input when creating a displ$new(data) object. The example says "The Moby Dick dataset contains the frequency of unique words", so does that mean my data should already be in frequency format before I pass it to displ$new?

To illustrate, here is an example. If I throw a 10-sided die at random, I get these numbers:

3 7 5 3 2 1 10 8 4 1 1 1

If I compute frequencies from these numbers (face - count), I get:

1 - 4, 2 - 1, 3 - 2, 4 - 1, 5 - 1, 6 - 0, 7 - 1, 8 - 1, 9 - 0, 10 - 1
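For reference, the tabulation above can be reproduced in base R. This is only a sketch of the counting step; the variable names `throws`, `freq` and `counts` are made up for illustration and nothing here touches poweRlaw itself.

```r
# Raw observations: the twelve throws listed above.
throws <- c(3, 7, 5, 3, 2, 1, 10, 8, 4, 1, 1, 1)

# Tabulate over all ten faces so that faces never thrown appear with count 0.
freq <- table(factor(throws, levels = 1:10))
print(freq)

# The counts as a plain vector, in face order 1..10.
counts <- as.vector(freq)
```

Using `factor(..., levels = 1:10)` is what keeps the zero counts for faces 6 and 9; a bare `table(throws)` would silently drop them.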

Which numbers should I pass as the argument to the function: the "raw data" (3, 7, 5, 3, ...) or the "frequencies" (4, 1, 2, 1, ...)?

I am asking because the displ object has an internal field containing frequencies. These look like "frequencies of frequencies", which made me think the input should be the "raw data".

Thank you,
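To make "frequencies of frequencies" concrete, here is a small base R sketch that tabulates the count vector from the dice example a second time. The names `counts` and `freq_of_freq` are hypothetical, and whether displ tabulates its input in this way is my assumption from looking at the object, not something I have confirmed.

```r
# Counts per face (1..10) from the dice example.
counts <- c(4, 1, 2, 1, 1, 0, 1, 1, 0, 1)

# Tabulating the counts again gives, for each count value,
# how many faces were thrown that many times.
freq_of_freq <- table(counts)
print(freq_of_freq)
```

Here six faces were each thrown exactly once, two faces were never thrown, and one face each was thrown twice and four times.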

Stepan

Question - input data format (raw data vs. frequencies) #7