Hi Stepan,
In general, the input is the "raw data".
Regarding your question about Moby Dick: in this example, the data collected would have been individual words. So the raw data here is how many times each word appeared, i.e. the word frequency.
I use "frequencies from frequencies" within the distribution object for efficiency.
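To make the distinction concrete, here is a minimal base R sketch. The `words` vector is made up purely for illustration, and the dice throws are the ones from the question. It only mimics the idea of tabulating the input and is not the package's actual internal code.

```r
# Hypothetical toy "text" (made up for illustration, not the Moby Dick data).
words <- c("the", "whale", "the", "sea", "whale", "the", "ship")

# Raw data in the Moby Dick sense: one count per unique word.
word_freq <- as.vector(table(words))

# "Frequencies from frequencies": how many words share each count.
freq_of_freq <- table(word_freq)

# Dice throws from the question: the throws themselves are the raw data.
throws <- c(3, 7, 5, 3, 2, 1, 10, 8, 4, 1, 1, 1)

# Tabulating them by face (factor() keeps the faces that never came up)
# gives the "frequencies" listed in the question.
face_counts <- table(factor(throws, levels = 1:10))

print(word_freq)
print(freq_of_freq)
print(face_counts)

# With poweRlaw installed, the raw vector is what would be passed in
# (left commented out so the sketch runs without the package):
# m <- poweRlaw::displ$new(word_freq)
```

In both cases the vector you pass in has one entry per observation (one per word, or one per throw), and the tabulation is a separate step done afterwards.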
Does that make sense?
Cheers
Colin
Hello,
I would like to ask which kind of data I should use as input when creating a displ$new(data) object. The example says "The Moby Dick dataset contains the frequency of unique words", so does that mean my data should be in frequency format before I pass it to displ$new?
To illustrate, here is an example. If I randomly throw a 10-sided die, I get these numbers:
3 7 5 3 2 1 10 8 4 1 1 1
If I compute frequencies from these numbers, I get:
1 - 4
2 - 1
3 - 2
4 - 1
5 - 1
6 - 0
7 - 1
8 - 1
9 - 0
10 - 1
Which numbers do I pass as the argument to the function: the "raw data" (3, 7, 5, 3, ...) or the "frequencies" (4, 1, 2, 1, ...)?
I am asking because the displ object has an internal field containing frequencies. These seem to be "frequencies from frequencies", which made me think the input should be the "raw data".
Thank you,
Stepan