diou closed this issue 11 years ago.
@diou Except for the number of times distribution?
Yes, sorry, you are right. For the number of times distribution, only a histogram should be possible.
Is this related to what we discussed in #141, i.e. the chart type depending on the user's plot type selection?
For number of times, the graph should always be a bar chart, but a histogram goes with values and the other distributions go with parameters. For start time and duration distributions, a histogram again goes with values and the other distributions with parameters, but here the graph type also changes: a histogram is shown as a bar chart and the other distributions as a line graph, as it is now. @diou Correct?
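A minimal sketch of the mapping described above, assuming the distribution names are the strings "number of times", "start time" and "duration", and that every distrType other than "Histogram" (e.g. an assumed "Normal") is parametric. The helper name `display_spec` is hypothetical and not taken from the codebase.

```python
def display_spec(name: str, distr_type: str) -> tuple[str, str]:
    """Return (data_field, chart_type) for a distribution (hypothetical helper).

    data_field: "values" holds histogram probabilities,
                "parameters" holds the arguments of a parametric distribution.
    chart_type: "bar" or "line".
    """
    is_histogram = distr_type == "Histogram"
    data_field = "values" if is_histogram else "parameters"
    if name == "number of times":
        # Number of times is always drawn as a bar chart.
        return data_field, "bar"
    # Start time and duration: bar chart for histograms, line graph otherwise.
    return data_field, ("bar" if is_histogram else "line")
```

For example, `display_spec("duration", "Normal")` gives `("parameters", "line")`.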
Yes, as long as the bar chart displays correctly for a large number of values (e.g. for start time, or if duration values exceed 200 minutes). If there's a problem when the number of bars is large, we should again use lines.
To sum up, here are the requirements for displaying distributions:
I hope I'm not forgetting something. If something is unclear or you don't agree, please let me know.
Regarding item 3, are you talking only about the chart type, or also about the distribution type? Could I have a normal distribution with mean: 2 and std: 0.5 for the number of times distribution? Or should it only be of Histogram type, with values such as [0.5, 0.4, 0.1, 0, 0]?
The latter. I think the distribution type should only be a histogram, because number of times is a discrete distribution of integer values, so there's not much point in modelling it as a continuous parametric distribution.
OK. @fgiannar I will assign you this task. If you need anything server-side let us know.
Just to confirm, the following range applies to the y-axis values, correct? [max(0, mean - 8 * std), min(1440, mean + 8 * std)]
The y-axis values are probabilities. The above values refer to the x-axis.
They refer to the x-axis. For example, if mean is 100 and std is 10, the range should be [20, 180]; if mean is 100 and std is 20, it should be [0, 260]. (I see @diou has already answered, but since I wrote it :))
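The x-axis range rule can be sketched as below. The helper name `normal_x_range` is hypothetical; the upper bound 1440 is taken from the thread and is presumably the number of minutes in a day (24 * 60), which is an assumption.

```python
def normal_x_range(mean: float, std: float, k: float = 8,
                   lo: float = 0, hi: float = 1440) -> tuple[float, float]:
    """x-axis bounds [max(lo, mean - k*std), min(hi, mean + k*std)].

    Hypothetical helper; hi=1440 is assumed to be minutes in a day.
    """
    return max(lo, mean - k * std), min(hi, mean + k * std)
```

With the examples above, `normal_x_range(100, 10)` gives `(20, 180)` and `normal_x_range(100, 20)` gives `(0, 260)`, since the lower bound is clamped at 0.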
One more thing: the small bar chart won't display correctly for a large number of values (>= 100). (Clicking on it opens the larger chart popup, where the bars are displayed properly since there is more space for the chart.) How would you prefer to handle this? Should we add a filter that displays a line chart when there are >= 100 values and the distribution type is Histogram, or use a line chart in all cases?
I am fine with the >= 100 case, but we have to see the look and feel first. Perhaps it could be >=50...
I agree, you can set a threshold (e.g. 50, 100 whatever works) and if the number of values exceeds it, use a line.
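A sketch of the threshold fallback for the small chart, covering only the start time and duration charts (number of times is always a bar chart). The helper name and the default threshold of 100 are assumptions; 50 was also suggested above.

```python
def small_chart_type(distr_type: str, n_values: int, threshold: int = 100) -> str:
    """Chart type for the small start time / duration chart (hypothetical helper).

    Histograms are drawn as bars only while the number of values stays
    below the threshold; larger histograms and parametric types use a line.
    """
    if distr_type == "Histogram" and n_values < threshold:
        return "bar"
    return "line"
```

Making the threshold a parameter keeps it easy to tune once the look and feel has been checked.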
Also one more comment: The distributions are in the [0, 1] range while histograms are in the [0, 100] range (i.e. percentages). It would be nice to be consistent (in the descriptions too).
ok, thnx
When I make a PUT request to distributions with the following data: { _id: "50eaafe1e4b0e21868c64ce6", actmod_id: "5045f2b4e4b058c3f86c3301", description: "", distrType: "Histogram", name: "duration", values:[3, 4, 5, 6, 7, 8], parameters: [{mean:30, std:10}] }
I get the following exception:
{ "success": false, "errors": { "Exception": "Null" }, "message": "MongoQueryError: Cannot execute find query for collection: distributions with qKey={ \"_id\" : { \"$oid\" : \"50eaafe1e4b0e21868c64ce6\"}} and qValue={ \"actmod_id\" : 1 , \"description\" : 1 , \"distrType\" : 1 , \"name\" : 1 , \"values\" : 1 , \"parameters\" : 1}" }
Would it be possible to allow requests to contain both parameters and values? If distrType is "Histogram", the values would be read; otherwise, the parameters. This would allow the user to switch between distrTypes without having to re-enter values and parameters.
Btw, if this is not easy to implement, that's OK; I'll reset the unused field before sending a request (e.g. if distrType = "Histogram", set parameters = []).
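The client-side workaround described above (emptying the field that the selected distrType does not use before the PUT) can be sketched in Python for illustration. The helper name `prepare_put_payload` is hypothetical, and the sketch does not claim to explain the MongoQueryError itself.

```python
import copy


def prepare_put_payload(distribution: dict) -> dict:
    """Return a copy of the distribution with the unused field emptied.

    Hypothetical helper: Histograms keep "values" and clear "parameters";
    every other distrType keeps "parameters" and clears "values".
    The input dict is left untouched so the UI can restore it on type switch.
    """
    payload = copy.deepcopy(distribution)
    if payload.get("distrType") == "Histogram":
        payload["parameters"] = []
    else:
        payload["values"] = []
    return payload
```

Working on a copy means the user's previously entered values and parameters are still available if they switch distrType again.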
I followed the above implementation (resetting the corresponding values based on the user's selection). Some questions before we can close this issue:
1) @diou: "The distributions are in the [0, 1] range while histograms are in the [0, 100] range (i.e. percentages)." Where exactly would you like me to add this information? Maybe in the question-mark tooltip text shown when hovering over the parameters or values info?
2) Would it be easy to convert the plot values server-side (I mean multiplying those values by 100 when distrType != "Histogram")?
You don't need to add an explanation. I would suggest NOT using percentages in the histograms and removing the '%' from the description. This is because these distributions are actually probability density functions and can have values > 1, so percentages don't make sense.
@kyrcha: Perhaps the percentage values in the histograms are a server-side issue?
OK, I'll remove the '%'. But when distrType is anything other than Histogram, the y values still need to be multiplied by 100. I was wondering if this would be easy to implement server-side. Please let me know so I can start working on it if necessary.
The values should not be multiplied by 100. We decided to keep them as probabilities, not percentages. So the y-axis should show whatever values the user enters, and the x-axis should start from 0 with an integer value for each bar. For example, if the user enters [0.3, 0.2, 0.1, 0.5], the (x, y) points should be [(0, 0.3), (1, 0.2), (2, 0.1), (3, 0.5)].
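The conversion from entered histogram values to plotted points amounts to pairing each value with its zero-based index, with no x100 scaling. A minimal sketch (the helper name is hypothetical):

```python
def histogram_points(values: list[float]) -> list[tuple[int, float]]:
    """Pair each histogram value with a zero-based integer x position.

    Hypothetical helper: y is the probability exactly as entered (no x100).
    """
    return [(x, y) for x, y in enumerate(values)]
```

`histogram_points([0.3, 0.2, 0.1, 0.5])` returns `[(0, 0.3), (1, 0.2), (2, 0.1), (3, 0.5)]`, matching the example above.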
ok clear, then I'll remove the conversion (x100) in the next commit, and if you think everything is ok we can close this.
I just checked the code: these are NOT density functions (they are probability distributions), so please remove the word 'density' as well. It should be 'Probability vs Duration'. The same applies to the enlarged graph view.
Thanks!
@fgiannar one more thing: are you multiplying the values we send for duration and start time by 100 (to get percentages)? If yes, please remove the multiplication, since we are switching to probabilities; if not, let us know, since then we need to change it server-side. Thnx
The Uniform distribution is also not displayed (e.g. try [{"start":100, "end":200}], which is the example given in the manual).
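One way the Uniform parameters could be turned into plottable points is sketched below. It assumes integer minutes in the half-open interval [start, end), so each minute gets probability 1 / (end - start); the interval convention and the helper name `uniform_points` are assumptions, not confirmed by the manual.

```python
def uniform_points(start: int, end: int) -> list[tuple[int, float]]:
    """Per-minute probabilities for Uniform parameters {"start": s, "end": e}.

    Hypothetical helper; assumes integer minutes in [start, end),
    so the probabilities sum to 1.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    p = 1.0 / (end - start)
    return [(x, p) for x in range(start, end)]
```

For the manual's example, `params = [{"start": 100, "end": 200}]` would be plotted with `uniform_points(params[0]["start"], params[0]["end"])`, giving 100 points at probability 0.01 each.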
Also, for some reason, when I insert a GMM duration, the x-axis display is cut off. E.g. try [{"w":0.6, "mean":350, "std": 10},{"w":0.4, "mean":420, "std":10}], which is cut off around 420.
One way to correctly visualize the GMMs is to use the rule
low_i = max(0, mean_i - 8 * std_i), high_i = min(1440, mean_i + 8 * std_i)
for each mixture component i, and then use the bounds
[min(low_i), max(high_i)], where the min and max operators are over i.
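The bounds rule above can be sketched as follows, together with evaluating the mixture at integer minutes between those bounds. The helper names are hypothetical, and treating the mixture density at each integer minute as that minute's probability is an approximation assumed here. A single Normal is just a one-component mixture, so the same bounds function also covers it.

```python
import math


def gmm_x_range(components: list[dict], k: float = 8,
                lo: float = 0, hi: float = 1440) -> tuple[float, float]:
    """[min_i(low_i), max_i(high_i)] over all mixture components (hypothetical)."""
    lows = [max(lo, c["mean"] - k * c["std"]) for c in components]
    highs = [min(hi, c["mean"] + k * c["std"]) for c in components]
    return min(lows), max(highs)


def gmm_pdf(x: float, components: list[dict]) -> float:
    """Weighted sum of normal densities: sum_i w_i * N(x; mean_i, std_i)."""
    total = 0.0
    for c in components:
        z = (x - c["mean"]) / c["std"]
        total += c["w"] * math.exp(-0.5 * z * z) / (c["std"] * math.sqrt(2 * math.pi))
    return total


def gmm_points(components: list[dict], k: float = 8) -> list[tuple[int, float]]:
    """(minute, value) pairs over the whole mixture range, one per integer minute."""
    lo, hi = gmm_x_range(components, k)
    return [(x, gmm_pdf(x, components))
            for x in range(math.floor(lo), math.ceil(hi) + 1)]
```

For the example above, the bounds come out as [270, 500], which covers both peaks at 350 and 420.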
@kyrcha Yes, I was multiplying, so it will be removed in the latest commit. @diou Maybe I'm missing something, but I don't remember us discussing how to handle Uniform or GMM distributions (e.g. [{"start":100, "end":200}] is supported so far). So we need to clarify this:
Yes, we didn't ask for this. However, there are two issues:
Both bugs are reproducible with the examples I gave in the comments above. We don't need to set bounds for GMM and Uniform right now (it's not currently important), but these bugs should be fixed if possible.
Thanks
@diou One more question: is it possible for a Normal distribution to have parameters like [{"w":0.6, "mean":350, "std": 10},{"w":0.4, "mean":420, "std":10}]? If so, the rule "[max(0, mean - 8 * std), min(1440, mean + 8 * std)]" is not applicable. Should we switch to "[min(low_i), max(high_i)]" for all types except Histogram?
After the latest update, the x-axis labels are not visible (in both graph views).
Some distribution normalization issues:
[max(0, mean - 8 * std), min(1440, mean + 8 * std)]
so that the distribution is visible.