Open Burkhardus opened 7 years ago
I found the answer in the discussion linked below. R does not compute its Q-Q line as y = μ + σx. That is interesting and not obvious, because the definition of the Q-Q plot on Wikipedia differs from what R draws.
http://stats.stackexchange.com/questions/22258/what-is-the-use-of-the-line-produced-by-qqline-in-r
The secret behind the solid line in the R Q-Q plot is simply this: it is not fitted to all samples. R uses only the first and third quartiles, Q1 and Q3, to determine the slope of the line (the quantity that plays the role of the standard deviation).
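As a library-independent illustration of that rule, here is a minimal sketch in Python (standard library only, not the VB.NET/Accord code used in this thread). It assumes the line is drawn through the two points (theoretical quartile, sample quartile) at probabilities 0.25 and 0.75, with linearly interpolated sample quantiles. The function names `quantile_type7` and `quartile_line` are made up for this example.

```python
from statistics import NormalDist


def quantile_type7(sorted_values, p):
    """Sample quantile with linear interpolation between order statistics."""
    n = len(sorted_values)
    h = (n - 1) * p                 # fractional index into the sorted data
    lo = int(h)
    hi = min(lo + 1, n - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])


def quartile_line(sample, probs=(0.25, 0.75)):
    """Return (slope, intercept) of the line through the two quartile points."""
    xs = sorted(sample)
    y1, y3 = (quantile_type7(xs, p) for p in probs)   # sample quartiles
    std_normal = NormalDist()                         # mean 0, sd 1
    x1, x3 = (std_normal.inv_cdf(p) for p in probs)   # theoretical quartiles
    slope = (y3 - y1) / (x3 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


if __name__ == "__main__":
    slope, intercept = quartile_line([1, 2, 3, 4, 5])
    print(f"slope={slope:.4f}, intercept={intercept:.4f}")
```

The slope here is the interquartile range of the sample divided by the interquartile range of the standard normal, so points outside the middle half of the data have no influence on the line.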
After I adapted my code to this definition, it works like a charm and looks very similar to the R Q-Q plot.
Dim QQPlotPane As New GraphPane
Dim listQQ1 As New PointPairList
Dim listQQ2 As New PointPairList
Dim NormDist1 As New NormalDistribution()
QQPlotPane.XAxis.Type = AxisType.Linear
QQPlotPane.XAxis.Title.Text = "QQ Plot Norm Dist"
QQPlotPane.XAxis.MajorGrid.Color = Color.DimGray
QQPlotPane.XAxis.MajorGrid.DashOn = 1
QQPlotPane.XAxis.MajorGrid.DashOff = 1
QQPlotPane.XAxis.MajorGrid.IsVisible = True
QQPlotPane.XAxis.Scale.MajorStepAuto = True
QQPlotPane.XAxis.Scale.MinorStepAuto = True
QQPlotPane.XAxis.Scale.MaxAuto = True
QQPlotPane.XAxis.Scale.MinAuto = True
QQPlotPane.YAxis.Type = AxisType.Linear
QQPlotPane.YAxis.MajorGrid.Color = Color.DimGray
QQPlotPane.YAxis.MajorGrid.DashOn = 1
QQPlotPane.YAxis.MajorGrid.DashOff = 1
QQPlotPane.YAxis.MajorGrid.IsVisible = True
QQPlotPane.YAxis.Scale.MajorStepAuto = True
QQPlotPane.YAxis.Scale.MinorStepAuto = True
QQPlotPane.YAxis.Scale.MaxAuto = True
QQPlotPane.YAxis.Scale.MinAuto = True
QQPlotPane.Fill = New Fill(Color.DarkGray)
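' Collect the sample quantiles that are used to fit the reference line.
' SampleDist is defined elsewhere in my code.
Dim listQ1Q3 As New List(Of Double)
Dim pi As Double = 0
For i As Integer = 1 To 999
    pi = pi + 0.001
    If pi < 0.25 Or (pi > 0.5 And pi < 0.75) Then
        listQ1Q3.Add(SampleDist.InverseDistributionFunction(pi))
    End If
    ' Q-Q point: theoretical normal quantile on x, sample quantile on y.
    listQQ1.Add(NormDist1.InverseDistributionFunction(pi), SampleDist.InverseDistributionFunction(pi))
Next
' Reference line built from the mean and standard deviation of the selected quantiles, drawn from x = -3 to x = 3.
Dim SampleDistQ1Q3 As New EmpiricalDistribution(listQ1Q3.ToArray)
listQQ2.Add(-3, -3 * SampleDistQ1Q3.StandardDeviation + SampleDistQ1Q3.Mean)
listQQ2.Add(3, 3 * SampleDistQ1Q3.StandardDeviation + SampleDistQ1Q3.Mean)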
Dim QQCurve2 As LineItem = QQPlotPane.AddCurve("", listQQ2, Color.Blue, SymbolType.Circle)
Dim QQCurve1 As LineItem = QQPlotPane.AddCurve("", listQQ1, colLabel, SymbolType.Circle)
QQCurve1.Symbol.IsVisible = True
QQCurve2.Symbol.IsVisible = False
QQCurve1.Line.IsVisible = False
QQCurve2.Line.IsVisible = True
QQCurve2.Line.Width = 2
ChartZ.MasterPane.Add(QQPlotPane)
Hi @Burkhardus,
Apologies for not responding to your issue in time. It took me quite a while to finally get back to the middle pages of the issue tracker. However, I am glad that you found a solution for your issue last year!
If I understood correctly, in order to achieve a QQ plot like in R, one needs to consider not all possible inverse distribution points but rather only the ones at the quartiles (0, 0.25, 0.5, 0.75, 1.0)?
If your solution is still working well, would you like to contribute it to the project in the form of a new class in the Accord.Statistics.Visualization or Accord.Controls namespace?
Regards, Cesar
Hello dear Cesar,
I want to make a Q-Q plot of my data. I uploaded the sample data to the WESSA website, which runs R code behind the scenes.
There I got the following plot:
Now I want to achieve the same with Accord Framework. So far I have the following code.
But the result looks a little different from the R-code:
The difference is mostly in the solid line, which the points would follow if they came from a normal distribution.
In the R example the line matches the lower part of the sample. Is there a reason for this? It seems that these particular sample points follow the normal distribution, but the outliers don't. This would mean I have to calculate the line in a different way.
Here is another example:
In the R-Code it looks like this:
My result looks like this:
It is close, but different. Why is this?
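One way to see where such a difference can come from is to compare the two candidate lines on the same data. The following is only a sketch in Python (standard library, not the Accord code from this issue) with invented numbers: a well-behaved bulk plus two large outliers. It compares the line y = μ + σx, estimated from all points, with a line through the first and third quartile points.

```python
from statistics import NormalDist, mean, quantiles, stdev

# Hypothetical data: nine values near 10 plus two large outliers.
sample = [9.1, 9.4, 9.7, 9.9, 10.0, 10.1, 10.3, 10.6, 10.9, 25.0, 40.0]

# Line 1: y = mu + sigma * x, estimated from every point.
mu = mean(sample)
sigma = stdev(sample)

# Line 2: through (z1, q1) and (z3, q3), using only the quartiles.
# method="inclusive" interpolates linearly between order statistics.
q1, _, q3 = quantiles(sample, n=4, method="inclusive")
std_normal = NormalDist()
z1, z3 = std_normal.inv_cdf(0.25), std_normal.inv_cdf(0.75)
slope = (q3 - q1) / (z3 - z1)
intercept = q1 - slope * z1

print(f"mean/sd line:  y = {mu:.2f} + {sigma:.2f} * x")
print(f"quartile line: y = {intercept:.2f} + {slope:.2f} * x")
```

With data like this, the outliers inflate both μ and σ, so the first line is steeper and shifted upward, while the quartile-based line stays with the bulk of the points.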