accord-net / framework

Machine learning, computer vision, statistics and general scientific computing for .NET
http://accord-framework.net
GNU Lesser General Public License v2.1

How can I achieve a QQ-Plot #315

Open Burkhardus opened 7 years ago

Burkhardus commented 7 years ago

Hello dear Cesar,

I want to make a QQ plot of my data. I uploaded the sample data to the WESSA website, which runs R code behind the scenes.

There I got the following plot:

[image: qqplot2]

Now I want to achieve the same with Accord Framework. So far I have the following code.

[image: myqq1]

But the result looks a little different from the R-code:

[image: mydist]

The difference is mostly in the solid line, which the points would follow if they came from a normal distribution.

In the R example, the line matches the lower part of the sample. Is there a reason for this? It seems that these particular sample points follow the normal distribution, but the outliers don't. That would mean I have to calculate the line in a different way.

Here is another example.

In the R-Code it looks like this:

[image: qqplot1]

My result looks like this:

[image: example2]

It is close, but different. Why is this?

Burkhardus commented 7 years ago

I found the answer in the discussion linked here. R calculates its QQ line differently from the line y = μ + σx. That is interesting and not obvious, because the definition of a QQ plot on Wikipedia differs from what R draws.

http://stats.stackexchange.com/questions/22258/what-is-the-use-of-the-line-produced-by-qqline-in-r
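
To make the difference concrete, here is a sketch of the two lines in formula form. The notation is an assumption for illustration and does not come from the thread: \hat{\mu} and \hat{\sigma} are the mean and standard deviation of the whole sample, Q_1 and Q_3 are the sample quartiles, and z_p is the standard normal quantile at probability p. With its default settings, R's qqline draws the second line.

```latex
% Line from the moments of the whole sample
y = \hat{\mu} + \hat{\sigma}\,x

% Line through the quartile points (z_{0.25}, Q_1) and (z_{0.75}, Q_3)
y = Q_1 + \frac{Q_3 - Q_1}{z_{0.75} - z_{0.25}}\,(x - z_{0.25}),
\qquad z_{0.75} = -z_{0.25} \approx 0.6745
```

Because only the quartiles enter the second formula, outliers in the tails barely move the line, so it follows the bulk of the sample.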

Burkhardus commented 7 years ago

The secret behind the solid line in the R QQ plot is simply this: R uses only part of the sample to calculate the slope (the standard deviation) of the line. Specifically, it uses the first and third quartiles, Q1 and Q3.

After I adapted my code to this definition, it works like a charm and looks very similar to the R QQ plot.

[image: exampleq1q3_1]


    ' Required at the top of the file.
    Imports System.Collections.Generic
    Imports System.Drawing
    Imports Accord.Statistics.Distributions.Univariate
    Imports ZedGraph

    ' Assumed to be defined elsewhere: SampleDist (distribution of the sample data),
    ' colLabel (a Color) and ChartZ (a ZedGraphControl).
    Dim QQPlotPane As New GraphPane
    Dim listQQ1 As New PointPairList            ' sample quantiles vs. normal quantiles
    Dim listQQ2 As New PointPairList            ' end points of the reference line
    Dim NormDist1 As New NormalDistribution()   ' standard normal distribution

    ' X axis: theoretical quantiles, dashed grid, automatic scaling.
    QQPlotPane.XAxis.Type = AxisType.Linear
    QQPlotPane.XAxis.Title.Text = "QQ Plot Norm Dist"
    QQPlotPane.XAxis.MajorGrid.Color = Color.DimGray
    QQPlotPane.XAxis.MajorGrid.DashOn = 1
    QQPlotPane.XAxis.MajorGrid.DashOff = 1
    QQPlotPane.XAxis.MajorGrid.IsVisible = True
    QQPlotPane.XAxis.Scale.MajorStepAuto = True
    QQPlotPane.XAxis.Scale.MinorStepAuto = True
    QQPlotPane.XAxis.Scale.MaxAuto = True
    QQPlotPane.XAxis.Scale.MinAuto = True

    ' Y axis: sample quantiles, same grid and scaling settings.
    QQPlotPane.YAxis.Type = AxisType.Linear
    QQPlotPane.YAxis.MajorGrid.Color = Color.DimGray
    QQPlotPane.YAxis.MajorGrid.DashOn = 1
    QQPlotPane.YAxis.MajorGrid.DashOff = 1
    QQPlotPane.YAxis.MajorGrid.IsVisible = True
    QQPlotPane.YAxis.Scale.MajorStepAuto = True
    QQPlotPane.YAxis.Scale.MinorStepAuto = True
    QQPlotPane.YAxis.Scale.MaxAuto = True
    QQPlotPane.YAxis.Scale.MinAuto = True

    QQPlotPane.Fill = New Fill(Color.DarkGray)

    ' Subset of sample quantiles used to estimate the reference line.
    Dim listQ1Q3 As New List(Of Double)

    For i As Integer = 1 To 999
        ' Derive the probability from the loop index so rounding errors do not accumulate.
        Dim p As Double = i / 1000.0
        ' Keep the quantiles below 0.25 and those between 0.5 and 0.75.
        If p < 0.25 OrElse (p > 0.5 AndAlso p < 0.75) Then
            listQ1Q3.Add(SampleDist.InverseDistributionFunction(p))
        End If
        ' One QQ point: x = normal quantile, y = sample quantile.
        listQQ1.Add(NormDist1.InverseDistributionFunction(p), SampleDist.InverseDistributionFunction(p))
    Next

    Dim SampleDistQ1Q3 As New EmpiricalDistribution(listQ1Q3.ToArray)

    ' Reference line y = mean + stddev * x, drawn from x = -3 to x = 3.
    listQQ2.Add(-3, -3 * SampleDistQ1Q3.StandardDeviation + SampleDistQ1Q3.Mean)
    listQQ2.Add(3, 3 * SampleDistQ1Q3.StandardDeviation + SampleDistQ1Q3.Mean)

    Dim QQCurve2 As LineItem = QQPlotPane.AddCurve("", listQQ2, Color.Blue, SymbolType.Circle)
    Dim QQCurve1 As LineItem = QQPlotPane.AddCurve("", listQQ1, colLabel, SymbolType.Circle)

    ' Show the QQ points as symbols only and the reference line as a line only.
    QQCurve1.Symbol.IsVisible = True
    QQCurve2.Symbol.IsVisible = False
    QQCurve1.Line.IsVisible = False
    QQCurve2.Line.IsVisible = True
    QQCurve2.Line.Width = 2

    ChartZ.MasterPane.Add(QQPlotPane)
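
For comparison, the reference line can also be computed directly from the two quartile points, without fitting a second distribution. The following is a minimal, self-contained sketch under that assumption. It is not the code used above and not part of Accord: the names QQLineSketch, QQLine, SampleQuantile and FitQuartileLine are hypothetical, and the quantile rule (linear interpolation between order statistics) is chosen to match R's default.

```vbnet
Imports System
Imports System.Linq

Module QQLineSketch

    ' Standard normal quantile at p = 0.75; the quantile at p = 0.25 is its negative.
    Private Const Z75 As Double = 0.6744897501960817

    ' Slope and intercept of the reference line (hypothetical helper type).
    Public Structure QQLine
        Public Slope As Double
        Public Intercept As Double
    End Structure

    ' Sample quantile by linear interpolation between order statistics.
    ' Expects a sorted array with at least one element.
    Function SampleQuantile(sorted As Double(), p As Double) As Double
        Dim h As Double = (sorted.Length - 1) * p
        Dim lo As Integer = CInt(Math.Floor(h))
        Dim hi As Integer = Math.Min(lo + 1, sorted.Length - 1)
        Return sorted(lo) + (h - lo) * (sorted(hi) - sorted(lo))
    End Function

    ' Line through the points (-Z75, Q1) and (Z75, Q3).
    Function FitQuartileLine(samples As Double()) As QQLine
        Dim sorted As Double() = samples.OrderBy(Function(v) v).ToArray()
        Dim q1 As Double = SampleQuantile(sorted, 0.25)
        Dim q3 As Double = SampleQuantile(sorted, 0.75)

        Dim result As QQLine
        result.Slope = (q3 - q1) / (2 * Z75)
        result.Intercept = q1 + result.Slope * Z75
        Return result
    End Function

    Sub Main()
        ' Made-up sample with one outlier, for illustration only.
        Dim data As Double() = {2.1, 2.4, 2.2, 2.9, 3.1, 2.6, 9.5, 2.8}
        Dim fit As QQLine = FitQuartileLine(data)
        Console.WriteLine("slope = {0}, intercept = {1}", fit.Slope, fit.Intercept)
    End Sub

End Module
```

The returned slope and intercept could then take the place of SampleDistQ1Q3.StandardDeviation and SampleDistQ1Q3.Mean when filling listQQ2.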

cesarsouza commented 7 years ago

Hi @Burkhardus,

Apologies for not responding to your issue in time. It took me quite a while to get back to the middle pages of the issue tracker. However, I am glad that you found a solution way back last year!

If I understood correctly, in order to achieve a QQ plot like in R, one needs to consider not all possible inverse distribution points but rather only the ones at the quartiles (0, 0.25, 0.5, 0.75, 1.0)?

If your solution is still working well, would you like to contribute it to the project in the form of a new class in the Accord.Statistics.Visualization or Accord.Controls namespace?

Regards, Cesar