Open HiramTheHero opened 2 years ago
Thanks for the issue. A minimal working example that shows this behavior is `qqnorm([1,2,3])`.
However, note that this example is equivalent to calling:
using Distributions
qqpair = Distributions.qqbuild(Normal(), [1,2,3])
plot(qqpair)
Note that `qqpair` just wraps the `x` and `y` values of the desired points, which we plot. So I would advise opening an issue on Distributions.jl.
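To see where the point goes missing, it may help to look at the values that `qqbuild` returns before anything is plotted. The following is a minimal sketch, assuming the returned object exposes its points through the `qx` and `qy` fields; the input `y` is the same toy vector as in the example above.

```julia
using Distributions

# Same toy input as in the minimal example above.
y = [1, 2, 3]
qqpair = Distributions.qqbuild(Normal(), y)

# Assumption: the pair stores theoretical quantiles in `qx`
# and sample quantiles in `qy`.
println("qx = ", qqpair.qx)
println("qy = ", qqpair.qy)

# If the largest observation does not appear in `qy`, the point is lost
# when the pairs are built, not when they are plotted.
println("maximum(y) present in qy: ", maximum(y) in qqpair.qy)
```

If the last line reports `false`, that would support the idea that the behavior originates in Distributions.jl rather than in the plotting recipe.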
Version information
julia version 1.6.2
StatsPlots v0.14.26
Plots v1.19.3
Hey everyone. I was messing around with qqnorm and noticed that it was not plotting the highest point in the dataset I passed to it. The dataset I'm using is located here.
I loaded in the data.
data = CSV.File(filePath) |> DataFrame
Filtered the data into two datasets by gender.
HHSgirls = subset(data, :Gender => ByRow(==("Female")), skipmissing=true)
HHSboys = subset(data, :Gender => ByRow(==("Male")), skipmissing=true)
Ensured that the Reaction_time column contained no missing values.
HHSgirlsClean = dropmissing(HHSgirls, :Reaction_time);
HHSboysClean = dropmissing(HHSboys, :Reaction_time);
Then I put the data into qqnorm.
qqnorm(HHSgirlsClean[!,:Reaction_time], yaxis="Female Reaction Time", qqline = :fit)
I get the following plot.
![file](https://user-images.githubusercontent.com/22198719/127552083-6c64ce17-127b-4d37-8f2e-65c1243b4504.png)
The issue is that the plot is missing the highest value of the column I passed to qqnorm(), which is 46.
maximum(HHSgirlsClean[!,:Reaction_time])
46.0
If I extend the y-axis (and x-axis to be safe) limits to include where the point should be, the point is still missing.
qqnorm(HHSgirlsClean[!,:Reaction_time], yaxis="Female Reaction Time", ylims=(-5,50), xlims=(0,15), qqline = :fit)
Plot from the code directly above.
![file2](https://user-images.githubusercontent.com/22198719/127552757-907fbeca-e6f1-4b99-8d89-2d75ebc8d307.png)
The same thing happens with the boy dataset.
qqnorm(HHSboysClean[!,:Reaction_time], label="Male Reaction Time",qqline = :fit)
![file3](https://user-images.githubusercontent.com/22198719/127553308-9bdaf0e1-68ec-44bf-bd42-31589c680adc.png)
maximum(HHSboysClean[!,:Reaction_time])
1000.0
Graph with the extended axes.
![file4](https://user-images.githubusercontent.com/22198719/127553557-859bec05-077f-48e6-84ca-b309919b5a19.png)
A note about the plots above: ignore the titles on the graphs. I forgot to remove them.
Interestingly, however, if I try the same process with the qqplot function, it is the second-highest point that is missing from the plot.
sort(HHSgirlsClean[!,:Reaction_time])
Output in Julia REPL
0.0489 0.139 0.142 0.148 0.23 0.25 0.261 0.27 ⋮ 3.0 3.0 4.2 5.0 7.129 10.0 30.0 46.0
Just to clarify, 30 is the second-highest point.
Setting up the qqplot function with a sample drawn from a normal distribution.
normDist = rand(Normal(), 100)
Plotting
qqplot(normDist, HHSgirlsClean[!,:Reaction_time], qqline = :fit)
Result
![file](https://user-images.githubusercontent.com/22198719/127555471-ae57d44a-2de1-4ea8-ade8-e65cd3665ca3.png)
Note that 30 is not included in the graph.
Same with the boy dataset...
sort(HHSboysClean[!,:Reaction_time])
Output in Julia REPL
0.0417 0.06 0.084 0.1 0.1999 0.202 0.212 0.223 ⋮ 1.2 3.0 5.0 6.0 6.0 6.7 404.0 1000.0
To clarify, 404 is the second-highest point.
Plotting
qqplot(normDist, HHSboysClean[!,:Reaction_time], qqline = :fit)
Result
![file2](https://user-images.githubusercontent.com/22198719/127556051-44f21046-3b3a-4d18-8f8c-8f4897619a25.png)
Notice that point 404 is missing from the graph.
I'm a bit worried that I may be doing something wrong. So, please let me know if this is a user-error on my end. Also, I am using Visual Studio Code. Not sure if that would cause issues.
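One way to rule out user error or an editor issue might be to build the Q-Q points by hand and check that every observation survives. The sketch below is only an illustration: `manual_qq_points` is a hypothetical helper (not part of StatsPlots or Distributions), the plotting positions `(i - 0.5) / n` are an assumed convention that may differ from what qqnorm uses, and `demo_values` stands in for the real Reaction_time column.

```julia
using Distributions

# Hypothetical helper: pairs each sorted observation with a standard-normal
# quantile, so every observation ends up in the result.
function manual_qq_points(y::AbstractVector{<:Real})
    ys = sort(float.(y))
    n = length(ys)
    # Assumed plotting positions (i - 0.5) / n; other conventions exist.
    probs = [(i - 0.5) / n for i in 1:n]
    xs = [quantile(Normal(), p) for p in probs]
    return xs, ys
end

# Illustrative values only, not the real dataset.
demo_values = [0.25, 3.0, 10.0, 30.0, 46.0]
xs, ys = manual_qq_points(demo_values)

# The largest observation should still be present among the y values.
println(maximum(ys) == maximum(demo_values))
```

Passing `xs` and `ys` to a scatter plot should then show every point. If the maximum appears there but not in the qqnorm output, the data and the editor are probably not the cause.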