Open ProfessorAmanda opened 2 years ago
I've made a prototype of shading under the curve, and it's up on the NormalDistribution branch.
It turned out to be fairly doable and we can surely build more stuff around it!
It's so great!! I already love how you can see the effect that changing the mean and standard deviation has on the area under the curve.
So now we can get serious about this. Fantastic!
Next steps:
Got it! Thanks! I'll work on them and let you know when I make progress.
I've made some progress on the module. The changes can be checked out on the NormalDistribution branch.
I think it looks great! The math seems right, as much as I was able to see from playing around with it. And I actually like how you have written the probability line for now. If you have an idea for a better way to represent it, let me know, but I think it is fine as is for now. Some notes on what I see so far:
I am so excited to be able to use this for teaching!
As a next step, I'd like to be able to have the user click a button and "draw" a set of observations from the distribution they have plotted above and show the user a table of those values and a dot plot of those values (similarly to how we plot them in, say, the law of large numbers or central limit theorem modules). The object of this is to further reinforce what is meant by a "distribution." You'll need to look into how javascript can generate a list of random numbers (and crib a little from what previous research assistants have done). This is a nice next step for you to look into, because we will also need this for the test of normality.
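One possible way to "draw" observations in plain JavaScript is the Box-Muller transform, sketched below. This is only an illustration, not the code on the branch: the function names `randomNormal` and `drawSample` are hypothetical, and the project may rely on a library function instead.

```javascript
// Draw one value from a normal distribution using the Box-Muller transform.
// `rng` is any function returning a uniform number in [0, 1); it defaults to Math.random.
// (Hypothetical helper, not taken from the econsimulations code base.)
function randomNormal(mean, stdDev, rng = Math.random) {
  // 1 - rng() lies in (0, 1], which keeps Math.log away from log(0).
  const u1 = 1 - rng();
  const u2 = rng();
  const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  return mean + stdDev * z;
}

// Draw `sampleSize` independent observations with the given mean and standard deviation.
function drawSample(sampleSize, mean, stdDev, rng = Math.random) {
  return Array.from({ length: sampleSize }, () => randomNormal(mean, stdDev, rng));
}

const sample = drawSample(1000, 0, 1);
const sampleMean = sample.reduce((sum, x) => sum + x, 0) / sample.length;
console.log(sample.length, sampleMean.toFixed(2));
```

The resulting array can feed both the table and the dot plot, since each entry is one observation.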
Thanks and let me know if you have questions!
Got it! I've fixed the issues listed above, and I'm working on the next step of drawing observations.
Fantastic! Keep me posted.
Amanda G. Gregg Associate Professor of Economics, Middlebury College Join My Personal Zoom Roomhttps://middlebury.zoom.us/my/agregg?pwd=OWlGMmZMSWJaUkowRG5DUWJtRm9CQT09 (Password: EconHist) Office: Farrell House 101 Office Phone: (802) 443 - 3419<tel:+18024433419> Pronouns: she/her/hers
From: Wayne Wang @.> Sent: Wednesday, July 6, 2022 11:48:20 AM To: ProfessorAmanda/econsimulations @.> Cc: Gregg, Amanda G. @.>; Assign @.> Subject: Re: [ProfessorAmanda/econsimulations] New Module Idea: Normal Distribution Simulation (Issue #295)
The drawing samples part is up on the branch! Please let me know what you think. (Sorry! That took a bit longer than expected due to a hard-to-debug Highcharts usage issue.)
Hey! Sorry I missed this update. That definitely is working very well. I have a few comments just about formatting, to make it clear to the user that the drawing samples step is distinct.
Let me know what you think! I'll get started on the next part.
I've made the changes :) Putting "Experiment with Drawing Samples from This Distribution" in a button made the button a bit too wide, so I've put it in as a line of text. Is that fine?
I love the way everything in the blue box looks! Those were great aesthetic choices.
Can I make a suggestion about where things are located on the page? Can we center the blue box and then center the data that is drawn below that blue box?
Of course! It's up on the branch and you can take a look.
Thanks! Looks great.
Oh, one question (I didn't notice this before): What is the rule that determines when an observation will be highlighted in the sample table? I thought it might be whatever was described in the probability rule, but it seems to be "two-tailed." Like, if the rule is P(x>1), I noticed that observations less than -1 were also highlighted in blue. I hadn't even thought of including the highlighted observations, so you could just remove the highlights, or make the highlights agree with the probability rule specified by the user.
Sure. I think the currently highlighted rows are the points that are drawn.
I realized that I was probably not drawing the samples the way we wanted: on every "draw a sample" click, a brand new 100-point population is generated, and the desired number of points is selected from it. I think what we want instead is that when the mean and standard deviation change, we get a new population of points, and on every "draw a sample" click, we select samples from that same population. Is that correct? If so, should we keep highlighting the rows of the points that are drawn?
Hi Wayne, yes exactly, you want to drop a new set of points from a normal distribution with the selected mean and standard deviation. You shouldn't be highlighting points, because they all should be plotted. (There is no distinction between sample and population in this case).
Hi Wayne, also, here is the rest of the mockup for the Normal Distribution simulation, including the walkthrough of a goodness-of-fit test. It has many moving parts, so we can get started and then figure out where to go from there. Simulation_NormalDistribution.docx
Got it! Thanks a lot! Just to double-check: what we want is to generate a set of points (the user defines the size) that has the defined mean and standard deviation, not to draw a set from a larger population.
In this case, should all the generated points fall in the range defined by the above P(x>x1)? Or do they fall over the entire spectrum?
Yes exactly -- draw a set of points with that mean and standard deviation. (It's really still a sample from a theoretical "potential" population, if that makes sense). The points should fall from all over the spectrum, not just the range defined by the probability statement.
It might be nice to highlight the points that do fall in that range defined by the user, since the highlighted points should be the fraction of the points shown by the probability.
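A minimal sketch of highlighting that agrees with the probability rule: each observation is flagged only if it satisfies the rule, and the flagged share can then be compared with the stated probability. The name `highlightByRule` and the sample values are made up for illustration.

```javascript
// Flag each observation that satisfies the user's probability rule and
// report the share of flagged observations.
// (Hypothetical helper; the rule is passed in as a predicate function.)
function highlightByRule(points, rule) {
  const rows = points.map((value) => ({ value, highlighted: rule(value) }));
  const count = rows.filter((row) => row.highlighted).length;
  return { rows, fraction: points.length === 0 ? 0 : count / points.length };
}

// Made-up observations; the rule below is the one-tailed P(x > x1) with x1 = 1.
const points = [-1.8, -0.4, 0.1, 0.9, 1.3, 2.2];
const x1 = 1;
const result = highlightByRule(points, (x) => x > x1);
console.log(result.fraction);
```

With a one-tailed rule such as x > 1, a point like -1.8 stays unhighlighted, which avoids the "two-tailed" behavior noted earlier in the thread. For large samples the highlighted fraction should land close to the probability shown above the plot.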
Thanks! That makes sense. I've made the changes and pushed onto the branch. I'll get started on the next part.
Hi Amanda, I've pushed a basic prototype including everything up to dividing the histogram into bins. I'll keep working on the rest of it. Please let me know any changes you would like me to make! Thanks!
This looks like a good start!
I have one note on this: before dividing the data into bins, I'd like to show the points as dots, as you did above, so that the user can see the "raw" sample points.
I'm still working on the rest of the module. I've pushed some updates on displaying the raw points and letting the user decide the number of bins. I'm still working on the rest of it where we display the table, as the way Highcharts divides its data into bins is a bit tricky to work with.
Very cool! Maybe hold off on displaying the blue bins until the user inputs the number of bins they would like, but otherwise, proceed! I bet the visuals are a little tricky.
It took me some time to fix the little bugs, but I think I got it working! The table is now synced with the histogram, and also shows the expected frequency.
Amazing!! I think this is actually going to work. Here are some comments/questions:
Thanks so much for your amazing work, Wayne!
Of course! Thanks for the comments. I will proceed to work on the next steps.
I have a quick question regarding the histogram's x-axis. Right now, the data values are plotted as y values, and their x values are merely indices. If the histogram were to share the same x-axis as the data values, we would need to swap the current x- and y-axes for the data values. For normal distributions, this would result in the data values being spread out vertically around their mean on the x-axis. Are we okay with this behavior?
Please let me know if my description makes any sense.
Hi Wayne, yes, that's right. The data values should be on the x axis, and the y-axis should represent frequencies of those data values. For a histogram with vertical bars, the x-axis values show you which values of the data points belong in which bins, and the y-axis shows you how many points fit that criterion (if that makes sense).
Got it! Thanks, Amanda! I've made the change and pushed. I'm starting the Chi-squared module.
Thanks, Wayne! The axes look right now. I think I now see an error in the generation of the uniform dots. When I plotted a sample with a mean of 0 and standard deviation of 1, I got a range of points from -10 to 10. The formula for the standard deviation of a uniform distribution is sqrt((B-A)^2/12), where A and B are the endpoints. You might want to check which parameters the given javascript function requires.
Currently we are using distribution generation functions from this library: https://statisticsblog.com/probability-distributions/#uniform. The uniform distribution generation function takes in three parameters: sampleSize, lowerBound, and upperBound. I'm not exactly sure how we can feed standard deviation to this function. Do you think we should explore functions from other libraries instead?
I think it should be possible to algebraically back out the lower bound and upper bound from the mean and standard deviation. Let me do some algebra for a few minutes (I'm procrastinating on something scary, lol).
Hi Wayne,
Here are expressions for the lower bound and upper bound using the mean and standard deviation:
Lower bound = mean - stddev * sqrt(3)
Upper bound = mean + stddev * sqrt(3)
Give those a shot, and let's see if the picture looks more reasonable.
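For reference, the algebra behind these expressions: a uniform distribution on [A, B] has mean (A + B) / 2 and standard deviation (B - A) / sqrt(12), so B - A = stddev * sqrt(12) = 2 * stddev * sqrt(3), and each bound sits stddev * sqrt(3) away from the mean. A small sketch (the function names are illustrative, not from the library) that converts both ways as a check:

```javascript
// Convert a desired mean and standard deviation into the lower/upper bounds
// of a uniform distribution. (Illustrative helper names.)
function uniformBounds(mean, stdDev) {
  const halfWidth = stdDev * Math.sqrt(3);
  return { lowerBound: mean - halfWidth, upperBound: mean + halfWidth };
}

// Standard deviation of a uniform distribution on [lowerBound, upperBound],
// using the formula quoted in the thread: sqrt((B - A)^2 / 12).
function uniformStdDev(lowerBound, upperBound) {
  return Math.sqrt((upperBound - lowerBound) ** 2 / 12);
}

const bounds = uniformBounds(0, 1);
console.log(bounds);
console.log(uniformStdDev(bounds.lowerBound, bounds.upperBound));
```

For a mean of 0 and a standard deviation of 1, the bounds come out at roughly -1.73 and 1.73 rather than -10 and 10.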
Ah, got it! Thanks a lot. This makes a lot of sense. The relationship between the standard deviation and the two bounds didn't click in my head for some reason. I'll give this a shot.
Hi Amanda,
It's up on the branch. Looks like the bounds are now changing correctly according to the stddev and mean. Please let me know if it looks good to you!
Awesome! Looks much more reasonable.
If you tell me what javascript function you have to figure out normal probabilities and what its inputs are, I can help you figure out the probabilities for each bin.
Sure!
(nd.cdf(bin.upperBound) - nd.cdf(bin.lowerBound)) * sampleSize, in which nd.cdf is the library function I'm using to calculate the cumulative distribution function for the normal distribution.
sampleSize / numberOfBins for each bin.
I see! Thanks! I've pushed the changes.
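The two expected-frequency expressions above can be sketched as follows. This is not the branch code: `nd.cdf` is replaced by a hand-written `normalCdf` built on a standard polynomial approximation of the error function, the open-ended tail bins are an assumption made so the expected counts add up to the sample size, and the uniform case assumes equal-width bins spanning the whole range.

```javascript
// Error function via the Abramowitz-Stegun 7.1.26 polynomial approximation
// (absolute error around 1.5e-7). Stand-in for a library CDF such as nd.cdf.
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Cumulative distribution function of a normal distribution.
function normalCdf(x, mean, stdDev) {
  return 0.5 * (1 + erf((x - mean) / (stdDev * Math.SQRT2)));
}

// Expected count per bin under a normal null: (CDF(upper) - CDF(lower)) * sampleSize.
function expectedNormalFrequencies(bins, mean, stdDev, sampleSize) {
  return bins.map(
    (bin) =>
      (normalCdf(bin.upperBound, mean, stdDev) - normalCdf(bin.lowerBound, mean, stdDev)) *
      sampleSize
  );
}

// Expected count per bin under a uniform null with equal-width bins.
function expectedUniformFrequencies(numberOfBins, sampleSize) {
  return Array(numberOfBins).fill(sampleSize / numberOfBins);
}

// Assumed bin layout: the outer bins are open-ended so every value falls somewhere.
const bins = [
  { lowerBound: -Infinity, upperBound: -1 },
  { lowerBound: -1, upperBound: 0 },
  { lowerBound: 0, upperBound: 1 },
  { lowerBound: 1, upperBound: Infinity },
];
const expected = expectedNormalFrequencies(bins, 0, 1, 100);
console.log(expected.map((value) => value.toFixed(2)));
console.log(expectedUniformFrequencies(4, 100));
```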
Lookin' good! Proceed!
Hi Amanda,
I spent some time digesting the concepts of hypothesis tests, the chi-squared distribution, and p-values. I found this "Chi-square goodness-of-fit test" function from this library: https://stdlib.io/docs/api/latest/@stdlib/stats/chi2gof. I'm not entirely sure that I'm understanding it correctly, but what I have right now is that I'm feeding the goodness-of-fit test the observed frequencies for each bin, the expected frequencies for each bin, and a user-defined alpha. The function performs the calculation and returns the result, including the pValue and the test statistic. If the p-value is smaller than alpha, we reject the null hypothesis; otherwise, we fail to reject it.
The changes are on the branch. Please let me know if any of these is incorrect. Thanks!
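To make the test less of a black box, the statistic itself can be computed by hand as the sum over bins of (observed - expected)^2 / expected. The sketch below does only that and compares the result against a tabulated critical value; it does not call the stdlib function, the frequencies are made up, and the degrees of freedom assume that no parameters were estimated from the data.

```javascript
// Chi-square goodness-of-fit statistic: sum over bins of (O - E)^2 / E.
// (Hand-rolled illustration, not the stdlib chi2gof implementation.)
function chiSquareStatistic(observed, expected) {
  if (observed.length !== expected.length) {
    throw new Error('observed and expected must have the same number of bins');
  }
  return observed.reduce(
    (sum, count, i) => sum + (count - expected[i]) ** 2 / expected[i],
    0
  );
}

// Made-up frequencies for four bins and a sample of 100.
const observedCounts = [18, 30, 36, 16];
const expectedCounts = [15.87, 34.13, 34.13, 15.87];

const statistic = chiSquareStatistic(observedCounts, expectedCounts);
// Assumption: mean and standard deviation are specified by the user rather than
// estimated from the sample, so degrees of freedom = number of bins - 1.
const degreesOfFreedom = observedCounts.length - 1;
// Tabulated chi-square critical value for 3 degrees of freedom at alpha = 0.05.
const criticalValue = 7.815;
const rejectNull = statistic > criticalValue;
console.log(statistic.toFixed(3), degreesOfFreedom, rejectNull);
```

Comparing the statistic with the critical value leads to the same decision as comparing the p-value with alpha.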
You've done it!! I really think this is working. I'm going to check the numbers another time, but for now, just a few small things.
Thanks again, Wayne! I think this will be ready for deployment and beta testing soon. Amazing!
If we really run out of stuff to do, I have an idea for a crazy simulation demonstrating why this hypothesis test works....but let's table that for now :)
Awesome! Thanks for pointing out the typos. I need to turn on spellcheck in my code editor...
Before we come up with a plan for the next module, I can pick up working on some of the long-term enhancement issues.
Sounds great!
Maybe we should try to deploy this and ask for beta testers? I can tweet it out to friends.
Sounds good! I will create the PR.
Hi Wayne,
In the latest version of the Master branch, I'm getting this error when I try to run the Normal Distribution simulation. I think this is the only one that gives me this error. I ran "npm install" just in case, and that did not fix it. Can you take a look?
Did you try npm install --legacy-peer-deps?
Ah shoot no in my sleepy hurried state I did not! Let me try again in a bit.
From: Wayne Wang @.> Sent: Wednesday, August 3, 2022 11:10 AM To: ProfessorAmanda/econsimulations @.> Cc: Gregg, Amanda G. @.>; Assign @.> Subject: Re: [ProfessorAmanda/econsimulations] New Module Idea: Normal Distribution Simulation (Issue #295)
Hi Wayne, can you take a look at the expected frequencies for the case where the underlying data happens to come from a chi-square distribution? I think they might still be calculated incorrectly. Thanks!
Sure. Working on it!
Hi all,
This is just the beginning kernel of an idea, but if Wayne has time, perhaps he can start to mock this up while I scratch my head about all of the features I want to include.
The basic idea is to provide a playground for students to experiment with how the normal distribution describes datasets and determines probabilities. Students especially often struggle with the connection between the value on the axis and the area under the normal curve. For now, let's focus on the normal distribution, but really this could generalize to any continuous probability distribution.
Wayne, you will want to familiarize yourself with the normal distribution and with javascript's functions for drawing normal density functions, for finding the area under the curve given a value on the axis, and for finding a value on the axis given a probability (this would be an inverse normal probability distribution). I believe we use these in various modules already.
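As a rough illustration of the three building blocks mentioned here (density, area under the curve, and the inverse lookup), the sketch below writes them out in plain JavaScript. The existing modules may well use library functions instead; the names `normalPdf`, `normalCdf`, and `normalInverseCdf` are hypothetical, the CDF relies on a standard polynomial approximation of the error function, and the inverse is found by simple bisection rather than a closed-form approximation.

```javascript
// Height of the normal density curve at x.
function normalPdf(x, mean, stdDev) {
  const z = (x - mean) / stdDev;
  return Math.exp(-0.5 * z * z) / (stdDev * Math.sqrt(2 * Math.PI));
}

// Error function via the Abramowitz-Stegun 7.1.26 polynomial approximation.
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Area under the curve to the left of x, i.e. P(X < x).
function normalCdf(x, mean, stdDev) {
  return 0.5 * (1 + erf((x - mean) / (stdDev * Math.SQRT2)));
}

// Value on the axis with area p to its left, found by bisection on the CDF.
function normalInverseCdf(p, mean, stdDev) {
  if (!(p > 0 && p < 1)) {
    throw new RangeError('p must be strictly between 0 and 1');
  }
  // Search within 10 standard deviations of the mean, which covers any practical p.
  let low = mean - 10 * stdDev;
  let high = mean + 10 * stdDev;
  for (let i = 0; i < 100; i++) {
    const mid = (low + high) / 2;
    if (normalCdf(mid, mean, stdDev) < p) {
      low = mid;
    } else {
      high = mid;
    }
  }
  return (low + high) / 2;
}

console.log(normalPdf(0, 0, 1).toFixed(4));
console.log(normalCdf(1.96, 0, 1).toFixed(3));
console.log(normalInverseCdf(0.975, 0, 1).toFixed(2));
```

The CDF and its inverse are the two directions students tend to mix up: one goes from a value on the axis to an area, the other from an area back to a value on the axis.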
To learn about how the normal distribution works conceptually, check out Chapter 4 of OpenIntroStats: openintro-statistics.pdf
My sketch in progress: Simulation_NormalDistribution.docx