ProfessorAmanda / econsimulations

This repository provides the code underlying simulations for teaching statistics and econometrics. The project site, which includes teaching materials as well as the link to the simulations, is located here: https://amandagreggeconomics.com/statistics-simulations-project/

New Module: Measurement Error #282

Open ProfessorAmanda opened 2 years ago

ProfessorAmanda commented 2 years ago

Get started on a new module on measurement error! Here are some initial steps:

waynew99 commented 2 years ago

MeasurementErrorSketch.pdf

Progress update – I tried to come up with a very basic layout based on my understanding of the concept. Please let me know if you think it's the right direction!

Imagined user experience:

  1. The user sets the number of points with the slider and clicks the "new points" button. The generated points are shown with their regression line.
  2. The user sets the error range in the X and Y directions with the next two sliders, then clicks the "introduce error" button. The points with error are shown with a new regression line.
  3. (Should the user also see some quantified information on screen, such as the slope of each regression line?)
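
The three steps above can be sketched in a few lines. This is a minimal Python illustration, not the module's code: the function names (`new_points`, `introduce_error`, `ols`), the default slope and intercept, and the uniform error model are all assumptions made for the example.

```python
import random


def ols(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def new_points(n, rng, slope=1.0, intercept=2.0, noise=3.0):
    """Step 1: n points scattered around a known line (hypothetical defaults)."""
    xs = [rng.uniform(0, 20) for _ in range(n)]
    ys = [intercept + slope * x + rng.uniform(-noise, noise) for x in xs]
    return xs, ys


def introduce_error(xs, ys, x_range, y_range, rng):
    """Step 2: shift each point by a uniform error within the chosen ranges."""
    mis_x = [x + rng.uniform(-x_range, x_range) for x in xs]
    mis_y = [y + rng.uniform(-y_range, y_range) for y in ys]
    return mis_x, mis_y


rng = random.Random(282)  # fixed seed so the run is repeatable
xs, ys = new_points(200, rng)
mis_x, mis_y = introduce_error(xs, ys, x_range=10.0, y_range=0.0, rng=rng)

# Step 3: the quantified information could simply be the two slopes.
print("original slope:", round(ols(xs, ys)[1], 3))
print("slope with error in X:", round(ols(mis_x, mis_y)[1], 3))
```

With error only in the X direction, the second slope should come out clearly smaller in magnitude than the first.
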
waynew99 commented 2 years ago

Hi Professors,

More progress update – I built a very basic prototype in the branch measurementError. It would be very helpful if I could look at the Stata code before moving forward, so that I can verify I'm heading in the right direction.

Please let me know if there's any issue accessing the branch, or if you would like a Zoom call so I can demo it and we can talk it through. Thanks!!

Best, Wayne

tbyker commented 2 years ago

clear all
set seed 8675308
set obs 10

gen point = _n
gen x = _n
* gen error = runiform(-1,1)*100
gen error = runiform()*100

gen y = 2 - 20*x + error

* Mismeasured x: true x plus an error of up to 30 scaled by a random factor in (-1,1)
gen meas_error = runiform()*30
gen sign = runiform(-1,1)
gen mis_x = x + meas_error*sign

* Scatter plots of the original and the mismeasured data
scatter y x, mlabel(point) mcolor(green) name(orig, replace) xscale(range(-10 30)) xlabel(-10(20)30) legend(off)

scatter y mis_x, mlabel(point) mcolor(orange) name(mis, replace) xscale(range(-10 30)) xlabel(-10(20)30) legend(off)

* The same plots with fitted regression lines
twoway (scatter y x, mlabel(point) mcolor(green)) (lfit y x), name(orig_line, replace) xscale(range(-10 30)) legend(off)

twoway (scatter y mis_x, mlabel(point) mcolor(orange)) (lfit y mis_x), name(mis_line, replace) xscale(range(-10 30)) legend(off)

* Both datasets overlaid, without and with fitted lines
twoway (scatter y x, mlabel(point) mcolor(green)) (scatter y mis_x, mlabel(point) mcolor(orange)), name(both, replace) xscale(range(-10 30)) legend(off)

twoway (scatter y x, mlabel(point) mcolor(green)) (lfit y x) (scatter y mis_x, mlabel(point) mcolor(orange)) (lfit y mis_x), name(both_line, replace) xscale(range(-10 30)) legend(off)

* Compare the estimated slopes
reg y x

reg y mis_x

ProfessorAmanda commented 2 years ago

Looks like we are getting close to a deployable version already! This still needs:

waynew99 commented 2 years ago

Thanks for the comments! I've added the display of regression line equations which show / hide in sync with the lines in the chart. However, I'm a bit unsure about the term / notation that should be used here. Is f(x) okay in this context?

Screenshot from 2022-06-14 15-01-57

waynew99 commented 2 years ago

Hi Professor Byker,

Based on our discussion yesterday, I was able to produce a prototype for the second part of Measurement Error. It can be accessed from branch measurementError.

However, when I set the parameters to 1000 data points with error in the X-direction, randomly select 100 points per sample (iteration), and run about 100 iterations, the regression lines from the individual iterations (drawn in grey) do not seem to have slopes that lean towards 0, which is not what we expected. Am I making some obvious mistake here? We can quickly run through it after tomorrow's meeting if that works for you. Thanks a lot!

Best, Wayne

Screenshot from 2022-06-16 17-05-24

tbyker commented 2 years ago

Hi Wayne, As an initial test, can you start with an "original data regression" that has a strongly positive (or negative) slope? For example, generate a dataset with a slope of +1 or even +2, then produce the error data from that, and sample. Does that make sense?
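
The classical errors-in-variables result gives one way to see why a steep original slope helps. As a sketch under assumptions not stated in the thread (the observed regressor is $\tilde{x} = x + u$, where $u$ has mean zero and is uncorrelated with both $x$ and the error $\varepsilon$ in $y = \alpha + \beta x + \varepsilon$), the slope from regressing $y$ on $\tilde{x}$ converges to:

$$
\operatorname{plim}\,\hat{\beta}_{\tilde{x}}
= \frac{\operatorname{Cov}(y, \tilde{x})}{\operatorname{Var}(\tilde{x})}
= \beta \, \frac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2}
$$

The factor $\sigma_x^2 / (\sigma_x^2 + \sigma_u^2)$ lies between 0 and 1, so the shift towards 0 is proportional to $\beta$ itself. When $\beta$ is already close to 0, the shift is too small to see, whereas with $\beta = 1$ or $2$ it is clearly visible.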

Tanya


Tanya Byker

Associate Professor, Economics

Middlebury College




waynew99 commented 2 years ago

Got it! Thanks! I've been trying to find a way to generate points that have some randomness but still follow a given regression line. It's taking a bit more time than I expected, but I'll keep working on it.
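
One possible way to get random-looking points whose fitted line is exactly a chosen one is to draw noise, regress that noise on x, and keep only the residuals. By construction the residuals have zero mean and zero sample covariance with x, so adding them to the line leaves the least-squares fit unchanged. The Python sketch below illustrates this; the function name `points_with_exact_line` and all parameter values are hypothetical, and this is not necessarily the approach the module ended up using.

```python
import random


def ols(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def points_with_exact_line(n, slope, intercept, noise_sd, rng):
    """Scatter n points so that their fitted line is intercept + slope * x."""
    xs = [rng.uniform(0, 20) for _ in range(n)]
    noise = [rng.gauss(0, noise_sd) for _ in range(n)]
    # Remove the part of the noise that is linearly related to x.
    a, b = ols(xs, noise)
    resid = [e - (a + b * x) for x, e in zip(xs, noise)]
    ys = [intercept + slope * x + r for x, r in zip(xs, resid)]
    return xs, ys


rng = random.Random(0)
xs, ys = points_with_exact_line(50, slope=1.0, intercept=2.0, noise_sd=3.0, rng=rng)
intercept, slope = ols(xs, ys)
print(intercept, slope)  # equal to 2 and 1 up to floating-point rounding
```
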

tbyker commented 2 years ago

Could you just start with a test on a "fixed" set of data?



waynew99 commented 2 years ago

I think I got it to work! I first place the points on a line with a slope of +1, and then move each of them randomly by a small amount. Now, with error introduced in the X-direction, the slope indeed decreases! And with error in the Y-direction, the slope can move in either direction. I will apply the same approach to the first part of the module as well, to ensure we start from a large enough slope.
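
The behaviour described here can be reproduced with a small repeated-sampling experiment. The Python sketch below is an illustration under assumed settings, not the module's code: 1000 points near a line with slope +1, a uniform error of up to 10 units added in either X or Y, and 100 samples of 100 points each. The function name `simulate` and every default value are hypothetical.

```python
import random


def ols_slope(xs, ys):
    """Slope of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def simulate(error_in, n_points=1000, sample_size=100, iterations=100,
             error_range=10.0, seed=282):
    """Return the fitted slope from each random sample of the noisy data."""
    rng = random.Random(seed)
    # Points on a line with slope +1, each moved vertically by a small amount.
    xs = [rng.uniform(0, 20) for _ in range(n_points)]
    ys = [2.0 + 1.0 * x + rng.uniform(-1, 1) for x in xs]
    if error_in == "x":
        xs = [x + rng.uniform(-error_range, error_range) for x in xs]
    elif error_in == "y":
        ys = [y + rng.uniform(-error_range, error_range) for y in ys]
    else:
        raise ValueError("error_in must be 'x' or 'y'")
    slopes = []
    for _ in range(iterations):
        idx = rng.sample(range(n_points), sample_size)  # without replacement
        slopes.append(ols_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    return slopes


x_slopes = simulate("x")
y_slopes = simulate("y")
print("error in X: slopes from", round(min(x_slopes), 2), "to", round(max(x_slopes), 2))
print("error in Y: slopes from", round(min(y_slopes), 2), "to", round(max(y_slopes), 2))
```

Under these settings the slopes with error in X should all sit well below +1, while the slopes with error in Y should scatter on both sides of +1.
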

Screenshot from 2022-06-16 23-35-31 Screenshot from 2022-06-16 23-36-00

Do we want to show any equations related to this plot?

waynew99 commented 2 years ago

Progress update: Both parts now guarantee that the generated points have a strong slope, and I've added input validation to the second part (the number of iterations must be between 1 and 100). Please let me know if there's anything I should fix or tweak. Thanks!
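
A check of the kind described for the iteration count might look like the following Python sketch. The function name `parse_iterations` and its error messages are hypothetical; only the 1 to 100 range comes from the comment above.

```python
def parse_iterations(raw, low=1, high=100):
    """Convert the raw input to an int and require low <= value <= high."""
    try:
        value = int(str(raw).strip())
    except ValueError:
        raise ValueError(f"iterations must be a whole number, got {raw!r}") from None
    if not low <= value <= high:
        raise ValueError(f"iterations must be between {low} and {high}, got {value}")
    return value


print(parse_iterations("50"))
```

Rejecting bad input with a clear message, rather than silently clamping it, keeps the number shown in the input box consistent with the number of samples actually drawn.
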

Screen Shot 2022-06-20 at 15 23 24