Need to get organized on next phase of our project

profgeraci commented 7 years ago

Hi, everyone. We should probably start talking about what we want to do for the next phase of the project. Prof. Ackleman mentioned in tonight's Q&A that we should be working on it. Shall we plan to get together on Hangout sometime later this week or this weekend?

I've started to think about possible models, and it would probably be best if we could all do some modeling on our own, then bring our work to the group and brainstorm ideas.

@JestonBlu @nitroys @rmglazner

rmglazner commented 7 years ago

Hello! I think having a meeting is a great idea, but I will be out of town for field work Thursday-Monday, so the earliest that I can meet is next Tuesday. I might have very limited internet access where I am going, so I am happy to upload model ideas next week as well (rather than this weekend). Sorry for the delays because of scheduling!

nitroys commented 7 years ago

I also will be traveling Thursday-Sunday, but am also happy to run some models and discuss them next week!

Thanks for getting the jump on this, Anne. It's tempting not to think about this when the due date is still far away :)

JestonBlu commented 7 years ago

Next Tuesday is good with me if that works for everyone. I've been playing with the data a little bit, but I think doing something like logistic regression is going to be a little difficult. If we did it on the Recommend Category and recoded "Not Likely to Recommend" as a 0 and "Likely or Very Likely" as a 1, we would only get about 15 0s to 184 1s. That's pretty lopsided. Depending on how everyone wants to approach analyzing the data, it may be worth looking at another dataset that is more appropriate for logistic regression.

What does everyone think?
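To make the imbalance concrete, here is a minimal Python sketch of the recoding described above. It is only an illustration: the rows are rebuilt from the approximate counts quoted in this comment (15 and 184) rather than read from the survey file, and the four-predictor figure and the "roughly 10 events per predictor" rule of thumb are assumptions brought in for the check, not something taken from the data.

```python
from collections import Counter

# Rows reconstructed from the approximate totals quoted above;
# this is NOT the real survey file.
responses = ["Not Likely to Recommend"] * 15 + ["Likely or Very Likely"] * 184


def recode(answer):
    """Map the Recommend category to 0/1 as described in the comment."""
    return 0 if answer == "Not Likely to Recommend" else 1


y = [recode(r) for r in responses]
counts = Counter(y)

# Share of the rarer outcome (the 0s) among all responses.
minority_share = counts[0] / len(y)

# Assumed check: events in the rarer class per predictor, using four
# predictors. A common rule of thumb wants roughly 10 or more.
n_predictors = 4
events_per_predictor = min(counts.values()) / n_predictors

print(counts[0], counts[1])
print(round(minority_share, 3))
print(events_per_predictor)
```

With so few 0s per predictor, coefficient estimates would likely be unstable, which is consistent with the concern raised here.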

rmglazner commented 7 years ago

If we changed data, one idea that could be applicable to logistic regression would be admissions statistics: http://stats.idre.ucla.edu/r/dae/logit-regression/

I have added the data file to the code on GitHub and titled it "Logistic Regression Example Data." The data came directly from the website above.

The data is hypothetical, and Dr. Akleman also wants groups to have at least four predictors, so I don't know if it is a good fit. If someone has similar data from a better source with more predictors, we could work with that too.

rmglazner commented 7 years ago

Here is a very large list of possible datasets we could use too: https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html

rmglazner commented 7 years ago

Sorry for the multiple posts, but here is a list of usable data that is directly relevant to logistic regression:

https://www.umass.edu/statdata/statdata/stat-logistic.html

nitroys commented 7 years ago

Hmm I see. I think if the data is that unbalanced, it's a good idea to look at a different dataset. I'm open to any of the sets Rachel has posted here, but I do think we should use a real dataset rather than a hypothetical one.

profgeraci commented 7 years ago

I agree, Shannon. I looked at the UMASS and R-manual datasets and they seem kind of like "academic" examples of using logistic regression. Anyone know of some other real-world data that we might use? Perhaps this would be a good question to ask of Prof. A?

rmglazner commented 7 years ago

I agree that real-world data would be better as well. Looking at the UMASS datasets, the file "benign" seems to be from actual data (even though it is also used for academic purposes here): "Trained interviewers administered a standardized structured questionnaire to collect information from each subject [see Pastides et. al. (1983) and Pastides, et al. (1985)]."

The citations are listed at the bottom of the page and are from medical sources. What are everyone's thoughts about this dataset?

JestonBlu commented 7 years ago

I would be okay with that, but before we decide, do you have any data related to your field work that we could use, Rachel?

profgeraci commented 7 years ago

I feel really nervous building our project around this benign data. It looks dated (over 20 years old) and I think we're just opening ourselves up to an accusation that we "stole the model" from some journal article that we don't yet know about. I searched on JSTOR for anything related to this and didn't get any hits.

Once again, I'll suggest that we present this situation to Prof Ackleman and ask for her advice. Can someone compose a couple of sentences that summarize this?

I'm not sure I understand exactly why we don't think the NWROC data will work, so I hesitate to write something myself. Any thoughts?

JestonBlu commented 7 years ago

I see your point. Let me clarify that I think logistic regression will work on the data, but I don't think we are going to see any significant predictors, if that's our main focus. We might try looking at some of the other responses that were more evenly distributed. I'll play around with it tonight and see if I can come up with some ideas. Generally speaking, I agree that a data set that old might open us up to some criticism.


profgeraci commented 7 years ago

OK, I see what you are saying, Joseph. So, we would fail to find anything statistically significant. I agree that's not a very interesting presentation.

I have another idea. One of my colleagues suggested looking at this website:

http://www.thearda.com/Archive/Files/Downloads/GSS2014_DL2.asp or this one: http://gss.norc.org/get-the-data (Same data from both)

.... so I extracted this data file:

GSS2014.xlsx

containing over 4000 responses to a nation-wide Social Survey from the year 2014. The site has many different years (if we were interested in time-series data), but I think this might work.

We could, for example, come up with a hypothesis about a binary variable like, say, HAPMAR (Taking things all together, how would you describe your marriage?) and do a logistic regression on it. In this case, I think the 0's indicate "Not Married" and the 1 and 2 are the responses, but I would have to do some more digging into the data.

Then we could find all sorts of variables that might be good predictors. What do you think?
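As a rough sketch of what that could look like, the Python snippet below drops the rows coded 0, recodes HAPMAR to a 0/1 outcome, and fits a one-predictor logistic regression by plain gradient ascent. Everything specific here is an assumption for illustration: the rows are invented toy values rather than GSS records, the predictor `x` is hypothetical, and the coding (0 = not married, 1 and 2 = responses) is the unverified guess from this comment and would need checking against the GSS codebook.

```python
import math

# Toy rows invented for illustration: (HAPMAR code, hypothetical predictor x).
# Assumed coding, still to be verified: 0 = not married, 1 and 2 = responses.
rows = [(0, 1.0), (1, 3.0), (2, 1.0), (1, 2.5), (2, 0.5),
        (0, 2.0), (1, 2.0), (2, 1.5), (1, 3.5), (2, 2.2), (1, 1.2)]

# Keep only rows with an actual response, then recode: 1 -> 1, 2 -> 0.
data = [(x, 1 if code == 1 else 0) for code, x in rows if code != 0]


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(data, lr=0.1, steps=5000):
    """Fit intercept b0 and slope b1 by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(data)
    for _ in range(steps):
        g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in data) / n
        g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in data) / n
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1


b0, b1 = fit_logistic(data)
print(len(data), "rows kept; slope is positive:", b1 > 0)
```

In practice a statistics package would do the fitting and report standard errors; the hand-rolled loop is only there to keep the sketch self-contained.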

JestonBlu / Neighbor-Works

Need to get organized on next phase of our project #14