css-research-sp19 / fp-bhargavvader

fp-bhargavvader created by GitHub Classroom

Data/methods #10

Open bensoltoff opened 5 years ago

bensoltoff commented 5 years ago
bhargavvader commented 5 years ago

Thank you @bensoltoff, good points!

So a huge part of the project is setting up the data. With this kind of data source (and the cleaning and organising involved), a lot of the work is grunt work - the interesting analysis is only starting to trickle in now. A large part of my paper is going to be description and curation of the data - would this be ok? I've seen a few papers that spend a good amount of time describing a new, or potentially new, way of using old data - I hope to do a bit of that. (For the record, it is ~55 GB of syllabi data and ~29 GB of research paper data - and all of it only text!)

The major analysis will be a similarity assessment, both spatial and temporal. Because I could only do this for Texas in my primary analysis (though as I type this the entire US dataset is being analysed), it is a little bland. Working on this!

It is (as of now) a largely descriptive paper, but I hope to build on this (for my thesis) to add more predictive elements. The biggest hurdle was creating the vectors for each city and each organisation across the USA. We have a lot of hypotheses to test, but that is only possible after setting up the RCC environment, which has the capacity to do all this number crunching. This is 99% done now - hopefully the results will be interesting enough that the paper will write itself!
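
To make the spatial/temporal comparison concrete, here is a minimal sketch of what it could look like once each city has a vector per year. It assumes cosine similarity as the measure, which is only one possible choice and not necessarily the one used in the project. The city names, years, and vector values are made up for illustration, and `cosine_similarity` and `city_vectors` are hypothetical names.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors keyed by (city, year); the real vectors would be
# built from the syllabi and research paper text.
city_vectors = {
    ("Austin", 2015): [1.0, 0.0, 2.0],
    ("Houston", 2015): [2.0, 0.0, 4.0],
    ("Austin", 2016): [0.0, 3.0, 0.0],
}

# Spatial comparison: two different cities in the same year.
spatial = cosine_similarity(
    city_vectors[("Austin", 2015)], city_vectors[("Houston", 2015)]
)

# Temporal comparison: the same city in two different years.
temporal = cosine_similarity(
    city_vectors[("Austin", 2015)], city_vectors[("Austin", 2016)]
)

print(f"spatial: {spatial:.3f}, temporal: {temporal:.3f}")
```

In this toy data the two cities point in the same direction in 2015, so the spatial score is close to 1, while the same city's vector changes direction entirely between years, so the temporal score is 0.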