UNCG-CSE / Library-Computer-Usage-Analysis

The University Libraries at UNCG currently track the state of each computer to determine whether it is in use. This data is compiled into a database, and a web app pulls from this database to show a map and count of available computers. As of Fall 2017, the data had not been used to determine which computers are used more frequently, aside from counting the number of times a computer transitions into or out of the 'in-use' state. This project attempts to correlate the usage of these computers with various factors, including campus scheduling, equipment configuration, placement, population in the library, and area weather. Using this data, the project also applies machine learning to determine the best placement of computers for future allocation and possible reconfiguration of equipment and space.

To import, or not to import #49

Closed (smindinvern closed this issue 6 years ago)

smindinvern commented 6 years ago

I think that we had some marginal consensus that creating import libs is fine, under certain circumstances and with some conditions? Please correct me if I'm mistaken.

I'd suggest the import routines be broken out so that they can be used by multiple notebooks, as we all seem to be working mostly in separate notebooks at this point. Other than that there doesn't seem to be a whole lot of code that would benefit from this kind of modularization.

PatriciaTanzer commented 6 years ago

Can you give an example of what you mean about breaking out import routines?

smindinvern commented 6 years ago

@PatriciaTanzer sure, I have an example here.

This pulls in LibDataImport (see here) and weather (see here). The idea is that UsageVsWeather.ipynb and LibraryData.ipynb can both use the same code to import the library data and compute hourly usage. UsageVsWeather.ipynb and GSOWeather.ipynb can both use the same code to import the weather data, clean it, and parse all of those status codes. The same thing would apply to the gate count data.
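As a rough sketch of the kind of routine that could live in a shared module and be imported by more than one notebook, here is an hourly-usage helper. This is not the actual LibDataImport code: the function name `hourly_usage` and the `(start, end)` session format are assumptions made purely for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_usage(sessions):
    """Return {hour_start: fraction of that hour spent in use}.

    `sessions` is an iterable of (start, end) datetime pairs for one
    computer. This input format is hypothetical, not the project's own.
    """
    seconds_in_use = defaultdict(float)
    for start, end in sessions:
        cursor = start
        # Walk the session hour by hour so that a session spanning
        # several clock hours is split across the right buckets.
        while cursor < end:
            hour_start = cursor.replace(minute=0, second=0, microsecond=0)
            next_hour = hour_start + timedelta(hours=1)
            chunk_end = min(end, next_hour)
            seconds_in_use[hour_start] += (chunk_end - cursor).total_seconds()
            cursor = chunk_end
    # Convert seconds to a fraction of the hour, ordered by time.
    return {hour: s / 3600.0 for hour, s in sorted(seconds_in_use.items())}
```

If this were saved in a module next to the notebooks, each notebook could import the same function instead of carrying its own copy of the logic.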

Does that make sense, or is there anything that I could focus on clarifying?

brownworth commented 6 years ago

I don't know if there's a benefit to repeated transformations from the raw data to a table of percent utilization per hour. It seems once the data (library and weather) has been transformed into the correct format, it would be more efficient to run calculations on the transformed data, rather than transforming it each time. But this will ultimately depend on the storage mechanism.

That said, the import libraries can certainly be used within the parts of the scripts that transform the data.

With regard to utils.py, I must admit, I don't fully understand what it does. I'm also not sure that we'll necessarily be working with Matplotlib as our exclusive plotting library. Personally, I've been wanting to learn Bokeh for a while now, and I'm using this project and Assignment 3 to learn more. It may be that Matplotlib is a more efficient/effective tool for our purposes, in which case we can revisit the necessity.

smindinvern commented 6 years ago

Alright, as mentioned in #23, I've pickled the data for quick import. Hat-tip to @brownworth for pointing out the silliness of re-importing from raw data over and over again ;-)
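For anyone reading along, the transform-once-then-reload pattern looks roughly like the sketch below. It assumes the standard-library `pickle` module; the thread does not say exactly how the data was pickled, and the helper name `load_or_build` and its parameters are hypothetical.

```python
import pickle
from pathlib import Path


def load_or_build(cache_path, build):
    """Load transformed data from a pickle, building it only if missing.

    `build` is any zero-argument callable that runs the slow import from
    raw data. Both names here are illustrative, not from the repository.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Fast path: reuse the previously transformed data.
        with cache_path.open("rb") as fh:
            return pickle.load(fh)
    # Slow path: transform the raw data once, then persist the result.
    data = build()
    with cache_path.open("wb") as fh:
        pickle.dump(data, fh)
    return data
```

One caveat worth keeping in mind: a pickle should only be loaded if it comes from a trusted source, since unpickling can execute arbitrary code.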

I've also removed utils.py because it looks like it probably won't be used--nbd.

At this point can we all agree on leaving the import and pickling code as-is? Since, nominally speaking, it won't be used again, it would just be for the purpose of demonstrating how we imported the raw data. Or does anyone want those .py files removed now that we have a persistent store of the transformed data?

brownworth commented 6 years ago

I think keeping the code somewhere visible is a good idea. It's one of the more significant segments of code we have at this point, and it's definitely good to have it somewhere for grading purposes.

smindinvern commented 6 years ago

Alright. I'm going to close this issue then, since it sounds like nothing is in need of changing right now. If anyone feels this is still an issue, please reopen.