UNCG-CSE / Library-Computer-Usage-Analysis

The University Libraries at UNCG currently track the state of each computer, recording whether or not it is in use. This data is compiled into a database, and a web app pulls from that database to show a map and the number of available computers. As of Fall 2017, the data had not been used to determine which computers are used most frequently, aside from counting the number of times a computer transitions into or out of the 'in-use' state. This project attempts to correlate the usage of these computers with various factors, including campus scheduling, equipment configuration, placement, library population, and area weather. It also applies machine learning to this data to recommend the best placement of computers for future allocation and possible reconfiguration of equipment and space.

Computer Usage per hour #23

Closed: PatriciaTanzer closed this issue 6 years ago

PatriciaTanzer commented 6 years ago

A function?

- Find which computers are used most often for any given hourly period.
- Find what total percentage of computers are in use for any given hourly period.

THIS IS MAJOR. Most of our weather data is in hourly format, so this is our main comparison.
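A minimal sketch of the computation being requested here, assuming the raw logs are state-transition rows with `machine`, `timestamp`, and boolean `in_use` columns. The filename and column names are assumptions for illustration, not the project's actual schema:

```python
import pandas as pd

# Hypothetical input: one row per state transition.
df = pd.read_csv("transitions.csv", parse_dates=["timestamp"])

def hourly_usage(df):
    """Fraction of each hour that each machine spent in use."""
    cols = {}
    for machine, g in df.groupby("machine"):
        state = g.set_index("timestamp")["in_use"].astype(float).sort_index()
        # Carry each state forward on a per-minute grid, then average per hour.
        per_minute = state.resample("1min").ffill()
        cols[machine] = per_minute.resample("1h").mean()
    return pd.DataFrame(cols)

usage = hourly_usage(df)                # rows: hours, columns: machines
pct_in_use = usage.mean(axis=1) * 100   # % of all computers in use, per hour
busiest = usage.mean().sort_values(ascending=False)  # most-used machines
```

Forward-filling onto a minute grid sidesteps the uneven spacing of transitions, so a machine that flips state mid-hour is counted fractionally rather than all-or-nothing.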

brownworth commented 6 years ago

I have a function in place that will get this for a day's worth of data. I will test this with an entire semester soon.

smindinvern commented 6 years ago

@PatriciaTanzer are these the usage stats you were talking about this morning?

> - Find which computers are used most often for any given hourly period.
> - Find what total percentage of computers are in use for any given hourly period.

We currently have two different methods of calculating hourly usage for each computer; see:

@brownworth where can we find your implementation of the above-mentioned stats? Are they in master at this point?

Are there other stats that you guys think would be useful? Maybe also aggregate utilization per {day,week,month,dayofweek}?
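If the hourly frame exists, those coarser aggregates fall out of pandas resampling almost for free. A sketch, assuming the hypothetical `usage` DataFrame from the earlier comment is in scope (DatetimeIndex of hours, one column per machine, values in [0, 1]):

```python
# Aggregate mean utilization at coarser grains from the hourly frame.
overall = usage.mean(axis=1)                     # all machines combined
daily   = overall.resample("1D").mean()
weekly  = overall.resample("1W").mean()
monthly = overall.resample("MS").mean()          # "MS" = month start
by_dow  = overall.groupby(overall.index.dayofweek).mean()  # 0 = Monday
```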

brownworth commented 6 years ago

I had stopped pursuing development on my methods of extracting this data once we could see that your code was faster. As for my implementation of these stats, I haven't done anything with usage stats at this point; I have only done some preliminary visualization of library gate counts vs. weather.

Moving forward, I don't know if it makes sense to calculate the usage stats more than once. There's going to be a lot of overhead from the calculations, so it may make sense to aggregate everything down into a single CSV, with hourly timestamps as rows, machine names as columns, and cells as percent used (or minutes), for import later. But that may be for a much later date, as we haven't even begun to talk about big data constructs in class, and the data for this is 29 files of nearly 100k lines each.
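A sketch of that write-once layout, again reusing the hypothetical `usage` frame from the earlier sketch; the filename is a placeholder:

```python
import pandas as pd

# One-time aggregation: hourly timestamps as rows, machine names as
# columns, percent-used cells, as described above.
(usage * 100).round(1).to_csv("hourly_usage_percent.csv", index_label="hour")

# Later runs import the finished table instead of re-deriving it.
usage_pct = pd.read_csv("hourly_usage_percent.csv",
                        index_col="hour", parse_dates=["hour"])
```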

Have you done any testing of SQLite vs. pandas for performance?

smindinvern commented 6 years ago

Hmm, good point. We could stuff it in a database if we really need the query performance, or, like you said, a flat CSV file (which would be fairly large). The easiest might be to just pickle the data?
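The pickle route is the least code; a two-line sketch (filename again a placeholder). Unlike a CSV round-trip, it preserves dtypes and the DatetimeIndex exactly:

```python
import pandas as pd

usage.to_pickle("hourly_usage.pkl")         # write once after the slow pass
usage = pd.read_pickle("hourly_usage.pkl")  # cheap to reload later
```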

As to your question, no, I've not looked at it. I guess that begs the question: what requirements are we evaluating pandas vs. SQLite against?
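One low-effort way to frame that evaluation would be to time the same aggregate query in both engines. A sketch, with assumed table and column names and the hypothetical `usage` frame from earlier:

```python
import sqlite3
import timeit

# Load the hourly frame into a long-format SQLite table so both engines
# answer the identical question.
conn = sqlite3.connect("usage.db")
flat = usage.stack().rename("pct").reset_index()
flat.columns = ["hour", "machine", "pct"]
flat.to_sql("usage", conn, if_exists="replace", index=False)

pandas_query = lambda: flat.groupby("hour")["pct"].mean()
sqlite_query = lambda: conn.execute(
    "SELECT hour, AVG(pct) FROM usage GROUP BY hour").fetchall()

print("pandas:", timeit.timeit(pandas_query, number=10))
print("sqlite:", timeit.timeit(sqlite_query, number=10))
```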

smindinvern commented 6 years ago

Also, a point about large files: that shouldn't pose a problem. See git-lfs. So the CSV option wouldn't necessarily be a bad one.

smindinvern commented 6 years ago

Alright, I've imported and pickled the weather and library data. If we want to move to, e.g., SQLite in the future, that should be no problem. The data is at least now easily accessible in a format we can work with.

smindinvern commented 6 years ago

Looks like this is done and in a state that everyone is OK with.