Closed (PatriciaTanzer closed 6 years ago)
I have a function in place that will get this for a day's worth of data. I will test this with an entire semester soon.
@PatriciaTanzer are these the usage stats you were talking about this morning?
Find which computers are used most often for any given hourly period.
Find what total percentage of computers are in use for any given hourly period.
What we currently have is two different methods of calculating hourly usage for each computer, see:
@brownworth where can we find your implementation of the above-mentioned stats? Are they in master at this point?
Are there other stats that you guys think would be useful? Maybe also aggregate utilization per {day,week,month,dayofweek}?
I stopped developing my own methods of extracting this data once we saw that your code was faster. As for my implementation of these stats, I haven't done anything with usage stats at this point. I have only done some preliminary visualization of library gate counts vs. weather.
Moving forward, I don't know if it makes sense to calculate the usage stats more than once. There's going to be a lot of overhead from the calculations, so it may make sense to aggregate them once into a single CSV for import later: hourly datestamps as rows, machine names as columns, and percent used (or minutes) as cells. But that may be for a much later date, as we haven't even begun to talk about big data constructs in class, and the data for this is 29 files of nearly 100k lines each.
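To make the proposed layout concrete, here is a minimal sketch using only the standard library. It assumes the raw logs can be reduced to (machine, login, logout) tuples; the names `minutes_per_hour` and `write_usage_csv`, the machine names, and the sample sessions are all made up for illustration and are not the code discussed in this thread.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime, timedelta


def minutes_per_hour(sessions):
    """Split (machine, start, end) sessions into minutes used per hourly bucket."""
    usage = defaultdict(lambda: defaultdict(float))  # hour -> machine -> minutes
    for machine, start, end in sessions:
        cursor = start
        while cursor < end:
            hour = cursor.replace(minute=0, second=0, microsecond=0)
            # Each slice stops at the next hour boundary or at the session end.
            slice_end = min(hour + timedelta(hours=1), end)
            usage[hour][machine] += (slice_end - cursor).total_seconds() / 60
            cursor = slice_end
    return usage


def write_usage_csv(usage, machines, out):
    """Write one row per hour and one column per machine (cells are minutes)."""
    writer = csv.writer(out)
    writer.writerow(["hour"] + machines)
    for hour in sorted(usage):
        cells = [round(usage[hour].get(m, 0.0), 1) for m in machines]
        writer.writerow([hour.isoformat()] + cells)


# Made-up sample sessions, not real lab data.
sessions = [
    ("PC-01", datetime(2018, 3, 1, 9, 15), datetime(2018, 3, 1, 10, 30)),
    ("PC-02", datetime(2018, 3, 1, 9, 0), datetime(2018, 3, 1, 9, 45)),
]
usage = minutes_per_hour(sessions)
buffer = io.StringIO()
write_usage_csv(usage, ["PC-01", "PC-02"], buffer)
csv_text = buffer.getvalue()
```

Storing minutes rather than percent keeps the cells additive, so they can be summed later for daily or weekly totals; percent used is just minutes divided by 60.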
Have you done any performance testing of SQLite vs. pandas?
Hmm, good point. We could stuff it in a database if we really need the query performance, or like you said, a flat CSV file (which would be fairly large). The easiest might be to just pickle the data?
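For reference, pickling already-parsed data is only a few lines with the standard library. This is a sketch under assumptions: the `weather` records and the file name `weather.pkl` are placeholders, since the real data structures are not shown in this thread.

```python
import os
import pickle
import tempfile
from datetime import datetime

# Hypothetical parsed records standing in for the real weather data.
weather = [{"time": datetime(2018, 3, 1, 9), "temp_f": 41.0}]

# Write to a temporary directory so the sketch leaves the repo untouched.
path = os.path.join(tempfile.mkdtemp(), "weather.pkl")
with open(path, "wb") as fh:
    pickle.dump(weather, fh, protocol=pickle.HIGHEST_PROTOCOL)

# Loading skips the parsing step entirely on later runs.
with open(path, "rb") as fh:
    restored = pickle.load(fh)
```

One caveat worth keeping in mind: pickle files can execute arbitrary code when loaded, so they should only be shared among people who trust each other.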
As to your question: no, I've not looked at it. I guess that raises the question of what requirements we would be evaluating pandas vs. SQLite against.
Also, a point about large files--that shouldn't pose a problem. See git-lfs. So the CSV option wouldn't necessarily be a bad one.
Alright, I've imported and pickled the weather and library data. If we want to move to e.g. sqlite in the future that should be no problem. The data is at least now easily accessible in a format we can work with.
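If a move to SQLite does happen, a rough sketch of what it could look like with the standard `sqlite3` module follows. The table name `usage`, its columns, and the rows are assumptions for illustration; nothing here reflects a schema agreed on in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
conn.execute("CREATE TABLE usage (hour TEXT, machine TEXT, minutes REAL)")

# Made-up rows in long format: one row per (hour, machine) pair.
rows = [
    ("2018-03-01T09:00:00", "PC-01", 45.0),
    ("2018-03-01T09:00:00", "PC-02", 45.0),
    ("2018-03-01T10:00:00", "PC-01", 30.0),
    ("2018-03-01T10:00:00", "PC-02", 0.0),
]
conn.executemany("INSERT INTO usage VALUES (?, ?, ?)", rows)

# Average utilization per hour: minutes used / (60 minutes * machine count).
query = """
    SELECT hour, 100.0 * SUM(minutes) / (60.0 * COUNT(*)) AS pct_used
    FROM usage
    GROUP BY hour
    ORDER BY hour
"""
result = conn.execute(query).fetchall()
```

The long format (one row per hour and machine) tends to suit SQL better than the wide one-column-per-machine CSV layout, because adding a machine does not change the schema.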
Looks like this is done and in a state that everyone is OK with.
A function?
Find which computers are used most often for any given hourly period.
Find what total percentage of computers are in use for any given hourly period.
THIS IS MAJOR. Most of our weather data is in hourly format, so this is our main comparison.
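The two requested stats can be sketched as follows, assuming usage has already been reduced to minutes per machine per hour. The function names, the sample numbers, and the definition of "in use" (any minutes logged in that hour) are assumptions for illustration, not the implementation that closed this issue.

```python
from datetime import datetime

# Hypothetical matrix: hour -> {machine: minutes used in that hour}.
usage = {
    datetime(2018, 3, 1, 9): {"PC-01": 45.0, "PC-02": 45.0, "PC-03": 0.0},
    datetime(2018, 3, 1, 10): {"PC-01": 30.0, "PC-02": 0.0, "PC-03": 0.0},
}


def most_used(usage, hour, top=3):
    """Return machine names for one hour, busiest first (ties sorted by name)."""
    row = usage[hour]
    return sorted(row, key=lambda m: (-row[m], m))[:top]


def percent_in_use(usage, hour):
    """Percentage of machines with any logged minutes during the hour."""
    row = usage[hour]
    active = sum(1 for minutes in row.values() if minutes > 0)
    return 100.0 * active / len(row)


nine = datetime(2018, 3, 1, 9)
ten = datetime(2018, 3, 1, 10)
busiest_at_ten = most_used(usage, ten, top=1)
share_at_nine = percent_in_use(usage, nine)
```

Because both functions are keyed by hour, their output lines up directly with hourly weather observations.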