UNCG-CSE / Library-Computer-Usage-Analysis

The University Libraries at UNCG currently track the state of a computer, determining whether or not a particular computer is in use. This data is compiled into a database, and a web app pulls from this database to show a map and number of available computers. As of Fall 2017, the data had not been used to determine which computers are used more frequently, aside from counting the number of times a computer transitions into/away from the 'in-use' state. This project attempts to correlate the usage of these computers with various factors, including: campus scheduling, equipment configuration, placement, population in the library, and area weather. Using this data, this project also uses machine learning to determine the best placement of computers for future allocation, and possible reconfiguration of equipment and space.

Data Visualization #37

Closed brownworth closed 6 years ago

brownworth commented 6 years ago

This is beginning a discussion on what is needed for the visualization within this project.

brownworth commented 6 years ago

I'll be working on visualization on the Library Stats and @PatriciaTanzer has offered to look into visualization on the weather stats.

smindinvern commented 6 years ago

Ok, cool, thanks. I've been planning on doing a correlation, frequency, trend analysis etc on the aggregate data set.

On October 12, 2017 7:44:08 AM EDT, Brown Biggers notifications@github.com wrote:

I'll be working on visualization on the Library Stats and @PatriciaTanzer has offered to look into visualization on the weather stats.


smindinvern commented 6 years ago

Speaking of which, some work done in the nick branch.

PatriciaTanzer commented 6 years ago

I did NOT mean to commit to master just now! I thought I was committing to my branch... fixing that now.

PatriciaTanzer commented 6 years ago

Ran out of time; I have to go to a class. Will continue trying to revert in an hour or so.

smindinvern commented 6 years ago

I just added a heatmap of correlation of individual hourly usage as well as hourly temp, wind, and rain. permalink. The image is quite large, so it helps to right-click->view image to zoom in on it.

You can see that there are several blocks that show high levels of correlation (+ve or -ve) along the main diagonal, which seem to correspond to (I assume) computers that are grouped together. Also possibly of note are the small groups that are inversely correlated with the vast majority of the rest of the computers.

Looking along the bottom at the correlation between e.g. precipitation and each computer's usage, most have near-zero correlation, but there are a few that stand out a bit, and I'm wondering if there's anything significant about those computers in particular.
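
For readers who want to reproduce the idea, here is a minimal sketch of a pairwise Pearson correlation matrix over hourly columns. It uses only the standard library and is not the code in the nick branch; the column names (`PC-A`, `precip`, etc.) and all values are invented for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant series has no defined correlation
    return cov / sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Return {name_a: {name_b: r}} for every pair of named columns."""
    return {a: {b: pearson(xs, ys) for b, ys in columns.items()}
            for a, xs in columns.items()}

# Hypothetical hourly data: fraction of each hour a computer was in use,
# plus hourly precipitation. All names and values are made up.
hourly = {
    "PC-A": [0.0, 0.2, 0.8, 1.0, 0.6, 0.1],
    "PC-B": [0.1, 0.3, 0.7, 0.9, 0.5, 0.0],
    "PC-C": [1.0, 0.8, 0.2, 0.0, 0.4, 0.9],
    "precip": [0.0, 0.0, 0.1, 0.0, 0.3, 0.0],
}
corr = correlation_matrix(hourly)
```

The row `corr["precip"]` plays the role of the bottom row of the heatmap: one coefficient per computer, most of which would be expected to sit near zero.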

I'm planning to break this into smaller chunks--one for each group of like-named computers (those that share a prefix, which I assume means similarly located) to make it easier to view and focus on specific groups. Also will be looking at the table of attributes to see if there are any differences which show along the lines of e.g. quiet area or not, two monitors or one, window or none, etc.

Anyway, just FYI, food for thought, RFC, etc.

Cheers

smindinvern commented 6 years ago

Aaaaand I split the correlation matrix out into separate groups along the main diagonal. That's in the tip of nick.

brownworth commented 6 years ago

If you look at the characteristics in ./data/computerAttributes.csv, you'll see a number of grouping characteristics to correlate by. For example, machines can be grouped by floor, or by other things (dual monitors, etc.). But, you are right, the code at the front (MLC-, INC-, CITI-) usually dictates an area.
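
As a rough sketch of grouping by that leading code, the snippet below buckets computer names by the text before the first hyphen and slices a correlation matrix into one block per group. The prefixes come from this thread; the full names such as `MLC-01` and the placeholder matrix are assumptions made for illustration.

```python
from collections import defaultdict

def group_by_prefix(names):
    """Group computer names by the text before the first hyphen."""
    groups = defaultdict(list)
    for name in names:
        prefix = name.split("-", 1)[0]
        groups[prefix].append(name)
    return dict(groups)

def sub_matrix(corr, members):
    """Extract the block of a nested-dict correlation matrix for one group."""
    return {a: {b: corr[a][b] for b in members} for a in members}

# Hypothetical computer names; only the prefixes appear in the discussion.
names = ["MLC-01", "MLC-02", "INC-01", "CITI-01", "CITI-02"]
groups = group_by_prefix(names)

# Placeholder matrix (1.0 on the diagonal, 0.0 elsewhere), just to show slicing.
corr = {a: {b: 1.0 if a == b else 0.0 for b in names} for a in names}
blocks = {prefix: sub_matrix(corr, members)
          for prefix, members in groups.items()}
```

Grouping by another attribute (floor, dual monitors) would follow the same pattern, with the group key read from the attributes file instead of the name.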

Also, moving forward, we should probably exclude the machines that have a 0 in 'requiresLogon'. Those machines are definite outliers on the usage charts, as they are technically always "logged in".
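
A minimal sketch of that exclusion, assuming the attributes file is a plain CSV with a header row. Only the `requiresLogon` column is named in this thread; `computerName`, `floor`, and the sample rows are hypothetical stand-ins.

```python
import csv
import io

# Stand-in for ./data/computerAttributes.csv. Only 'requiresLogon' is named in
# the discussion; 'computerName' and 'floor' are hypothetical columns.
SAMPLE_CSV = """computerName,floor,requiresLogon
MLC-01,1,1
MLC-02,1,0
INC-01,2,1
"""

def machines_requiring_logon(csv_file, name_column="computerName"):
    """Return names of machines whose requiresLogon field is not 0."""
    reader = csv.DictReader(csv_file)
    return [row[name_column] for row in reader
            if row["requiresLogon"].strip() != "0"]

kept = machines_requiring_logon(io.StringIO(SAMPLE_CSV))
```

With the real file, the `io.StringIO` object would be replaced by an opened file handle, and the resulting list used to filter the usage data before charting.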

I'm working on the usage chart from the other day to change the graph based upon arbitrary dates. I have the functionality in there to change the graph, but I would like to tie it to a RangeSlider widget.

Also, I'm going to be working on gate count with a rolling mean of 7 days and correlating it with weather patterns.
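
One way this could look, as a sketch only: a trailing 7-day mean over daily gate counts, paired with a daily weather value and fed to a Pearson correlation. The data, the choice of daily high temperature as the weather variable, and the decision to leave the weather series unsmoothed are all assumptions.

```python
from collections import deque
from math import sqrt

def rolling_mean(values, window=7):
    """Trailing rolling mean; positions without a full window yield None."""
    out = []
    recent = deque(maxlen=window)
    total = 0.0
    for v in values:
        if len(recent) == window:
            total -= recent[0]  # this value is about to fall out of the window
        recent.append(v)
        total += v
        out.append(total / window if len(recent) == window else None)
    return out

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented daily values for illustration.
gate_counts = [1200, 1350, 1280, 900, 400, 300, 1100, 1250, 1400, 1300]
daily_high_f = [70, 72, 68, 65, 60, 58, 66, 71, 73, 69]

smoothed = rolling_mean(gate_counts)
# Keep only the days where a full 7-day window exists.
pairs = [(s, t) for s, t in zip(smoothed, daily_high_f) if s is not None]
r = pearson([s for s, _ in pairs], [t for _, t in pairs])
```

The 7-day window averages out the weekly cycle in foot traffic, so whatever correlation remains is less likely to be a weekday/weekend artifact.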

brownworth commented 6 years ago

I've described the characteristics in ./docs/computerAttributes.md. There were a couple of things that I edited for clarity in the description document, so hopefully this makes sense.

smindinvern commented 6 years ago

See commit 98f591d for some hypothesis testing:

  • Restrict analysis to computers which require logon.
    • Data from machines not requiring a logon is essentially useless, since the machine is always on, so there's no change of state for us to analyze.
  • Restrict to records occurring during regular semester dates.
  • Split data into groups and do t-test to quantify difference in usage between groups.
    • Rainy vs. not-rainy hours.
    • With and without adjacent window, during rainy times.
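
The group comparison in the list above can be sketched as follows. This is not the code from commit 98f591d: it swaps in Welch's unequal-variance t-test as an assumption, uses only the standard library, and runs on invented utilization values.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)  # sample variances (n - 1)
    se_a, se_b = va / na, vb / nb  # squared standard errors of each mean
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (na - 1) + se_b ** 2 / (nb - 1))
    return t, df

# Invented hourly utilization fractions for two groups of hours.
rainy = [0.62, 0.70, 0.58, 0.66, 0.71]
dry = [0.50, 0.55, 0.47, 0.60, 0.52, 0.49]

t_stat, dof = welch_t(rainy, dry)
```

Turning the statistic into a p-value needs the t distribution, which the standard library does not provide; if SciPy is available, `scipy.stats.ttest_ind(rainy, dry, equal_var=False)` performs the same test and reports the p-value directly.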

brownworth commented 6 years ago

@smindinvern I'll take a look at this in the morning. I've been pretty much in bed (with the exception of voting, of course) since I left campus yesterday. Thank you for taking a look at these.

brownworth commented 6 years ago

@smindinvern I just looked at your heatmaps, and they look great! I think you have pretty much covered the weather correlations. I like that you have a correlation between rainy days and machines near windows. That's a neat segment to add.

A couple of pieces of information to add that may be helpful in our interpretation:

Moving forward, I'm working with the usage statistics for computer attributes to see if I can come up with a way to compare usage against an attribute.

brownworth commented 6 years ago

I've figured out what was causing the indexing problem and, in doing so, changed the format of the graph. The file src/AvgPercentUtil.html in the brown branch shows the mouseovers and zoom functionality as before. (Image: avgpercentutil)

smindinvern commented 6 years ago

@brownworth that looks a lot better. Much more easily interpreted than the circles. Looks awesome!

brownworth commented 6 years ago

Thanks! Admittedly, I was kinda bummed the circles thing didn't work out. I was hoping to show more nuance with the radii of the data points, but then again - there's a reason why data visualization typically hasn't been done this way before.

PatriciaTanzer commented 6 years ago

Looks like we have all our visualization atm, closing this issue