Ameobea / network_database

Web application to hold networks in a centralized location for processing and analysis.

Whole-database stats distributions and scatter plots #13

Open Ameobea opened 8 years ago

Ameobea commented 8 years ago

Scripts should be created to automatically generate stats from all networks in the database, as well as the correlation plots. Dynamically generating these for each user on request will likely be too much work for the server, so regularly generated and cached pages are probably the best bet. Perhaps they can be regenerated each time a network is imported, or on a regular schedule via a cron job or something similar.
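As a rough illustration of the cached approach, here is a minimal sketch of a script that could run after an import or from cron. It assumes each network is already available as a dict of numeric metrics and that there is at least one network; the metric names, the `build_cache` and `write_cache` helpers, and the JSON cache file are all hypothetical and not part of the existing pipeline.

```python
import json
import math
import os
import tempfile
from statistics import mean, median, pstdev


def summarize(values):
    """Summary statistics for one metric across all networks."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": pstdev(values),
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation; returns None when either metric is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return None
    return cov / (sx * sy)


def build_cache(networks, metrics):
    """Compute per-metric stats and pairwise correlations in one pass."""
    columns = {m: [net[m] for net in networks] for m in metrics}
    stats = {m: summarize(col) for m, col in columns.items()}
    correlations = {}
    for i, a in enumerate(metrics):
        for b in metrics[i + 1:]:
            correlations[f"{a}|{b}"] = pearson(columns[a], columns[b])
    return {"stats": stats, "correlations": correlations}


def write_cache(cache, path):
    """Write to a temp file, then rename, so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(cache, handle)
    os.replace(tmp_path, path)
```

The web server would then only need to read the cached file, so the cost of a page view does not grow with the number of networks.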

Interactive scatter plots rendered client-side using Highcharts would probably be the best option. They could have options to show/hide node labels, enable node coloring by tag, etc.
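One way the server could prepare data for such a plot is sketched below. It assumes each point on the plot is one network with hypothetical `name` and `tag` fields, and it emits a plain options object for a Highcharts scatter chart. Putting each tag in its own series lets Highcharts assign a distinct color per tag, and the data-label flag covers the show/hide option. The function names are made up for illustration.

```python
import json
from collections import defaultdict


def scatter_series(networks, x_metric, y_metric, color_by_tag=True):
    """Group points into one series per tag (or a single series if disabled)."""
    groups = defaultdict(list)
    for net in networks:
        key = net.get("tag", "untagged") if color_by_tag else "all networks"
        groups[key].append(
            {"x": net[x_metric], "y": net[y_metric], "name": net["name"]}
        )
    return [{"name": tag, "data": points} for tag, points in sorted(groups.items())]


def chart_options(networks, x_metric, y_metric, show_labels=False, color_by_tag=True):
    """Build an options dict to serialize and pass to Highcharts on the client."""
    return {
        "chart": {"type": "scatter"},
        "title": {"text": f"{y_metric} vs. {x_metric}"},
        "xAxis": {"title": {"text": x_metric}},
        "yAxis": {"title": {"text": y_metric}},
        "plotOptions": {
            "scatter": {
                "dataLabels": {"enabled": show_labels, "format": "{point.name}"}
            }
        },
        "series": scatter_series(networks, x_metric, y_metric, color_by_tag),
    }


if __name__ == "__main__":
    demo = [
        {"name": "net-a", "tag": "social", "nodes": 120, "edges": 340},
        {"name": "net-b", "tag": "biological", "nodes": 80, "edges": 95},
    ]
    print(json.dumps(chart_options(demo, "nodes", "edges", show_labels=True), indent=2))
```

The label and coloring toggles could also be handled entirely in the browser; generating both variants server-side is just the simplest thing to cache.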

For overall distributions (such as those that will be used on the screener page), it's possible that they can be generated dynamically. However, I think it would be much easier to use cached data generated each time a network is uploaded to the database instead.
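A minimal sketch of what the cached distribution data might look like: fixed-width histogram bins for one metric, computed at upload time and stored alongside the other cached stats. The `histogram` function and the bin layout are assumptions for illustration only.

```python
def histogram(values, bins=10):
    """Return fixed-width bins as dicts with start, end, and count."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values identical: a single degenerate bin.
        return [{"start": lo, "end": hi, "count": len(values)}]
    width = (hi - lo) / bins
    counts = [0] * bins
    for value in values:
        # Clamp so the maximum value falls into the last bin.
        index = min(int((value - lo) / width), bins - 1)
        counts[index] += 1
    return [
        {"start": lo + i * width, "end": lo + (i + 1) * width, "count": count}
        for i, count in enumerate(counts)
    ]
```

Because only the bin counts are stored, the screener page can draw the distribution without touching the per-network records.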

Since all of these new calculations will be added, it seems logical to add this step to the jsonToMongo script or tack it on at the end of the processing pipeline at some point. This would make integrating them with the site easier and keep the process streamlined.

However, as was mentioned during the meeting, it's possible that these scripts will have to be kept separate from the site due to server concerns with hosting on Valpo Scholar. They will still need access to the data files from the processing script and to the database in any case, so that probably won't be an additional problem.

Ameobea commented 8 years ago

I think it may be very cool to link up the scatter plots with the screener. That would go a long way toward making the database feel like a useful tool for network analytics and comparison, allowing users to visualize the results of their queries and analyze them.
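To make the idea concrete, here is a small sketch of how screener results could feed a scatter plot. It assumes a screener query can be reduced to inclusive min/max ranges per metric, which may not match how the screener actually works; `screen` and `to_points` are hypothetical names.

```python
def screen(networks, ranges):
    """Keep networks whose metrics fall inside every (low, high) range."""
    return [
        net
        for net in networks
        if all(low <= net[metric] <= high for metric, (low, high) in ranges.items())
    ]


def to_points(networks, x_metric, y_metric):
    """Turn the matching networks into x/y points for a scatter plot."""
    return [
        {"x": net[x_metric], "y": net[y_metric], "name": net["name"]}
        for net in networks
    ]


if __name__ == "__main__":
    demo = [
        {"name": "net-a", "nodes": 120, "edges": 340},
        {"name": "net-b", "nodes": 80, "edges": 95},
    ]
    matches = screen(demo, {"nodes": (100, 500)})
    print(to_points(matches, "nodes", "edges"))
```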