aiidateam / aiida-core

The official repository for the AiiDA code
https://aiida-core.readthedocs.io

Verdi command to gather user statistics #3692

Closed: ramirezfranciscof closed this issue 6 months ago

ramirezfranciscof commented 4 years ago

This is an issue to discuss the possibility of adding a verdi command (verdi stats) that lets users generate a text file with information about their AiiDA usage, which developers could then use to make better-informed decisions when improving the code. This originally came up when thinking about how to design and test possible repository backends with @espenfl: we noticed that even if we asked users for information on the size of their files, they wouldn't know how to provide it, and even if they did, it might be annoying to do.

The idea would be to output all the gathered information in a machine-readable text file (JSON) that users can inspect to check exactly what information they would be sending us (so these files should never become gigantic, unintelligible walls of text). It would also be good to have this as a separate plugin, so that when new information is needed, the plugin can be updated without needing an aiida-core release.
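A minimal sketch of what such a report file could look like. The layout is an assumption for illustration: the field names (schema_version, generated_at, python_version, stats) and the helpers build_report, render_report and write_report are hypothetical, not an existing AiiDA format. Indented, key-sorted JSON keeps the file easy for a user to read line by line before sharing it.

```python
import json
import platform
from datetime import datetime, timezone
from typing import Any, Dict

# Hypothetical schema version, bumped whenever the (hypothetical) plugin changes
# what it collects, so developers can tell report formats apart.
REPORT_SCHEMA_VERSION = 1


def build_report(stats: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap the gathered statistics with a little non-identifying metadata."""
    return {
        "schema_version": REPORT_SCHEMA_VERSION,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "stats": stats,
    }


def render_report(report: Dict[str, Any]) -> str:
    """Serialize as indented, key-sorted JSON so a person can review it easily."""
    return json.dumps(report, indent=2, sort_keys=True) + "\n"


def write_report(report: Dict[str, Any], path: str) -> None:
    """Write the rendered report to a text file the user can inspect before sending."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(render_report(report))
```

For example, write_report(build_report({"num_nodes": 1200}), "aiida_stats.json") would produce a small file whose every key is visible to the user.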

If any of you have ideas for what kind of information would give better direction to your development projects, please comment here.

ramirezfranciscof commented 4 years ago

For repository development, it would be nice to know the distribution of file sizes and the ratios repo_size/nodes or total_files/nodes. Some information on how often these files are accessed (and how often the same ones are accessed) could also help, but this could prove more difficult since it's more of a "flow measurement" than a "state measurement".
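The state measurements above could be gathered with a short script, sketched here under the assumption that the repository can be treated as a plain directory tree on disk. The root path and the node count are inputs the caller supplies (the node count would come from the database), and file_sizes and repository_summary are hypothetical helper names.

```python
import os
import statistics
from pathlib import Path
from typing import Dict, List


def file_sizes(root: Path) -> List[int]:
    """Apparent sizes in bytes of all regular files below root (symlinks skipped)."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_file() and not path.is_symlink():
                sizes.append(path.stat().st_size)
    return sizes


def repository_summary(sizes: List[int], num_nodes: int) -> Dict[str, float]:
    """Totals, per-node ratios and a few points of the file-size distribution."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    total_bytes = sum(sizes)
    summary = {
        "total_files": len(sizes),
        "total_bytes": total_bytes,
        "files_per_node": len(sizes) / num_nodes,  # total_files/nodes
        "bytes_per_node": total_bytes / num_nodes,  # repo_size/nodes
    }
    if sizes:
        summary["median_file_bytes"] = statistics.median(sizes)
        summary["max_file_bytes"] = max(sizes)
    return summary
```

A call such as repository_summary(file_sizes(Path("path/to/repository")), num_nodes=1200) would then give the ratios in one dictionary; the median is reported alongside the totals because file-size distributions tend to be very skewed.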

broeder-j commented 4 years ago

Comments:

  1. Folder contents can vary a lot: there might be many empty folders and some that are quite full.
  2. The repository size on disk depends on the filesystem block size.
  3. Instead of access frequency, one could construct a (static) metric from the database, such as connectivity to certain node types.
  4. Access frequency in production can be very different from the 'average access frequency'.
  5. Personally, I would be interested in 'duplicate' information, which is probably hard to extract.
  6. As a user, it would be nice to know whether the repository is consistent with the corresponding database.
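For item 5, one common way to find byte-identical duplicates is to group files by size and then hash only the files that share a size. The sketch below assumes a plain directory tree and uses SHA-256 from Python's hashlib; it only detects exact copies of the same bytes, not files that carry the same information in a different form. The function names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> Dict[str, List[Path]]:
    """Map content hash -> list of paths, for contents stored more than once."""
    # Files of different sizes cannot be identical, so group by size first
    # and only hash groups with at least two members.
    by_size: Dict[int, List[Path]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_file() and not path.is_symlink():
                by_size[path.stat().st_size].append(path)

    by_hash: Dict[str, List[Path]] = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[sha256_of(path)].append(path)
    return {digest: sorted(paths) for digest, paths in by_hash.items() if len(paths) > 1}


def duplicate_bytes(duplicates: Dict[str, List[Path]]) -> int:
    """Bytes that could be saved by storing each duplicated content only once."""
    return sum(paths[0].stat().st_size * (len(paths) - 1) for paths in duplicates.values())
```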

As a start, one could write something that builds distributions from the parsed output of the 'du' command.
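A possible starting point, assuming GNU coreutils: du -ab prints one '<bytes><TAB><path>' line per file and directory, where -b gives apparent sizes in bytes (without it, du reports allocated disk usage, which is typically rounded up to the filesystem block size, as noted in item 2). Since -a also prints cumulative directory totals, those lines have to be dropped before treating the result as a file-size distribution. The helper names are hypothetical, and power-of-two buckets are just one convenient choice for sizes spanning many orders of magnitude.

```python
import os
import subprocess
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def run_du(root: str) -> str:
    """Run GNU du listing all entries (-a) with apparent sizes in bytes (-b)."""
    result = subprocess.run(["du", "-ab", root], capture_output=True, text=True, check=True)
    return result.stdout


def parse_du(output: str) -> List[Tuple[int, str]]:
    """Parse '<size><TAB><path>' lines into (size, path) tuples."""
    entries = []
    for line in output.splitlines():
        if not line.strip():
            continue
        size, _, path = line.partition("\t")
        entries.append((int(size), path))
    return entries


def files_only(entries: Iterable[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Drop directory totals; must run on the machine where du was run."""
    return [(size, path) for size, path in entries if os.path.isfile(path)]


def size_bucket(size: int) -> int:
    """Smallest power of two >= size, or 0 for empty files."""
    return 0 if size == 0 else 1 << (size - 1).bit_length()


def size_histogram(sizes: Iterable[int]) -> Dict[int, int]:
    """Entries per power-of-two bucket, keyed by the bucket's upper bound in bytes."""
    return dict(sorted(Counter(size_bucket(s) for s in sizes).items()))
```

Chained together, size_histogram(size for size, _ in files_only(parse_du(run_du("path/to/repository")))) would give the file-size distribution of one repository.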

It would be interesting to know whether users start to access the repository through other tools not related to AiiDA. For a visual overview: on macOS there is GrandPerspective (http://grandperspectiv.sourceforge.net/), and similar tools exist for Linux and Windows.

giovannipizzi commented 4 years ago

Also mentioning that we started a very minimal tool here with @ltalirz: https://github.com/ltalirz/aiida-statistics-query. It has a slightly parallel goal (getting statistics on the usage of AiiDA in published projects, at least the number of node types), but it could be a starting point.
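A sketch of the node-type counting step, assuming aiida-core's QueryBuilder API (QueryBuilder.append with project=['node_type'], and iterall()). This is not the code of aiida-statistics-query, and the exact node_type strings depend on the AiiDA version. The counting is kept separate from the query so it can be used and tested without an AiiDA profile.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def count_node_types(node_types: Iterable[str]) -> Dict[str, int]:
    """Count nodes per node_type string, most common first."""
    return dict(Counter(node_types).most_common())


def node_types_from_profile() -> Iterator[str]:
    """Yield the node_type of every node in the default profile.

    Requires aiida-core and a configured profile; the imports are done lazily
    so that count_node_types can be used without AiiDA installed.
    """
    from aiida import load_profile
    from aiida.orm import Node, QueryBuilder

    load_profile()
    query = QueryBuilder()
    query.append(Node, project=["node_type"])
    for (node_type,) in query.iterall():
        yield node_type
```

With a profile available, count_node_types(node_types_from_profile()) gives both the number of distinct node types (the length of the result) and how many nodes each has.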

sphuber commented 6 months ago

Currently there is verdi storage info, which gives a lot of information about the storage contents. I am closing this for now. If additional information is needed, a new issue can be opened.