USGCRP / gcis

Global Change Information System
https://data.globalchange.gov

GCIS Metrics - analyzing existing records #643

Closed · rasherman closed this 1 year ago

rasherman commented 6 years ago

The existing metadata scoring script needs to be run on many more chapters of CSSR, CHA, and NCA3. It is probably too hefty a task to do ALL chapters, but we'd like:

  1. Metrics for as much of CSSR as possible.
  2. Enough representative chapters of NCA3 and CHA to get a sense of whether we've significantly improved over time.

Once the scripts are done, doing pretty visualizations is straightforward, but I would also like to see:

  1. Some statistics on the score distribution for different types of records.
  2. Some statistics on average score by "depth" (number of steps down from a USGCRP report chapter). A sketch of both summaries follows this list.
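
For illustration, here is a minimal pandas sketch of both summaries. It assumes the metrics scores can be exported to a CSV with columns for record type, depth, and score; the file name and column names are hypothetical placeholders, not the script's actual output format.

```python
# Minimal sketch of the two requested summaries, assuming a CSV export
# of the metrics scores. The file name and column names (record_type,
# depth, score) are hypothetical placeholders for the script's output.
import pandas as pd

scores = pd.read_csv("metrics_scores.csv")  # hypothetical export

# 1. Score distribution for each type of record.
by_type = scores.groupby("record_type")["score"].describe()

# 2. Average score by "depth" (steps down from a USGCRP report chapter).
by_depth = scores.groupby("depth")["score"].agg(["mean", "count"])

print(by_type)
print(by_depth)
```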

This will require a bit of training on Kat's part so that Amrutha and/or Reuben can run the metrics script, but afterwards we can get the scripts running and then focus on analyzing the results.

lomky commented 6 years ago

I think it's worth considering which chapters to look at for NCA3, making sure we get examples of each type (e.g., regional, topic area).

lomky commented 6 years ago

Detailed directions for getting up and running have been documented on the Provenance Evaluation repo.

amruelama commented 6 years ago

@rasherman Running all the chapters (including the front page, appendix, etc.) is not a hefty process, as each CSSR chapter took less than 10 minutes to generate scores. But running an entire report on the same server is tricky due to memory issues: the first time we ran the scores script on the entire report, the process was killed. However, as @lomky suggested, we are going to try criteria such as 'depth level' and see if that helps.
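
One way around the memory ceiling is to score the report in pieces, launching a fresh process per chapter so memory is released between runs rather than accumulating across the whole report. A minimal sketch follows; the script name, flag, and chapter identifiers are hypothetical stand-ins for the actual metrics-script invocation.

```python
# Minimal sketch: score one chapter per subprocess so no single
# process ever holds the whole report's records in memory.
# "score_metrics.py", "--chapter", and the IDs are hypothetical.
import subprocess

chapters = ["executive-summary", "chapter-1", "chapter-2"]  # illustrative

for chapter in chapters:
    # A fresh process per chapter; its memory is freed when it exits.
    subprocess.run(
        ["python", "score_metrics.py", "--chapter", chapter],
        check=True,
    )
```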

Once that's done, I'll be working on comparison and analysis.
