CUL-DigitalServices / avocet-ui

Open Academic Environment (OAE) Front-End
http://www.oaeproject.org
Educational Community License v2.0
1 stars 1 forks source link

Create formula to calculate metric from OA Helpdesk spreadsheet #375

Closed micheleidesmith closed 9 years ago

micheleidesmith commented 9 years ago

Our main project metric for Open Access is:

To collect 50% of the University output of journal articles and conference proceedings by end-January 2015 (8,000 articles x 50% / 12 months = 333 articles a month, or 15 a day).

We already collect the number of manuscript submissions to Avocet using a Google Analytics event. But this includes retrospective submissions (submitted outside the 3 month HEFCE window since acceptance date), duplicate submissions and test submissions.

Using Philip's tracking spreadsheet we could write a forumla [1] to track the number of submissions that are relevant to our goal with a formula that meets the following criteria:

  1. Submission date is < = 3 months from acceptance date
  2. Not a duplicate
  3. Not a test

It would be useful to collect this metric as above, then apply one additional criteria:

  1. Manuscript is an AAM

An example of Philip's spreadsheet with recent data is here: https://www.dropbox.com/sh/r9kavz8v19bkd8p/AAD7WM2aFkNQzxsV-eRNBf6ya?dl=0

[1] Longer term we aim to collect this via ZenDesk, but John is keen that we start collecting something now.

ucamhal commented 9 years ago

OK, I've implemented a formula to calculate this. I've used LibreOffice as I've not got Excel on my computer, but it should translate to Excel with minimal if any changes.

It's an Array formula, it needs to be entered with Ctrl+Shift+Enter in the formula bar (rather than just Enter as a normal formula would).

The following named ranges need to be defined before entering the formula:

Here's how the named ranges look for me: screen shot 2014-10-07 at 15 43 41

It's important that the number of rows in each named range is the same. These ranges can extend to empty areas, the formula ignores rows with empy dates. The ranges will need to be extended as more entries are made. Defining these ranges outside the formula reduces the risk of breaking the formula by mistake while updating ranges and makes the formula easier to read.

Hold your breath...

# Count the total number of rows where the following conditions are TRUE (1)
# Logical operations can't be used in array formulas, so multiplication on 0/1
# is used instead which serves the same purpose: AND(1, 0) is equivilent to 
# (1 * 0).
=SUM(
    # Require the duplicate row to be empty and the department not to be "Test"
    ( ISBLANK(Duplicates) )
    *
    ( Departments <> "Test" )
    *
    # Ignore rows with blank submission date...
    NOT(
        ISBLANK(
            SubmissionDates
        )
    )
    *
    # ... or blank acceptance date
    NOT(
        ISBLANK(
            AcceptanceDates
        )
    )
    *
    # This is the date calculation. FLOOR() serves to strip any time in the
    # date value.
    # We check that the submission date is less than or equal to the
    # acceptance date plus 3 months.
    (
        FLOOR(
            SubmissionDates,
            1
        ) <=
        DATE(
            YEAR(
                AcceptanceDates
            ),
            MONTH(
                AcceptanceDates
            ) + 3,
            DAY(
                AcceptanceDates
            )
        )
    )
    *
    # We also check that the submission date is after the acceptance date
    (
        FLOOR(
            SubmissionDates,
            1
        ) >=
        FLOOR(
            AcceptanceDates,
            1
        )
    )
    *
    # Finally, check that the version is "AAM"
    ( Versions = "AAM" )
)

Compact Versions

Only AAMs: =SUM((ISBLANK(Duplicates)) * (Departments<>"Test") * NOT(ISBLANK(SubmissionDates)) * NOT(ISBLANK(AcceptanceDates)) * (FLOOR(SubmissionDates, 1)<= DATE(YEAR(AcceptanceDates), MONTH(AcceptanceDates) + 3, DAY(AcceptanceDates))) * (FLOOR(SubmissionDates, 1)>=FLOOR(AcceptanceDates, 1)) * (Versions="AAM"))

All manuscript types: =SUM((ISBLANK(Duplicates)) * (Departments<>"Test") * NOT(ISBLANK(SubmissionDates)) * NOT(ISBLANK(AcceptanceDates)) * (FLOOR(SubmissionDates, 1)<= DATE(YEAR(AcceptanceDates), MONTH(AcceptanceDates) + 3, DAY(AcceptanceDates))) * (FLOOR(SubmissionDates, 1)>=FLOOR(AcceptanceDates, 1)))

ucamhal commented 9 years ago

Another thing, the behaviour can be verified by removing the outer SUM() from either formula, selecting cell 2 of an empty row in the main sheet and entering the formula with Ctrl+Shift+Enter. You'll get an unpacked version with a TRUE or FALSE next to each row of the sheet, where TRUE indicates a row meeting the criteria.