Automatic statistics - Githubissues

tymees commented 3 years ago

Couldn't find my old scripts, but I'll dump what I used today here for future reference:


>>> from reviews.models import Review
>>> all_reviews = Review.objects.filter(date_start__year=2020)
>>> all_reviews = all_reviews.filter(proposal__reviewing_committee=1) # 1 = LK, 3 = AK
>>> all_reviews = all_reviews.exclude(stage=0) # Exclude supervisor reviews
>>> from collections import Counter
>>> Counter([ x.continuation for x in all_reviews]) # Use the review model to find out what these numbers mean
Counter({1: 86, 0: 81, 5: 5, 6: 1})
>>> sum([1 if x.short_route else 0 for x in all_reviews]) # is short route
132
>>> sum([1 if x.short_route is False else 0 for x in all_reviews]) # is long route
32
>>> sum([1 if x.short_route is None else 0 for x in all_reviews]) # straight to revision
9
>>> from proposals.models import Proposal
>>> Proposal.objects.filter(date_submitted__year=2020, reviewing_committee=1, is_revision=True, parent__status_review=True).count()
7
>>> x = Counter([ (x.date_end - x.date_start).days if x.date_end else -1 for x in all_reviews])
>>> xx = list(x.elements())
>>> x
Counter({-1: 24, 0: 21, 1: 18, 6: 12, 3: 10, 5: 10, 2: 9, 4: 9, 7: 9, 10: 7, 9: 5, 8: 4, 15: 4, 20: 4, 17: 3, 18: 3, 21: 3, 11: 2, 14: 2, 16: 2, 22: 2, 24: 2, 31: 2, 12: 1, 13: 1, 23: 1, 27: 1, 30: 1, 38: 1})
>>> for i in range(24):  # Derp method to remove non-ended reviews. Number read from looking at `x`
...     xx.remove(-1)
... 
>>> x = Counter(xx) # REALLY a derp method, isn't it? :P
>>> for d, n in x.items():
...     print("{} dagen: {}".format(d, n))
... 
[snip]
>>> sum(x.elements())/len(list(x.elements())) # calculate average
7.523489932885906
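
The manual removal of the `-1` placeholders above can be avoided by leaving unfinished reviews out before counting. A minimal sketch, assuming plain `(date_start, date_end)` pairs stand in for the `Review` rows; the sample dates are made up for illustration and are not real data:

```python
from collections import Counter
from datetime import date
from statistics import mean, median

# Hypothetical stand-ins for Review rows: (date_start, date_end or None).
reviews = [
    (date(2020, 1, 6), date(2020, 1, 6)),
    (date(2020, 1, 7), date(2020, 1, 10)),
    (date(2020, 2, 3), date(2020, 2, 9)),
    (date(2020, 3, 2), None),  # not ended yet, so it must not be counted
]

# Keep only ended reviews, so no -1 placeholder is ever created.
durations = [(end - start).days for start, end in reviews if end is not None]

per_day = Counter(durations)
for days, n in sorted(per_day.items()):
    print("{} days: {}".format(days, n))

print("average:", mean(durations))
print("median:", median(durations))
```

On the real queryset the same effect could probably be had by filtering first, for example with `all_reviews.filter(date_end__isnull=False)`, but that has not been checked against the actual model here.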

tymees commented 3 years ago

Small update:

Desiree and I decided it might be better to just export a CSV (maybe Excel, if possible?) with all attributes relevant for these statistics.

This way, the secretary can see the individual studies that are counted in the statistics instead of it being a black box.
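
A rough sketch of what such an export could look like, using only the standard `csv` module. The column names, the `write_stats_csv` helper and the example rows are all hypothetical; the real export would take its values from the `Proposal` and `Review` attributes:

```python
import csv
import io

# Hypothetical column names, chosen only for this illustration.
FIELDS = ["reference", "chamber", "route", "submitted", "decided"]


def write_stats_csv(rows, out):
    """Write one line per study, so every counted study stays visible."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Made-up example rows, not real data.
rows = [
    {"reference": "example-1", "chamber": "LK", "route": "short",
     "submitted": "2020-01-07", "decided": "2020-01-10"},
    {"reference": "example-2", "chamber": "AK", "route": "long",
     "submitted": "2020-02-03", "decided": ""},
]

buffer = io.StringIO()
write_stats_csv(rows, buffer)
print(buffer.getvalue())
```

Writing to a file-like object keeps the helper usable from both a management command (writing to stdout or a file) and, later, a download page.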

tymees commented 3 years ago

Existing CSV code can be found under proposals/management/command/export_csv.py

tymees commented 2 years ago

2022 requests from FETC

Per chamber:

Number of submitted studies (excluding revisions)
Number of those assigned to the short route
Number of those assigned to the long route
Number of those submitted by bachelor, master (and RMA) students (total and split out)
(If possible) turnaround time, total and per route (short/long) (for the first decision! revision reviews should be excluded)

2022 requests from UiL OTS Labs management:

For linguistic chamber:

Registration kinds (EEG, ET, reaction time, questionnaire, etc.)

tymees commented 2 years ago

Basic statistics stuff is now implemented as a management command. The code behind it is modular though, so if/when we want to make this a page we can build on that.

djhcapel commented 2 years ago

The stats we would need for our annual report are (as discussed with the chairs of both chambers yesterday; see also the comment above):

Per chamber:

-Number of submitted studies (excluding revisions)
-Number of revisions (we hope to reduce this number in the future (privacy check, DMPs))
-Number of those assigned to the short route
-Number of those assigned to the long route
-Number of those submitted by bachelor, master (and RMA) students (total and split out) [see issue #380]
-(If possible) Turnaround time, total and per route (short/long) (for the first decision! revision reviews should be excluded) [dc1: turnaround time from 'submission date' until 'first decision date' is meant; applications that are immediately returned for 'revision' by choosing 'straight to revision' should be ignored]

[dc2: I would like to be able to calculate e.g. the average, range, and median myself: could I get a CSV file with applications per chamber and per route, or at least one in which chamber and route are included in the table columns? I would also like to see whether it concerns research by students.]
-Turnaround time from 'submission date' until 'final decision'; applications for which the procedure was discontinued should be excluded. As above, I would like to be able to calculate median and average etc. myself.

For linguistic chamber:

-Registration kinds (EEG, ET, reaction time, questionnaire, etc.)
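
To illustrate how the turnaround figures requested above (average, range and median per chamber and per route) could be derived from such a CSV, here is a small sketch. The column names and all rows are invented for the example; the real export may be laid out differently:

```python
import csv
import io
from collections import defaultdict
from datetime import date
from statistics import mean, median

# Hypothetical export; column names and values are illustrative only.
CSV_TEXT = """chamber,route,submitted,first_decision
LK,short,2022-01-03,2022-01-05
LK,short,2022-01-10,2022-01-16
LK,long,2022-02-01,2022-02-21
AK,short,2022-03-01,2022-03-02
AK,short,2022-03-07,
"""

# Collect turnaround times in days per (chamber, route) pair.
groups = defaultdict(list)
for row in csv.DictReader(io.StringIO(CSV_TEXT)):
    if not row["first_decision"]:
        continue  # no decision yet, so there is nothing to measure
    days = (date.fromisoformat(row["first_decision"])
            - date.fromisoformat(row["submitted"])).days
    groups[(row["chamber"], row["route"])].append(days)

summary = {
    key: {"n": len(values), "mean": mean(values), "median": median(values),
          "min": min(values), "max": max(values)}
    for key, values in groups.items()
}

for key in sorted(summary):
    print(key, summary[key])
```

Because every study is its own row, outliers can be inspected or dropped by hand before the summary is computed.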

miggol commented 1 year ago

Additional solution: Allow the secretary to access the export_csv command via a simple page on which they choose the year.

miggol commented 3 months ago

Statistics for the 2023 report were gathered with the new management command in the feature/get_stats branch. This command, which returns a CSV, is the preferred method of getting statistics going forward because it allows for manual detection of outliers.

Some additions to this CSV have been requested for the future:

application number (aanvraagnummer, i.e. the middle part of the reference number): this way I can sort by application number to find the highest version, so that I can calculate the average number of revisions.

version number (versienummer, i.e. the last part of the reference number): to be able to sort on 'first version', so that I could calculate the time to first decision (see the second tab), but also to be able to calculate the average number of revisions.

date submitted (= 'ingediend op' [submitted on], but without the time): to be able to calculate the time to first decision.

month (taken from 'ingediend op'): to be able to make a chart of the number of applications/revisions per month.

date of first decision (= 'Besloten op' [decided on], but without the time): to be able to calculate the time to first decision.

first decision (= 'date of first decision' minus 'date submitted'): if this can be included, great, but I can also calculate it myself.

route type: the flavours I currently have are 'short route', 'long route' and 'amendment', but I would actually also like to have 'approved elsewhere' and 'preliminary assessment' there; is that possible?

See the secretary's email dated July 7th 2024 for reference and a supporting Excel sheet.
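
As a sketch of how the requested extra columns could be derived per row: the `derive_columns` helper, the output column names and the hyphen-separated reference format are assumptions made for this illustration. Only the "middle part" and "last part" roles come from the request above, and the example values are made up:

```python
from datetime import datetime


def derive_columns(reference, submitted_on, decided_on):
    """Derive the requested extra columns for one CSV row.

    Assumes a hyphen-separated, three-part reference number whose middle
    part is the application number and whose last part is the version.
    """
    _, application_number, version_number = reference.split("-")
    date_submitted = submitted_on.date()  # drop the time of day
    date_first_decision = decided_on.date() if decided_on else None
    return {
        "application_number": application_number,
        "version_number": int(version_number),
        "date_submitted": date_submitted,
        "month": date_submitted.month,
        "date_first_decision": date_first_decision,
        "days_to_first_decision": (
            (date_first_decision - date_submitted).days
            if date_first_decision else None
        ),
    }


row = derive_columns(
    "24-012-02",  # made-up reference number
    datetime(2024, 3, 4, 14, 30),
    datetime(2024, 3, 11, 9, 0),
)
print(row)
```

The application number is kept as a string so that any leading zeros survive; the version number is converted to an integer so that sorting on it finds the highest version.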

djhcapel commented 1 month ago

Isn't it the mail of 5 July 2024?

DH-IT-Portal-Development / ethics

Automatic statistics #118