Closed nickdos closed 3 years ago
It looks like the log_source_type_id
is being lost and log_reason_type_id
is being read as 10
(testing) for all events - by the trigger.
mysql> select * from event_summary_breakdown_reason_entity_source where month = '202103' limit 100;
+--------+-------------------+--------------------+------------+--------------------+------------------+--------------+
| month | log_event_type_id | log_reason_type_id | entity_uid | log_source_type_id | number_of_events | record_count |
+--------+-------------------+--------------------+------------+--------------------+------------------+--------------+
| 202103 | 1002 | 10 | co10 | -1 | 5 | 8664 |
| 202103 | 1002 | 10 | co111 | -1 | 1 | 1 |
| 202103 | 1002 | 10 | co117 | -1 | 5 | 7811 |
| 202103 | 1002 | 10 | co118 | -1 | 1 | 1 |
and the last event shows:
mysql> select id,month,log_event_type_id,log_reason_type_id,log_source_type_id from log_event order by id desc limit 2;
+------------+--------+-------------------+--------------------+--------------------+
| id | month | log_event_type_id | log_reason_type_id | log_source_type_id |
+------------+--------+-------------------+--------------------+--------------------+
| 1047004135 | 202103 | 1002 | 12 | 1 |
| 1047004134 | 202103 | 1002 | 13 | 0 |
+------------+--------+-------------------+--------------------+--------------------+
So we should see log_reason_type_id
values for 12 and 13 and log_source_type_id
for 0 and 1.
Noting the trigger code defaults to -1
when a value can't be found.
Looking at the log_event_detail
table, it does not match either the sourceBreakdown
nor reasonBreakdown
output for thisMonth
(but reasonBreakdown
shows correct number of events but not records).
mysql> select * from event_summary_totals where month = '202103' limit 100;
+--------+-------------------+------------------+--------------+
| month | log_event_type_id | number_of_events | record_count |
+--------+-------------------+------------------+--------------+
| 202103 | 1002 | 15 | 203800 |
+--------+-------------------+------------------+--------------+
versus sourceBreakdown
:
{
"thisMonth": {
"events": 510,
"records": 397006
}
}
and reasonBreakdown
:
{
"thisMonth": {
"events": 15,
"records": 195499
}
}
So the latter is only out by 8,301
records
From what I can see the /ws/sourceBreakdown
has never worked without an entityUid
parameter.
This is because the only summary table that has source information is the event_summary_breakdown_reason_entity_source
and if an entityUid
parameter isn't included then log events aggregates are incorrectly multiplied by the number of entities involved in the log event.
I suspect this line of code at some stage was also checking for an entityUid
value
and the error message suggests that entityUid
was considered a mandatory parameter for the web service at some stage.
So I suggest we either:
entityUid
mandatoryor
event_summary_breakdown_reason
(without the entityUid).As its never worked (as far as I can see) i prefer the former.
Checking blame
, code has been there untouched, since Adam made a commit called Add source type breakdown reporting.
Requiring entityUid
would mean we couldn't look to see how downloads are being spread across the different source platforms, for all downloads (only for a given dataresource or collection), e.g. ALA4R
vs ALA
vs AVH
for all downloads, which is quite useful - I'd argue more useful for us than just knowing how those spreads happen for a single entityUid
(which is useful for the dataset custodians).
We're not currently showing the sourceBreakdown
stats on collectory pages anyway, so maybe we just go with the option 1 for now (done) and reassess if we want to later do option 2 or spend a abit more time on moving the breakdown functionality into ELK/ElasticSearch.
Requiring entityUid would mean we couldn't look to see how downloads are being spread across the different source platforms, for all downloads (only for a given dataresource or collection), e.g. ALA4R vs ALA vs AVH for all downloads, which is quite useful - I'd argue more useful for us than just knowing how those spreads happen for a single entityUid (which is useful for the dataset custodians).
I agree it could be useful. I think we would need an event_summary_breakdown_source
table (and possibly an event_summary_breakdown_source_entity
table) to support it and triggers to populate this, and changes to logger-service to query these new tables where an entityUid
hasn't been supplied (which is what happens with the reason breakdowns).
Deployed to test but now home page link is broken due to requiring a entityUid
param. So that needs fixing with an example entityUid
like CSIRO (in6) or Aust Museum (in4). Moving back to in progress.
Source breakdown (last month, 3 month, 1 year, all)
entryAlso, looking at code, there are more similar checks for entityUid
required:
Also, looking at code, there are more similar checks for
entityUid
required:* [ ] https://github.com/AtlasOfLivingAustralia/logger-service/blob/develop/grails-app/controllers/au/org/ala/logger/LoggerController.groovy#L233
If no entityUid
is supplied then:
https://github.com/AtlasOfLivingAustralia/logger-service/blob/develop/grails-app/services/au/org/ala/logger/LoggerService.groovy#L269
Yes, that code needs changing too. Dave says the stored proc can't support looking up without an entityUid
for ~reason~ source breakdowns.
Code reviewed ✅
Testing this with the following endpoints shows it is returning the expected number of download events and record counts...
https://logger-test.ala.org.au/service/sourceBreakdown?eventId=1002&entityUid=in4 vs https://logger-test.ala.org.au/service/reasonBreakdown?eventId=1002&entityUid=in4
E.g. https://logger-test.ala.org.au/service/sourceBreakdown?eventId=1002
shows:
doing a single download (1 event and 31 records) it changes to:
It seems the
event
count goes up by a factor of17
and therecords
count by a factor of4
.I suspect the
trigger
code that populates the summary tables is at fault...