AtlasOfLivingAustralia / logger-service

Atlas event logging
https://logger.ala.org.au
1 stars 8 forks source link

sourceBreakdown service shows values that are too high #31

Closed nickdos closed 3 years ago

nickdos commented 3 years ago

E.g. https://logger-test.ala.org.au/service/sourceBreakdown?eventId=1002

shows:

{
  "thisMonth": {
    "events": 493,
    "records": 396882
  }
}

doing a single download (1 event and 31 records) it changes to:

{
  "thisMonth": {
    "events": 510,
    "records": 397006
  }
}

It seems the event count goes up by a factor of 17 and the records count by a factor of 4.

I suspect the trigger code that populates the summary tables is at fault...

nickdos commented 3 years ago

It looks like the log_source_type_id is being lost and log_reason_type_id is being read as 10 (testing) for all events - by the trigger.

mysql> select * from event_summary_breakdown_reason_entity_source where month = '202103' limit 100;
+--------+-------------------+--------------------+------------+--------------------+------------------+--------------+
| month  | log_event_type_id | log_reason_type_id | entity_uid | log_source_type_id | number_of_events | record_count |
+--------+-------------------+--------------------+------------+--------------------+------------------+--------------+
| 202103 |              1002 |                 10 | co10       |                 -1 |                5 |         8664 |
| 202103 |              1002 |                 10 | co111      |                 -1 |                1 |            1 |
| 202103 |              1002 |                 10 | co117      |                 -1 |                5 |         7811 |
| 202103 |              1002 |                 10 | co118      |                 -1 |                1 |            1 |

and the last event shows:

mysql> select id,month,log_event_type_id,log_reason_type_id,log_source_type_id from log_event order by id desc limit 2;
+------------+--------+-------------------+--------------------+--------------------+
| id         | month  | log_event_type_id | log_reason_type_id | log_source_type_id |
+------------+--------+-------------------+--------------------+--------------------+
| 1047004135 | 202103 |              1002 |                 12 |                  1 |
| 1047004134 | 202103 |              1002 |                 13 |                  0 |
+------------+--------+-------------------+--------------------+--------------------+

So we should see log_reason_type_id values for 12 and 13 and log_source_type_id for 0 and 1.

Noting the trigger code defaults to -1 when a value can't be found.

nickdos commented 3 years ago

Looking at the log_event_detail table, it does not match either the sourceBreakdown nor reasonBreakdown output for thisMonth (but reasonBreakdown shows correct number of events but not records).

mysql> select * from event_summary_totals where month = '202103' limit 100;
+--------+-------------------+------------------+--------------+
| month  | log_event_type_id | number_of_events | record_count |
+--------+-------------------+------------------+--------------+
| 202103 |              1002 |               15 |       203800 |
+--------+-------------------+------------------+--------------+

versus sourceBreakdown:

{
  "thisMonth": {
    "events": 510,
    "records": 397006
  }
}

and reasonBreakdown:

{
  "thisMonth": {
    "events": 15,
    "records": 195499
  }
}

So the latter is only out by 8,301 records

djtfmartin commented 3 years ago

From what I can see the /ws/sourceBreakdown has never worked without an entityUid parameter.

This is because the only summary table that has source information is the event_summary_breakdown_reason_entity_source and if an entityUid parameter isn't included then log events aggregates are incorrectly multiplied by the number of entities involved in the log event.

I suspect this line of code at some stage was also checking for an entityUid value and the error message suggests that entityUid was considered a mandatory parameter for the web service at some stage.

So I suggest we either:

or

As its never worked (as far as I can see) i prefer the former.

nickdos commented 3 years ago

Checking blame, code has been there untouched, since Adam made a commit called Add source type breakdown reporting.

Requiring entityUid would mean we couldn't look to see how downloads are being spread across the different source platforms, for all downloads (only for a given dataresource or collection), e.g. ALA4R vs ALA vs AVH for all downloads, which is quite useful - I'd argue more useful for us than just knowing how those spreads happen for a single entityUid (which is useful for the dataset custodians).

We're not currently showing the sourceBreakdown stats on collectory pages anyway, so maybe we just go with the option 1 for now (done) and reassess if we want to later do option 2 or spend a abit more time on moving the breakdown functionality into ELK/ElasticSearch.

djtfmartin commented 3 years ago

Requiring entityUid would mean we couldn't look to see how downloads are being spread across the different source platforms, for all downloads (only for a given dataresource or collection), e.g. ALA4R vs ALA vs AVH for all downloads, which is quite useful - I'd argue more useful for us than just knowing how those spreads happen for a single entityUid (which is useful for the dataset custodians).

I agree it could be useful. I think we would need an event_summary_breakdown_source table (and possibly an event_summary_breakdown_source_entity table) to support it and triggers to populate this, and changes to logger-service to query these new tables where an entityUid hasn't been supplied (which is what happens with the reason breakdowns).

nickdos commented 3 years ago

Deployed to test but now home page link is broken due to requiring a entityUid param. So that needs fixing with an example entityUid like CSIRO (in6) or Aust Museum (in4). Moving back to in progress.

Also, looking at code, there are more similar checks for entityUid required:

Rita-C commented 3 years ago

Also, looking at code, there are more similar checks for entityUid required:

* [ ]  https://github.com/AtlasOfLivingAustralia/logger-service/blob/develop/grails-app/controllers/au/org/ala/logger/LoggerController.groovy#L233

If no entityUid is supplied then: https://github.com/AtlasOfLivingAustralia/logger-service/blob/develop/grails-app/services/au/org/ala/logger/LoggerService.groovy#L269

nickdos commented 3 years ago

Yes, that code needs changing too. Dave says the stored proc can't support looking up without an entityUid for ~reason~ source breakdowns.

Rita-C commented 3 years ago

https://upsource.ala.org.au/logger-service/review/LS-CR-2

nickdos commented 3 years ago

Code reviewed ✅

nickdos commented 3 years ago

Testing this with the following endpoints shows it is returning the expected number of download events and record counts...

https://logger-test.ala.org.au/service/sourceBreakdown?eventId=1002&entityUid=in4 vs https://logger-test.ala.org.au/service/reasonBreakdown?eventId=1002&entityUid=in4