bedatadriven / activityinfo-R

ActivityInfo R Language Client
https://www.activityinfo.org/support/docs/R/

Query database audit log #16

Closed akbertram closed 2 years ago

akbertram commented 3 years ago

It should be possible to query a date range of a database's audit log as a data.frame.

/cc @Ryo-N7

mjkallen commented 3 years ago

A note for the implementation, since the audit endpoint is not documented in the API docs:

(Public) API endpoint: /resources/databases/{databaseId}/audit which accepts a POST request with a JSON payload. Example payloads:

mjkallen commented 3 years ago

The new queryAuditLog() function is ready for testing on branch version-4.20. For example:

To install the test version of the package:

library("remotes")
remotes::install_github("bedatadriven/activityinfo-R", ref = "version-4.20")

To query the audit log:

library("activityinfo")

database.id <- "abcde1234"

# fetch the most recent record events (deletions are filtered out further below):
events <- queryAuditLog(database.id, typeFilter = "RECORD")

# by default, a maximum of 100 events is returned, so we keep querying until there are no more events:
r <- events
while (isTRUE(attr(r, "moreEvents"))) {
  r <- queryAuditLog(database.id, before = attr(r, "endTime"), typeFilter = "RECORD")
  events <- rbind(events, r)
}

# filter deletion events:
events[events$deleted == TRUE, ]

akbertram commented 3 years ago

It may be useful to incorporate the pagination into the R function. This would allow you to query for all events within a specific time range. The full database log can include tens of thousands of events, especially if there have been imports, so you may not want everything.

mjkallen commented 3 years ago

My assumption, based on empirical tests, is that the API endpoint returns at most 100 events per request and that you can only provide a start time, which I interpreted as the most recent time. In other words, the endpoint returns up to 100 of the most recent events up to that start time.

Are you suggesting adding an optional after argument that takes a timestamp (earlier than the before timestamp), and letting the function query the endpoint repeatedly until all events between after and before have been collected?

akbertram commented 3 years ago

Yes, that's the idea.
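
To make the idea concrete, here is a minimal sketch of the range loop described above. It does not call the real endpoint: mockQueryPage() and queryRange() are hypothetical names invented for illustration, the mock only imitates the behaviour assumed in this thread (at most 100 of the most recent events before a given time, plus the "moreEvents" and "endTime" attributes), and the actual queryAuditLog() implementation may differ.

```r
# Hypothetical stand-in for one audit-log request (NOT the real API):
# returns at most `limit` of the most recent events strictly before `before`,
# newest first, with the "moreEvents" and "endTime" attributes set.
mockQueryPage <- function(log, before, limit = 100) {
  older <- log[log$time < before, , drop = FALSE]
  older <- older[order(older$time, decreasing = TRUE), , drop = FALSE]
  page <- head(older, limit)
  attr(page, "moreEvents") <- nrow(older) > limit
  attr(page, "endTime") <- if (nrow(page) > 0) min(page$time) else before
  page
}

# Hypothetical range query: keep requesting older pages until either the
# source has no more events or a page reaches back past `after`.
queryRange <- function(fetchPage, before, after) {
  pages <- list()
  repeat {
    page <- fetchPage(before)
    # Read the attributes first; subsetting a data.frame can drop them.
    more <- isTRUE(attr(page, "moreEvents"))
    end <- attr(page, "endTime")
    inRange <- page[page$time >= after, , drop = FALSE]
    pages[[length(pages) + 1]] <- inRange
    # Stop if nothing older exists or this page already crossed `after`.
    if (!more || nrow(inRange) < nrow(page)) break
    before <- end
  }
  # Bind once at the end instead of growing a data.frame inside the loop.
  do.call(rbind, pages)
}

# Made-up data: 250 events, one per hour starting 2021-09-01 00:00 UTC.
log <- data.frame(
  id = seq_len(250),
  time = as.POSIXct("2021-09-01", tz = "UTC") + 3600 * (seq_len(250) - 1)
)

fetch <- function(before) mockQueryPage(log, before)
events <- queryRange(
  fetch,
  before = as.POSIXct("2021-09-10", tz = "UTC"),
  after = as.POSIXct("2021-09-02", tz = "UTC")
)
```

The sketch treats before as exclusive and after as inclusive, which is an assumption. If the real endpoint treats the boundary differently, events that share the "endTime" timestamp could be duplicated or skipped between pages, so that would need checking against the API.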

mjkallen commented 3 years ago

I have updated the version-4.20 branch to implement the range functionality. You can now do something like:

events <- queryAuditLog("{databaseId}", before = as.Date("2021-10-20"), after = as.Date("2021-09-13"), typeFilter = "RECORD")

A few more details in the commit message: 5ead227e664d8b7ade79cacbd593e5637b84e4f8