enhancing count_records_by_date

Avisblatt / avisblatt


enhancing count_records_by_date #83

Closed annareimann closed 1 year ago

annareimann commented 1 year ago

I use this function quite often - as do all of us, I think - and I have two suggestions for enhancement (maybe you have some further ones?):

This should be easy: add column name as an argument instead of "N", so instead of:

ex <- count_records_by_date(ex_ids, c_all, "year")
names(ex)[2] <- "example"

one could do this:

ex <- count_records_by_date(ex_ids, c_all, "year", "example")

This would be a prerequisite for my second suggestion:

Rewrite it so that multiple sets of IDs can be counted at once inside the function itself, this would look something like this, then:

ex <- count_records_by_date(c(ex_ids1, ex_ids2), c_all, "year", c("example1", "example2"))
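
A minimal sketch of how both suggestions could fit together, assuming a plain data frame of record IDs and dates as a stand-in for the real corpus object. `count_by_year_sketch` and `records` are hypothetical names for illustration, not the package's implementation:

```r
# Hypothetical stand-in for the corpus: one row per record, with an ID and a date.
records <- data.frame(
  id   = c("a1", "a2", "a3", "a4", "a5"),
  date = as.Date(c("1734-01-07", "1734-06-10", "1735-02-03",
                   "1736-03-01", "1736-09-12")),
  stringsAsFactors = FALSE
)

# Sketch only: count records per year for one or more sets of IDs.
# `id_sets` is a list of character vectors, `col_names` labels the count columns.
count_by_year_sketch <- function(id_sets, records, col_names) {
  stopifnot(is.list(id_sets), length(id_sets) == length(col_names))
  years <- as.integer(format(records$date, "%Y"))
  out <- data.frame(year = sort(unique(years)))
  for (i in seq_along(id_sets)) {
    in_set <- records$id %in% id_sets[[i]]
    # Using a factor with all years as levels gives 0 for years without hits.
    counts <- table(factor(years[in_set], levels = out$year))
    out[[col_names[i]]] <- as.integer(counts)
  }
  out
}

ex <- count_by_year_sketch(
  list(c("a1", "a2", "a3"), c("a4", "a5")),
  records,
  c("example1", "example2")
)
```

The sets are passed as a list in this sketch, because `c(ex_ids1, ex_ids2)` would concatenate both ID vectors into a single one and the function could no longer tell them apart.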

These are enhancements in the most literal sense - the function works as is, but I think they would make it more useful and more powerful, especially with a view to publishing the avisblatt package.

Thoughts, @aengel17 @wissen-ist-acht @LarsDIK ?

annareimann commented 1 year ago

Another thought: instead of just "days", "weeks" and "years" we could maybe add "4years", "8years" and "29years" - or something prettier? Or maybe add a range argument with default = 1, so something like:

ex <- count_records_by_date(ex_ids, c_all, "year", "example", 4)

would result in four-year periods. Just a thought, because I almost always aggregate the numbers in a second step anyway.
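
For illustration, the second aggregation step could look roughly like this in base R. The data frame and the function name `aggregate_periods_sketch` are hypothetical, and starting the periods at the first year in the data is just one possible convention:

```r
# Hypothetical yearly counts, shaped like a two-column year/count table.
yearly <- data.frame(year = 1730:1737,
                     example = c(3, 5, 2, 4, 6, 1, 0, 8))

# Sketch only: sum counts over blocks of `range` consecutive years,
# with blocks starting at the first year in the data.
aggregate_periods_sketch <- function(counts, range = 1) {
  start <- min(counts$year)
  period_start <- start + ((counts$year - start) %/% range) * range
  out <- aggregate(counts[[2]], by = list(period_start), FUN = sum)
  names(out) <- c("period_start", names(counts)[2])
  out
}

ex4 <- aggregate_periods_sketch(yearly, range = 4)
```

With `range = 1` the counts come back unchanged, which is why 1 would be a natural default.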

annareimann commented 1 year ago

And, an easy one: set empty values to 0 instead of NA
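
As a workaround until then, something like the following should do it, assuming the output is a data frame with the counts in a column called `example` (made-up values here):

```r
# Hypothetical counts where years without any records come back as NA.
ex <- data.frame(year = 1734:1737, example = c(2, NA, 5, NA))

# Replace missing counts with 0; the year column is left untouched.
ex$example[is.na(ex$example)] <- 0
```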

aengel17 commented 1 year ago

I like the ideas. Labelling and multiple sets should be easy to implement, as is 0 instead of NA.

Regarding flexible time periods -- can be done of course, but we would need to discuss how this should work exactly, and depending on that might change how the function operates at the core. An "aggregate over x periods" parameter (what you called range argument) might be the easiest solution.

While we are at it, we could also add a parameter to specify whether the output should be the absolute number of ads (default), the number per 1000 inhabitants, or some index number or share - as already implemented in the more cumbersome calculate_frequency_data()/plot_frequency_data(), which I would prefer not to include in the final package...
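
A rough sketch of what such an output parameter might do, not how calculate_frequency_data() works. The function `convert_counts_sketch`, the option names, and all numbers (including the population figures) are placeholders for illustration only:

```r
# Hypothetical inputs: counts for one set of ads, counts for all ads,
# and placeholder population figures (not historical data).
counts <- data.frame(year = 1734:1736, example = c(20, 30, 50))
total  <- data.frame(year = 1734:1736, all_ads = c(200, 300, 250))
pop    <- data.frame(year = 1734:1736, inhabitants = c(10000, 10000, 12500))

# Sketch only: convert absolute counts into another unit, chosen by `output`.
convert_counts_sketch <- function(counts, total, pop,
                                  output = c("absolute", "per1000", "share")) {
  output <- match.arg(output)
  # All three tables must cover the same years in the same order.
  stopifnot(identical(counts$year, total$year),
            identical(counts$year, pop$year))
  n <- counts[[2]]
  value <- switch(output,
    absolute = n,
    per1000  = n / pop$inhabitants * 1000,
    share    = n / total$all_ads
  )
  data.frame(year = counts$year, value = value)
}

per1000 <- convert_counts_sketch(counts, total, pop, "per1000")
share   <- convert_counts_sketch(counts, total, pop, "share")
```

Because `match.arg()` picks the first option when none is given, the absolute number stays the default.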

aengel17 commented 1 year ago

I created a branch "enhancing_count_records" with a new count_records_by_date function, and updated the documentation in How_to_filter, rows 404-468.

The examples given there work (for me), but please try yourself.

I essentially implemented all we discussed :)