data-lessons / library-sql-DEPRECATED

SQLite lesson for librarians NOW MOVED > https://github.com/LibraryCarpentry/lc-sql
https://github.com/LibraryCarpentry/lc-sql
Other
2 stars 16 forks source link

Rephrasing needed for second exercise on aggregation #60

Closed danmichaelo closed 6 years ago

danmichaelo commented 7 years ago

Had a question today about this exercise: "How many citations that were counted each month a) in total; b) per journal"

"How many citations that were counted each month" can be interpreted as "How many citations were made each month". We don't have any citation time data in the dataset, only article publication time, so that question would be impossible to answer with the data we have, but I still think we should try to rephrase it to make it more clear.

I struggle a little bit to come up with a good way of phrasing it though. Could be because the question is a bit artificial. Perhaps something like "the number of citations per (publication) month; a) ..."? Not sure if that is easy to understand. Help needed :)

icecjan commented 6 years ago

I don't think the question is artificial but agree it needs rephrasing. The problem may be that we tried to combine two things into 1 a) and b)? What do you think about the below?

Write queries that return:

1. The number of citations counted in each month.
2. The number of citations counted of each journal in each month.
3. Average number of citations of each journal in each month.
4. Can you modify the above queries combining them into one?
danmichaelo commented 6 years ago

Hm, I still find "number of citations counted in each month" ambiguous. I mean, we're not counting the number of citations per month, but the number of citations per publication month of the cited article. Agree?

Is it too cumbersome to write something like that? Or do you think it's not necessary since there is only one way to count using the data available?

icecjan commented 6 years ago

Ok, I can't see how else we can write this challenge though. Any thoughts @elainewong ?

icecjan commented 6 years ago

Alright, I have another thought. Instead of using month, we could use LicenceId?

Before introducing the challenge, we use "how many articles were published in each journal?" to demonstrate GROUP BY. If we build on that, the challenge could be:

Write queries that return:
1. How many articles were published under each licence a.) in total, b). per journal.
2. Average citation counts of each journal under each licence.
3. Can you modify the above queries combining them into one?

This is still not ideal, but should eliminate the question about counting citations by publication month?

danmichaelo commented 6 years ago

Good idea! These are also more interesting questions – looks very close to ideal to me 👍