Closed FredericBlum closed 1 year ago
ok, I have no idea how I just merged this, I just checked out "main" from the CLI and going on github suddenly the PR is merged? Feels very weird. Should we revert this?
I can have a look tomorrow
Frederic Blum @.***> schrieb am So., 23. Juli 2023, 14:05:
ok, I have no idea how I just merged this, I just checked out "main" from the CLI and going on github suddenly the PR is merged? Feels very weird. Should we revert this?
— Reply to this email directly, view it on GitHub https://github.com/cldf-datasets/doreco/pull/21#issuecomment-1646823128, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGUOKEQ4IAZRMK7BPNCJMLXRUHRZANCNFSM6AAAAAA2ULICWQ . You are receiving this because you were mentioned.Message ID: @.***>
@Tarotis re stdev
as window function: I think the difference between aggregate functions and window functions is well explained here https://www.sqlitetutorial.net/sqlite-window-functions/ .
Basically, an aggregate function computes one value for a set of input rows, whereas a window functions computes a value for each row. stdev
is an aggregate function, thus can't be used without a corresponding GROUP BY
clause that reduces groups rows (the input for the aggregate function), but selects only one row per group.
Coming from SQL and looking at R, I was equally confused by this distinction being hidden via the "we are always working with one dataframe" paradigm :)
Re
Is there a way to combine those groupings in a single view?
What we do instead is pulling the two views into the main query appropriately (see https://github.com/cldf-datasets/doreco/blob/62c48ed13419ff30af1481dbe04e0362da860315/init_query.sql). Conceptually, I see these two views as lookup tables. For each row (i.e. phone) in the main query, we get the associated word and language, and then look up the wordstats
and langstats
.
@xrotwang I had now time to look at the updated query you provided. Thank you again for all this work. Trying to add the standardization and the word count, I am running into two problems:
Conceptually: Do we need two views for this? To get at the frequency, I need to divide the number of occurrences of a word (w.cldf_name) by the total words in the language (w.cldf_id). However, the first one I only get by grouping my data on language and w.cldf_name, and the second by only grouping on the language. Is there a way to combine those groupings in a single view?
Computationally: The way I set up things right now (two views, dividing their results) kills my laptop, and I do not understand why. Could you take a look at it and tell me what I am doing wrong, so that I can fix it?
All this is not urgent, as you can see by the fact that I as well did not have time to look at this previously. I very much appreciate all the learning I am getting about SQL by this though.