Added getAttendance.sql to Gradebook repo

bella004 commented 7 years ago

The getAttendance function is now more modular only needing sectionID The getSectionID function returns the sectionID that getAttendance needs

Note: Make sure all imports are made before running

smurthys commented 7 years ago

A few observations;

.sql files are to be in the directory /src/db: see e-mail on this topic
Add product name after file name in the prologue comment: see ClassDB source files for example
Remove SET LOCAL SCHEMA everywhere; instead qualify all function and table names with schema name
On the comment ONLY RUN THE QUERY AFTER IMPORTING ROSTER AND OPENCLOSE CSVS:
- What does "query" reference?
- Why should the "query" be run only after importing data? It is ok for the function to return empty result if now data exists: it should not cause an error
getAttendance(SectionID INTEGER)
- Parameter name should be sectionID: start function/variable/parameter names with lower case
- The sub-query (SELECT ID FROM curSec) seems unnecessary: could be replaced with $1
- I have a feeling that only sdar.i is required in the GROUP BY clause
There is supposed to also be a 4-arg version of function getAttendance. See comment here.
Function getSectionID
- This function is unused
- Parameter names do not conform to naming convention (see above)
- Parameter name SeasonName is inconsistent with its use: call it just season and add a comment explaining the possible domains the parameter references
- This function is potentially useful in the 4-arg version of function getAttendance
Remove the stub content: this content is useful, but should be placed in a unit-test script

bella004 commented 7 years ago

This raises some questions:

".sql files are to be in the directory /src/db" There is no such directory on my branch. It's just /src
"Remove SET LOCAL SCHEMA everywhere; instead qualify all function and table names with schema name" So I got #34 backwards?
"There is supposed to be a 4-arg version of function getAttendance" So, change getSectionID to getAttendance and getAttendance to something else?

smurthys commented 7 years ago

Create a /src/db directory in this branch in your local repo and move the file to that directory. Then commit and push changes. There is already a /src/db directory in the master on the server.
34 does have merit, because the files cited there do have schema-related issues. I just commented on that issue.
There will indeed be a 4-arg function named getAttendance which returns attendance. (Postgres supports function overloading similar to C++ and Java.) However, it does not need to implement the entire functionality. Instead, it only needs to determine the section ID for the given parameters and then call the 1-arg version. The code to determine section ID is in function getSectionID in this PR. It is OK if that function is renamed getAttendance and then enhanced to return attendance.
- An alternative is to retain getSectionID and reuse it in the 4-arg version of getAttendance. However, as of now, getSectionID would be useful only in the 4-arg version of getAttendance. So, it is better to just rename/revise getSectionID to become the 4-arg version of getAttendance.

bella004 commented 7 years ago

I just ran the file and it produced an error: column "st.lname" must appear in the GROUP BY clause or be used in an aggregate function. This is because I'm using st.lname, st.fname, and st.mname in the SELECT, so would it be preferred if I remove those from the SELECT, or put them back in the GROUP BY and remove the sdar.i from the GROUP BY, because I know I'm not using that in the SELECT.

smurthys commented 7 years ago

It is possible I am not seeing something that is really obvious (thus my use of the phrase "I feel" originally). Let me know what I am missing.

Anyway, I wrote the following script to test what I understand the scenario is. (I had to create my own tables because I don't have a functioning Gradebook DB right now).

The SELECT query here groups by S.ID but outputs S.FName and S.LName. This is possible because the name columns are used in expression involving the aggregate function string_agg.

CREATE TEMP TABLE IF NOT EXISTS S (ID INTEGER PRIMARY KEY, FName VARCHAR, LName VARCHAR);
CREATE TEMP TABLE IF NOT EXISTS A (SID INTEGER REFERENCES S, Status VARCHAR);

TRUNCATE S CASCADE;
INSERT INTO S VALUES(1, 'Joe', 'Baker');
INSERT INTO S VALUES(2, 'Mary', 'Anne');

INSERT INTO A VALUES(1, 'A');
INSERT INTO A VALUES(1, 'P');
INSERT INTO A VALUES(1, 'X');
INSERT INTO A VALUES(2, 'P');
INSERT INTO A VALUES(2, 'E');
INSERT INTO A VALUES(2, 'P');

SELECT S.LName || ',' || S.FName || ',' || string_agg(Status, ',') AS csv
FROM S JOIN A ON S.ID=A.SID
GROUP BY S.ID;

This script produces the following result:

`csv`
`"Baker,Joe,A,P,X"`
`"Anne,Mary,P,E,P"`

smurthys commented 7 years ago

I am sure other reviewers will have thoughts as well on my comments and this PR in general.

bella004 commented 7 years ago

It should now work. Apparently, it was okay to use only st.id in the GROUP BY, but not sdar.i

afig commented 7 years ago

The changes made so far look good.

One small issue that was mentioned by @smurthys has not been addressed:

Add product name after file name in the prologue comment: see ClassDB source files for example

Solution: Add - Gradebook on line 1 after the file name. This makes it consistent with the prologue of other files. As a side note, some files have capitalized Gradebook as GradeBook. However, we are using the first capitalization (lowercase 'b').

Apart from that, I noticed a couple of potential minor changes that should be considered:

Line 81 uses csv_data and an alias for the first row (csv header). Line 84 uses csv an alias for a csv row of a student and their attendance codes. It is not necessary to use an alias here, since the one given during the function declaration, dataStr (line 57), will be used. However, as it currently stands, the first alias is misleading to someone reading the code. (Perhaps it should be csv_header?)
The getAttendance function declares and uses a variable named sectionID. However, this may not be necessary. Consider avoiding the declaration and use of a variable by just using the value of N.ID when calling createAttendancePivot. Doing so will allow the function to use sql instead of plpgsql, which is usually desirable in terms of efficiency and portability.
On line 71 , sub queries (i.e. SELECT x FROM y) are used to pick a value to call datesFromSchedule with. Is it necessary to do so? For instance, does using curSec.sd instead of SELECT sd FROM curSec cause an error? Avoiding the use of sub queries here will make it easier to interpret this line. If we do this, then we will also have to add curSec to the FROM clause.
Generally, we should try to be consistent in terms of situations where we use a capital letter vs a lowercase letter, even if they do not cause an issue. Doing so increases the ease of reading the code, and prevents issue when they do happen to become case sensitive. At the very least, we should be consistent with the existing naming convention in a particular file. A couple of situations where capitalization is not consistent in getAttendance.sql are with table names and alias names. It seems like most of the time, the first letter of a table name is capitalized, and that single letter aliases are lowercase, however there seem to be a few cases where that is not being followed.
Similar to the previous remark, there are situations where spacing before and after an equals sign is not consistent.

Thanks for your help with this task.

wildtayne commented 7 years ago

(Updated to take Andrew's comment into consideration) Hi Kyle, I did some testing on getAttendance.sql while updating the web server, and I have a few observations:

[L56]: createAttendancePivot should be called getAttendance, since it is the one-argument overload of the other getAttendance function.
[L71]: I think the call to datesFromSchedule should also be qualified with gradebook..
[L91]: The four argument overload of getAttendance should also return a one column table like the other getAttendance.
[L56/91]: The column name in the return table of both functions doesn't match the one we were using in development (csvwheadattnrec). We can discuss changing this if you want, but the server accesses this column by name, so it is important they match.
[L105]: RETURN QUERY must be placed above a select query returning a value in a plpgsql function. There's some more info on this in the plpgsql docs if you're interested. However, this is not necessary if you change the function to use SQL, like Andrew mentioned.

With these couple tweaks, these functions should be go to go as far as the server is concerned.

bella004 commented 7 years ago

"On line 71 , sub queries (i.e. SELECT x FROM y) are used to pick a value to call datesFromSchedule with. Is it necessary to do so? For instance, does using curSec.sd instead of SELECT sd FROM curSec cause an error? Avoiding the use of sub queries here will make it easier to interpret this line. If we do this, then we will also have to add curSec to the FROM clause."

I would do that if I didn't have to JOIN curSec to the FROM clause, since I would:

Need a JOIN condition to JOIN curSec on, which I don't think there is, or
Need to justify a second cross join, which I don't know if I can do properly

bella004 commented 7 years ago

Scratch that, I found a way to justify the second cross join if it's in a separate WITH clause.

smurthys commented 7 years ago

This script is shaping up really well, thanks to @bella004's persistence and the reviewers' diligence.

In addition to the earlier review comments, I propose the following changes:

Remove the file getAttendanceStub.sql, because it is not a complete test. (We have yet to discuss organization of unit tests.) The contents of that file are perhaps better saved in a gist.
Rename the result column csvwheadattnrec to AttendanceCsvWithHeader, because the current name is not in camel case and is not descriptive. (Unlike function/variable/parameter names, table and column names begin with an upper case letter.) This change will trigger an edit in the web server (which is yet to merged).

smurthys commented 7 years ago

On renaming the result column csvwheadattnrec to AttendanceCsvWithHeader: I don't know for sure, but it is possible this renaming conflicts with JSON naming (conventions) in the web server. However, almost any DB team is virtually guaranteed to disregard coding conventions of teams that use the DBMS. (They will of course reconsider names if they conflict with a protocol or other operational aspects.)

DASSL / Gradebook

Added getAttendance.sql to Gradebook repo #35

34 does have merit, because the files cited there do have schema-related issues. I just commented on that issue.