Closed wildtayne closed 7 years ago
The changes in this PR indeed make vast improvements to the OpenClose import procedure.
I have the following quick observations to further improve the overall procedure.
README
:
importCourseSchedule
: OpenClose is just one possible external systemEllucian
is misspelled: has one l
instead of two (this error exists elsewhere as well and I am guilty of introducing this error)roster
still linger. I propose using the term course schedule
or just schedule
depending on the contextCreateTables.sql
:
FName
and LName
as a solution to the problem identified. I need some time to think about this situation, and in the meanwhile, welcome others' thoughts.prepareOpenCloseImport.sql
:
Season
in function importOpenClose
to seasonIdentification
and use it to first obtain season order for use in the rest of the functionI believe the suggested changes in prepareOpenCloseImport.sql
simplifies code and makes it easier to both review and maintain.
Please see Issue #13 as well. I feel this PR might have fixed that issue.
Thanks for the feedback. I've pushed some commits to address some of these comments. I added an ON CONFLICT DO UPDATE
to the Gradebook.Course
insert statements, so this PR should fix all the issues outlined in #13 now.
I've thought/researched some more on the instructor name issue. One additional solution I've thought of is to create a unique index on FName and LName where MName is null, for example:
CREATE UNIQUE INDEX idx_Unique_Names_NULL
ON Gradebook.Instructor(FName, LName)
WHERE MName IS NULL;
While also keeping the original UNIQUE (FName, MName, LName)
in the table definition. This better captures the situation - we want (FName, LName)
to be unique only where MName is null.
I have also renamed the directory, scripts, and related DB objects to CourseSchedule instead of OpenClose.
@srrollo The import script is now very streamlined. I particularly like how the renaming turned out.
I am OK with your suggested index to address MName
being NULL
. We can revisit this issue after M1.
I only found a few minor things to consider:
README.md
:
importCourseScheduleCSV.bat
:
%1
needs to be in double quotes to accommodate a file path with embedded spaces (this is probably an issue even in roster import)prepareCourseScheduleImport.sql
:
Course Schedule data
with course schedules
It
) begins with an upper case letterThis is accomplished by testing if the expression <expression1> is the same as the expression <expression2>
checkTermSequence
is voodoo :). Nothing to fix though.seasonIdentification
should be VARCHAR(20)
which is the size of season namegetSeasonOrder(seasonIdentification)
importCourseSchedule
is a bit too long (130 lines). No need to change anything nowI've pushed a few more commits that address these issues.
On quoting %1
- I played around with this a little, and I get an error if I use a path with spaces regardless of how %1
is quoted in the batch file. For example, the command:
./importCourseScheduleCSV.bat "2017Summer OpenClose.csv" 2017 Summer postgres gradebook 192.168.10.24
yields the error:
OpenClose.csv""=="" was unexpected at this time.
It looks like the batch command retains any double quotes passed to it originally. If that is true, the batch command used in the import procedure should be fine as is.
I just created a batch command quotes.bat
with the following content:
@echo "%1"
@echo %1
Running quotes.bat "hello there"
produced the following output:
""hello there""
"hello there"
Running quotes.bat hello
produced the following output:
"hello"
hello
Apologies for the delay in giving input.
I agree with @smurthys's comment about the script being streamlined. The script is now much better organized and is easier to maintain. The current behavior of quotes seems to be fine. I also agree that the given solution for the unique instructor names issue is acceptable. Again, as mentioned, we should revisit this at some point. Searching for a final solution is sure to bring up some interesting points.
I was able to test the import procedure with the Spring 2017 test data. As far as major issues, I only noticed one, where the README states that for importCourseScheduleCSV.bat
, the season argument can be:
The "order", name, or code of the season in which the schedule depicts, for example, 'Spring'. (See comments associated with the function
Gradebook.getSeason
for details.)
However, giving the "order" of a season results in an error. This seems to be the result of the pg_temp.importCourseSchedule
only checking for a match on name
or code
on lines 115 and 116 of prepareCouseScheduleImport.sql
. This query should probably be replaced with a call to -- On second thought, it seems as through the Gradebook.getSeasonOrder
.seasonOrder
variable can be used here.
Apart from that, there were some small typos and inconsistencies in comments. Many of these have been carried over and existed for a long time, so I gave all comments in prepareCourseScheduleImport.sql
a look-over, even those that were not touched by this PR:
L 17: Given that there are other scripts that need to be run prior to running the script, it might be best to simply state that prepareDB.psql
is required (or that all scripts that are run by prepareDB.psql
are required).
L 19: "CSV" is capitalized here, but "csv" is used elsewhere in the script. importCourseScheduleCSV.bat
uses "CSV", except when using the COPY
meta-command. Given that "CSV" is used in file names, I think we should go for "CSV" in comments, but "csv" with the COPY
command (for consistency with other import scripts).
L 46: There should probably be a period after the closing bracket.
L 47: Since "Essentially, each...
" starts a new paragraph, it should have its leading space removed.
L 77: Should be "semester".
L 78: The phrase that makes up this line is difficult to understand.
L 94: Should be "chronologically".
L 101: Should be "overridden" (in NOTICE).
L 110-111: Seems like "Select the most extreme start and end dates from TermDates" would be better (and could go on one line).
L 125-128: Column names here should start with an uppercase letter for consistency.
L 134: There is an extra space between "into" and "Gradebook.Section".
L 138: Missing period at the end of the sentence.
L 152: Not sure what "clearly" represents here.
L 175: Left-over comment.
L 186: Column names should probably start with a capital letter. "id" is also un-capitalizes in surrounding lines.
L 196-198: Comment could be on one line (or might be misplaced). Also, should "is" be "its"?
importCouseScheduleCSV.bat
:
I will attempt to import all test CSVs, and test the term sequence facility. I will report back if any other issues arise while performing those tests.
Here are some other observations on things to address after M1:
The current behavior for instructor names with > 3 parts is unconventional. (This might also be an issue with the current schema.)
At some point, it would be good idea to update the README(s) to explain the fields that the staging table expects, which will allow the procedure to be easily adapted to other course schedule systems.
Thanks for taking the time to look at this so thoroughly, there's a lot in prepareCourseScheduleImport.sql
.
The issue with using season orders is a good catch, I forgot to replaces that old query with the new seasonOrder
variable in a couple places.
I've addressed all the issues in the comments you found, and replaces some others that were outdated.
I also agree that the current quote behavior in the batch file is OK. I think the issue with spaces I was having is related to how powershell executes batch files.
Indeed there is a lot going on in prepareCourseScheduleImport.sql
. Something like that script is hard to develop from scratch, but retrofitting something that already exists is harder. So, kudos @srrollo for your perseverance and @afig for your review effort.
I am just beginning to test the many pending PRs. I tried importing course schedule and ran into trouble when using a CSV filename with embedded space. For example, I ran the following command in a command window:
importCourseScheduleCSV.bat "2004 FallOpenClose.csv" 2004 fall postgres gbnew
I got the following error:
FallOpenClose.csv""=="" was unexpected at this time.
I believe the error is on Line 16 of importCourseScheduleCSV.bat
.
If this issue is not fixed, the README (even in roster import) needs to clearly say spaces are not permitted in CSV file path. It is a rather artificial limitation, but it may be acceptable for M1.
I will continue to test and report if I find any other issue.
I just pushed a commit that should allow filenames with spaces. I needed to make two changes for this:
IF "%~1"=="" GOTO usage
. %~1
strips quotes from %1
, and then we put new quotes around %1
. This causes the equality check to work, and fixes the FallOpenClose.csv""=="" was unexpected at this time.
error.\COPY
meta command has been changed to "\COPY CourseScheduleStaging FROM '%~1' WITH csv HEADER"
. This causes psql to correctly parse filenames with spaces, instead of treating the space as a separator between parameters.These changes should probably also be applied to the roster import batch file. I can add them to this PR, if that is OK.
Excellent. I just verified that the procedure works even when CSV file path has an embedded space.
Yes, please make the change to the roster-import procedure as well. Thank you.
Sounds good. I just pushed a commit with the space fix in importRosterCSV.bat
.
I just imported 54 OpenClose CSV files and they all imported without errors. The database now has 54 terms, 4560 instructors, and 43557 sections. I did notice the procedure gets progressively slower, but we can analyze that after M1.
So, all in all, excellently done.
BTW, I have attached to this comment a batch file (with a .txt extension because GitHub does not permit adding .bat files) I created to import all the sample CSV files. I am leaving it here in case we want to add such a batch file to the test directory at a future time. It will of course need a few changes before adding it to the repo.
This PR provides an updated OpenClose import procedure, modeled after the updated Roster import procedure. It also adds a few other changes that were necessary to get the procedure to work correctly. This should also fix #34 and mostly address #37.
General
/src/db/importOpenClose
folderimportOpenCloseCSV.bat
prepareOpenCloseImport.sql
prepareOpenCloseImport.sql
pg_temp
InstructorUnnest()
has been removed and merged into a CTE (#37).checkTermSequence()
is stored in a variable, and one call is removedON CONFLICT DO NOTHING
added to the InstructorINSERT
statementRETURNING
clause of the instructorINSERT
statement is nowUNION
ed to the current instructor table. This fixes an issue where sections would not be inserted if they contained an instructor that already existed in the database.UNIQUE
constraint has been added to theInstructor
table. TheUNIQUE(FName, MName, LName)
constraint will never detect two names as the same if MName is NULL. A `UNIQUE(FName, LName) was added to address this.Overall,
prepareOpenCloseImport.sql
is still pretty cumbersome, but I think it is somewhat more streamlined now. The user procedure is definitely improved using the model from the Roster import procedure.I was able to setup a new database and import all the data using the three new procedures, and I can confirm they all work and are much more user friendly now.