DASSL / Gradebook

Open-source product to provide a practical means for instructors to record student attendance and assessment
Other
8 stars 4 forks source link

Updated OpenClose Import Procedure #53

Closed wildtayne closed 7 years ago

wildtayne commented 7 years ago

This PR provides an updated OpenClose import procedure, modeled after the updated Roster import procedure. It also adds a few other changes that were necessary to get the procedure to work correctly. This should also fix #34 and mostly address #37.

General

importOpenCloseCSV.bat

prepareOpenCloseImport.sql

Overall, prepareOpenCloseImport.sql is still pretty cumbersome, but I think it is somewhat more streamlined now. The user procedure is definitely improved using the model from the Roster import procedure.

I was able to setup a new database and import all the data using the three new procedures, and I can confirm they all work and are much more user friendly now.

smurthys commented 7 years ago

The changes in this PR indeed make vast improvements to the OpenClose import procedure.

I have the following quick observations to further improve the overall procedure.

README:

CreateTables.sql:

prepareOpenCloseImport.sql:

I believe the suggested changes in prepareOpenCloseImport.sql simplifies code and makes it easier to both review and maintain.

smurthys commented 7 years ago

Please see Issue #13 as well. I feel this PR might have fixed that issue.

wildtayne commented 7 years ago

Thanks for the feedback. I've pushed some commits to address some of these comments. I added an ON CONFLICT DO UPDATE to the Gradebook.Course insert statements, so this PR should fix all the issues outlined in #13 now.

I've thought/researched some more on the instructor name issue. One additional solution I've thought of is to create a unique index on FName and LName where MName is null, for example:

CREATE UNIQUE INDEX idx_Unique_Names_NULL
ON Gradebook.Instructor(FName, LName)
WHERE MName IS NULL;

While also keeping the original UNIQUE (FName, MName, LName) in the table definition. This better captures the situation - we want (FName, LName) to be unique only where MName is null.

I have also renamed the directory, scripts, and related DB objects to CourseSchedule instead of OpenClose.

smurthys commented 7 years ago

@srrollo The import script is now very streamlined. I particularly like how the renaming turned out.

I am OK with your suggested index to address MName being NULL. We can revisit this issue after M1.

I only found a few minor things to consider:

README.md:

importCourseScheduleCSV.bat:

prepareCourseScheduleImport.sql:

wildtayne commented 7 years ago

I've pushed a few more commits that address these issues.

On quoting %1 - I played around with this a little, and I get an error if I use a path with spaces regardless of how %1 is quoted in the batch file. For example, the command: ./importCourseScheduleCSV.bat "2017Summer OpenClose.csv" 2017 Summer postgres gradebook 192.168.10.24 yields the error: OpenClose.csv""=="" was unexpected at this time.

smurthys commented 7 years ago

It looks like the batch command retains any double quotes passed to it originally. If that is true, the batch command used in the import procedure should be fine as is.

I just created a batch command quotes.bat with the following content:

@echo "%1"
@echo %1

Running quotes.bat "hello there" produced the following output:

""hello there""
"hello there"

Running quotes.bat hello produced the following output:

"hello"
hello
afig commented 7 years ago

Apologies for the delay in giving input.

I agree with @smurthys's comment about the script being streamlined. The script is now much better organized and is easier to maintain. The current behavior of quotes seems to be fine. I also agree that the given solution for the unique instructor names issue is acceptable. Again, as mentioned, we should revisit this at some point. Searching for a final solution is sure to bring up some interesting points.

I was able to test the import procedure with the Spring 2017 test data. As far as major issues, I only noticed one, where the README states that for importCourseScheduleCSV.bat, the season argument can be:

The "order", name, or code of the season in which the schedule depicts, for example, 'Spring'. (See comments associated with the function Gradebook.getSeason for details.)

However, giving the "order" of a season results in an error. This seems to be the result of the pg_temp.importCourseSchedule only checking for a match on name or code on lines 115 and 116 of prepareCouseScheduleImport.sql. This query should probably be replaced with a call to Gradebook.getSeasonOrder. -- On second thought, it seems as through the seasonOrder variable can be used here.

Apart from that, there were some small typos and inconsistencies in comments. Many of these have been carried over and existed for a long time, so I gave all comments in prepareCourseScheduleImport.sql a look-over, even those that were not touched by this PR:

importCouseScheduleCSV.bat:

I will attempt to import all test CSVs, and test the term sequence facility. I will report back if any other issues arise while performing those tests.


Here are some other observations on things to address after M1:

wildtayne commented 7 years ago

Thanks for taking the time to look at this so thoroughly, there's a lot in prepareCourseScheduleImport.sql. The issue with using season orders is a good catch, I forgot to replaces that old query with the new seasonOrder variable in a couple places.

I've addressed all the issues in the comments you found, and replaces some others that were outdated.

I also agree that the current quote behavior in the batch file is OK. I think the issue with spaces I was having is related to how powershell executes batch files.

smurthys commented 7 years ago

Indeed there is a lot going on in prepareCourseScheduleImport.sql. Something like that script is hard to develop from scratch, but retrofitting something that already exists is harder. So, kudos @srrollo for your perseverance and @afig for your review effort.

I am just beginning to test the many pending PRs. I tried importing course schedule and ran into trouble when using a CSV filename with embedded space. For example, I ran the following command in a command window:

importCourseScheduleCSV.bat "2004 FallOpenClose.csv" 2004 fall postgres gbnew

I got the following error:

FallOpenClose.csv""=="" was unexpected at this time.

I believe the error is on Line 16 of importCourseScheduleCSV.bat.

If this issue is not fixed, the README (even in roster import) needs to clearly say spaces are not permitted in CSV file path. It is a rather artificial limitation, but it may be acceptable for M1.

I will continue to test and report if I find any other issue.

wildtayne commented 7 years ago

I just pushed a commit that should allow filenames with spaces. I needed to make two changes for this:

These changes should probably also be applied to the roster import batch file. I can add them to this PR, if that is OK.

smurthys commented 7 years ago

Excellent. I just verified that the procedure works even when CSV file path has an embedded space.

Yes, please make the change to the roster-import procedure as well. Thank you.

wildtayne commented 7 years ago

Sounds good. I just pushed a commit with the space fix in importRosterCSV.bat.

smurthys commented 7 years ago

I just imported 54 OpenClose CSV files and they all imported without errors. The database now has 54 terms, 4560 instructors, and 43557 sections. I did notice the procedure gets progressively slower, but we can analyze that after M1.

So, all in all, excellently done.

BTW, I have attached to this comment a batch file (with a .txt extension because GitHub does not permit adding .bat files) I created to import all the sample CSV files. I am leaving it here in case we want to add such a batch file to the test directory at a future time. It will of course need a few changes before adding it to the repo.

importAllTestSchedules.txt