Open BonnieJRobert opened 1 month ago
This may be covered by the documentation you are working on for the team, but it's unclear exactly which load-... files need to be run before starting the 02b-x series. The checks that the data exists are good, but because each person relies on their own IDIR schema for the data, and those schemas sometimes get wiped, it can be difficult to recall exactly what needs to be reloaded.
To avoid issues with IDIR schemas being wiped, once the data has been loaded to SQL it should really be moved to dbo, and we should then query that table directly instead of having copies floating around in our own IDIRs.
(I realize that is probably out of scope of looking over the NOC changes, but it does hinder trying to run the code to look at NOC work)
In addition to having to re-run the tables, most of the time you can't actually re-run the code because the table already exists in the database (under someone else's IDIR) and the code does not recognize that the individual IDIR tables are different.
02b series - all the NOC logic ran fine, but it was complex to sort out which tables were needed where. As most of the checks don't check for a specific schema, they return TRUE as long as the table is in someone's schema, but then you cannot make updates to the table. I also had some issues with saving files for the same reason - I needed to update the code in the load scripts to make sure that the data was in my actual schema, not someone else's.
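As a rough illustration of what a schema-specific check could look like (a sketch only, in Python rather than the project's own code; the function, schema, and table names are all hypothetical, and the list of pairs stands in for whatever the database catalog returns):

```python
# Hypothetical sketch of a schema-aware existence check.
# Each entry mimics a (schema, table) pair from the database catalog;
# none of these names come from the actual project.

def table_exists(tables, table_name, schema=None):
    """Return True if table_name exists.

    With schema=None the check passes if the table is in ANY schema
    (the behaviour described above). With a schema given, the table
    must be in that specific schema.
    """
    for tbl_schema, tbl_name in tables:
        if tbl_name.lower() != table_name.lower():
            continue
        if schema is None or tbl_schema.lower() == schema.lower():
            return True
    return False


# The table exists, but only under another person's schema.
catalog = [("other_user", "example_table")]

any_schema = table_exists(catalog, "example_table")
my_schema = table_exists(catalog, "example_table", schema="my_user")

print(any_schema)  # check without a schema passes
print(my_schema)   # check against my own schema fails
```

With the schema passed in, the load step would run whenever the table is missing from my own schema, even if a copy exists under someone else's.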
A lot of the 2016 code was deleted - I'm hoping that the non-NOC changes here can be applied to the 2016 version of the code (e.g. adding the Stats Can work, etc.).
I couldn't find all the files to run completely through the 07 section, but looking through the change log for the relevant files, it seems like the logic makes sense. I would need to run through from the beginning with no missing tables to be sure that none of the queries broke, although all the NOC changes seem correct.
It is unclear exactly how multiple surveys (i.e. student outcomes vs census) are incorporated to create a single distribution. Looking at, for example, the occupation_distributions table, there is more than one line per NOC code (one for census, one for SO) - how are these handled in the projected distributions? Added? Averaged? It isn't clear.
(Same thing with the labour supply distributions, but with CIPs instead of NOCs I think).
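To make the question concrete, here is a small sketch of how the two readings diverge. It is in Python, the codes and values are made up, and it is not taken from the project code; it only shows why the answer matters:

```python
# Hypothetical rows: (code, survey, share). All values are invented.
from collections import defaultdict

rows = [
    ("CODE_A", "census", 0.25),
    ("CODE_A", "SO", 0.75),
    ("CODE_B", "census", 0.5),
    ("CODE_B", "SO", 0.5),
]

# Group the shares by code.
shares = defaultdict(list)
for code, _survey, share in rows:
    shares[code].append(share)

# Option 1: the two survey rows are added together.
added = {code: sum(vals) for code, vals in shares.items()}

# Option 2: the two survey rows are averaged.
averaged = {code: sum(vals) / len(vals) for code, vals in shares.items()}

print(added)
print(averaged)
```

Adding gives twice the averaged value for every code here, so downstream totals would differ depending on which rule the projection step applies.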
Noting that in 02b-2 and 02b-3 the _QI-specific code doesn't work (it expects QI versions of tables that we don't have).
There are many files in this PR and many of these edits don't directly support conversion to 2021 NOCs.
Priority files to focus on: