bcgov / post-secondary-supply-model

A repository to house the code base for Post-Secondary Supply Model (PSSM)

Apache License 2.0

conversion to 2021 NOC #29

Open BonnieJRobert opened 1 month ago

BonnieJRobert commented 1 month ago

There are many files in this PR and many of these edits don't directly support conversion to 2021 NOCs.

Priority files to focus on:

02b series: 02b-1-pssm-cohorts.R, 02b-2-pssm-cohorts-new-labour-supply.R, 02b-3-pssm-cohorts-occupation-distributions.R
07-occupation-projections.R
load-cohort-appso.R
load-cohort-dacso.R
load-occupation-projections.R
any files in sql/

lindsay-fredrick commented 1 month ago

load-cohort-appso & associated SQL: looks fine, very small change to the SQL only.
load-cohort-dacso & associated SQL: looks fine, and the infoware load now runs much more smoothly, which is nice. I'm assuming the new NOC table it loads is usable in the rest of the workflow; otherwise there are no big changes of note.

lindsay-fredrick commented 1 month ago

load-occupation-projections: there are no changes to this file in the PR, and it only seems to load data from the LAN to the SQL server, so there isn't much to comment on here.

lindsay-fredrick commented 1 month ago

Perhaps this will be covered by the documentation you are working on for the team, but it's unclear exactly which load-... files need to be run before starting the 02b-x series. The checks that the data exists are good, but with the tenuous system of each person relying on their own IDIR schema for the data, which sometimes gets wiped out, it can be difficult to recall what exactly needs to be re-loaded.

To avoid issues with IDIRs being wiped, once the data has been loaded to SQL it should really be moved to dbo, and then we should query that table directly instead of having copies floating around in our own IDIRs.

(I realize that is probably out of scope of looking over the NOC changes, but it does hinder trying to run the code to look at NOC work)

In addition to having to re-load the tables, most of the time you can't actually re-run the code, because the table already exists in the database (under someone else's IDIR) and the code doesn't recognize that each individual's IDIR tables are different.

lindsay-fredrick commented 1 month ago

02b series - all the NOC logic ran fine, but it was complex to sort out which tables were needed where. As most of the checks don't check for a specific schema, they return TRUE as long as the table is in someone's schema, but then you cannot make updates to the table. I also had some issues with saving files for the same reason - I needed to update the code in the load scripts to make sure that the data was in my actual schema, not someone else's.
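To illustrate the schema problem described above, here is a minimal sketch, not taken from the PSSM code. It assumes the table listing looks like (schema, table) pairs, similar to what SQL Server's INFORMATION_SCHEMA.TABLES returns. The schema names and function names are hypothetical, and it is written in Python only for illustration (the repo itself is R).

```python
# Hypothetical catalog rows shaped like (TABLE_SCHEMA, TABLE_NAME) pairs.
# The schema names are made up for illustration.
CATALOG = [
    ("IDIR\\ALICE", "occupation_distributions"),
    ("dbo", "some_lookup"),
]


def table_exists_anywhere(catalog, table):
    """Unqualified check: True if ANY schema contains the table."""
    return any(name.casefold() == table.casefold() for _, name in catalog)


def table_exists_in_schema(catalog, schema, table):
    """Schema-qualified check: True only if this schema holds the table.

    Names are compared case-insensitively, which is an assumption about
    the database collation.
    """
    return any(
        s.casefold() == schema.casefold() and name.casefold() == table.casefold()
        for s, name in catalog
    )


# Bob's unqualified check passes because Alice has a copy of the table...
print(table_exists_anywhere(CATALOG, "occupation_distributions"))
# ...while the qualified check shows Bob has nothing to update or overwrite.
print(table_exists_in_schema(CATALOG, "IDIR\\BOB", "occupation_distributions"))
```

A check of the second kind, run against either the user's own schema or dbo, would avoid the false positive when the table only exists under someone else's IDIR.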

A lot of the 2016 code was deleted - I'm hoping that the non-NOC changes here can be applied to the 2016 version of the code (e.g. adding the Stats Can work, etc.).

lindsay-fredrick commented 1 month ago

I couldn't find all the files to run completely through the 07 section, but looking through the change log for the relevant files, it seems like the logic makes sense. I would need to run through from the beginning with no missing tables to be sure that none of the queries broke, although all the NOC changes seem correct.

lindsay-fredrick commented 1 month ago

It is unclear exactly how multiple surveys (i.e. student outcomes vs census) are incorporated to create a single distribution. Looking at, for example, the occupation_distributions table, there is more than one line per NOC code (one for census, one for SO) - how are these handled in the projected distributions? Added? Averaged? It isn't clear.

(Same thing with the labour supply distributions, but with CIPs instead of NOCs I think).
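To make the question concrete, here is a small sketch of why the choice matters. The rows, NOC labels and shares are entirely made up, and the `combine` helper is hypothetical. It does not describe what the model actually does, only how the two candidate rules diverge.

```python
# Made-up rows: (noc, survey, share). Neither the labels nor the values
# come from the model's data.
ROWS = [
    ("A", "census", 0.30),
    ("A", "SO", 0.10),
    ("B", "census", 0.70),
    ("B", "SO", 0.90),
]


def combine(rows, how):
    """Collapse the per-survey rows to one value per NOC by summing or averaging."""
    grouped = {}
    for noc, _survey, share in rows:
        grouped.setdefault(noc, []).append(share)
    if how == "sum":
        return {noc: sum(vals) for noc, vals in grouped.items()}
    if how == "mean":
        return {noc: sum(vals) / len(vals) for noc, vals in grouped.items()}
    raise ValueError(f"unknown rule: {how}")


summed = combine(ROWS, "sum")
averaged = combine(ROWS, "mean")
print(summed, sum(summed.values()))
print(averaged, sum(averaged.values()))
```

With two surveys that each sum to 1, adding gives a result that no longer sums to 1 while averaging does, so the projected distributions depend on which rule (or some weighting) is applied.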

aclowery commented 1 month ago

Noting that in 02b-2 and 02b-3 the _QI-specific code doesn't work (it expects QI versions of tables that we don't have).