alan-turing-institute / uatk-spc

Synthetic Population Catalyst
https://alan-turing-institute.github.io/uatk-spc/
MIT License

Change sic1d07 to letters #53

Open HSalat opened 1 year ago

HSalat commented 1 year ago

The population data loaded by SPC has letters for sic1d07, while the business registry has numbers (A = 1, B = 2, etc.). This means that there is a conversion happening somewhere during the commuting modelling. The new business registry will have a letter directly. The code needs to be adapted accordingly.
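To illustrate the mapping described above (A = 1, B = 2, etc.), here is a minimal sketch of the kind of number-to-letter conversion involved. The function name `sic_number_to_letter` is hypothetical and not taken from the SPC codebase, and the upper bound of 21 assumes the sections run from A to U.

```rust
/// Hypothetical helper: convert a 1-based numeric section code into its
/// letter (1 -> 'A', 2 -> 'B', ...). Returns None for values outside
/// 1..=21, assuming the sections run from A to U.
fn sic_number_to_letter(n: u32) -> Option<char> {
    if (1..=21).contains(&n) {
        // 'A' is the first section, so shift by n - 1 from its ASCII code.
        Some((b'A' + (n - 1) as u8) as char)
    } else {
        None
    }
}
```

Once the registry stores letters directly, a conversion like this would no longer be needed on the registry side.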

HSalat commented 1 year ago

For ref, the new businessRegistry.csv.gz.

Note that IDs have changed slightly.

sgreenbury commented 1 year ago

As you mention, the letters run from A-U, so it's probably easiest to use a char instead of the current u32. We could also add a type Sic1d2007 = char; for clarity. Happy to implement this.
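A rough sketch of what that suggestion could look like, assuming the registry field arrives as a string. Only the alias name `Sic1d2007` comes from the comment above. The `parse_sic1d07` function and its error message are hypothetical and not part of the actual change.

```rust
/// Alias suggested in the thread: a SIC 2007 section stored as one letter.
type Sic1d2007 = char;

/// Hypothetical parser: accept a field only if it is exactly one letter
/// in the range A-U (surrounding whitespace is ignored).
fn parse_sic1d07(field: &str) -> Result<Sic1d2007, String> {
    let mut chars = field.trim().chars();
    match (chars.next(), chars.next()) {
        // Exactly one character, and it falls inside A..=U.
        (Some(c), None) if ('A'..='U').contains(&c) => Ok(c),
        _ => Err(format!("invalid sic1d07 value: {field:?}")),
    }
}
```

Note that a plain type alias only documents intent: any char still type-checks as a Sic1d2007, so the range check has to live in the parsing step, as in this sketch.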

HSalat commented 1 year ago

@dabreegster is best placed to confirm which parts of the code should be changed

dabreegster commented 1 year ago

What exactly is the ask here? The output proto is already a letter: https://github.com/alan-turing-institute/uatk-spc/blob/108794bc2057f7ff51cd48da2740d6ef3c194d23/synthpop.proto#L115 Is businessRegistry.csv changing? If so, then a small amount of code needs to change internally. See https://github.com/alan-turing-institute/uatk-spc/blob/108794bc2057f7ff51cd48da2740d6ef3c194d23/src/init/commuting.rs#L75 and https://github.com/alan-turing-institute/uatk-spc/blob/108794bc2057f7ff51cd48da2740d6ef3c194d23/src/init/commuting.rs#L274

Edit: I didn't read carefully -- so the registry csv is changing. This should be a very easy change in this code.

sgreenbury commented 1 year ago

Initial modifications (9d145dd), yet to be tested on the updated file.