bcgov / BGC_WNA_model

Apache License 2.0

Review BGC unit exclusions #3

Open CeresBarros opened 4 months ago

CeresBarros commented 4 months ago

At the moment, we are excluding a pre-defined set of BGC units (`badbgcs` below) that includes low-sample units, outlier units, and units whose climate space is "too broad" and are therefore overpredicted.

We should revisit the units we are excluding and, for the low-sample units, automate the selection process. For instance, the output below shows that not all low-sample units (using a cutoff of fewer than 50 points) are currently being excluded:

## trainData was built by crossing a 2 km point grid across WNA with the BGC WNA map
> BGC_counts <- trainData[, .(Num = .N), by = .(BGC)] 
> badbgcs <- c("BWBSvk", "ICHmc1a", "MHun", "SBSun", "ESSFun", "SWBvk", "MSdm3", "ESSFdc3", "IDFdxx_WY", "MSabS", "FGff", "JPWmk_WY")

> BGC_counts
          BGC   Num
       <char> <int>
  1: CCHun_CA 10582
  2: CVGdm_CA 33467
  3: JPWxh_CA  8544
  4: CDFxm_CA  1943
  5: CMXdm_OR  4314
 ---               
373:   BWBSnm  7387
374:   SWBvks   339
375:   BWBSvk   190
376:    SWBvk   363
377:     MHun    10

> BGC_counts[BGC %in% badbgcs]
          BGC   Num
       <char> <int>
 1: IDFdxx_WY  1860
 2:  JPWmk_WY  1127
 3:      FGff  4815
 4:     MSabS  1662
 5:     MSdm3   283
 6:   ESSFdc3   315
 7:   ICHmc1a    51
 8:    ESSFun  1662
 9:     SBSun   753
10:    BWBSvk   190
11:     SWBvk   363
12:      MHun    10

> BGC_counts[Num < 50]
         BGC   Num
      <char> <int>
1: ESSFxh_WA    48
2:   ESSFdcp    27
3:   ESSFxcp    30
4:    IDFww1    30
5:   ESSFxvw    14
6:     CMAwh     6
7:    SBSdh2    49
8:      MHun    10
> all(BGC_counts[Num < 50]$BGC %in% badbgcs)
[1] FALSE
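
One possible way to automate the low-sample part of the selection is sketched below. It derives the low-sample units from the counts and merges them with a manually curated list for the outlier and "too broad" units. The names `min_points`, `low_sample_bgcs`, `manual_bgcs` and `trainData_filtered` are hypothetical, the `trainData` here is a toy stand-in, and the cutoff of 50 is taken from the check above rather than from the model code.

```r
library(data.table)

# Toy stand-in for trainData; the real table comes from the 2 km grid
# crossed with the BGC WNA map.
trainData <- data.table(
  BGC = rep(c("BWBSnm", "SBSdh2", "MHun"), times = c(60, 49, 10))
)

# Assumed cutoff: units with fewer points than this count as low-sample
min_points <- 50

# Number of training points per BGC unit
BGC_counts <- trainData[, .(Num = .N), by = .(BGC)]

# Low-sample units, selected automatically from the counts
low_sample_bgcs <- BGC_counts[Num < min_points, BGC]

# Manually curated exclusions (outliers, overly broad climate space);
# illustrative subset only
manual_bgcs <- c("BWBSvk", "MSdm3")

# Final exclusion list: manual choices plus every low-sample unit
badbgcs <- union(manual_bgcs, low_sample_bgcs)

# Drop the excluded units from the training data
trainData_filtered <- trainData[!BGC %in% badbgcs]
```

With this approach, `all(BGC_counts[Num < min_points]$BGC %in% badbgcs)` holds by construction, and `setdiff(low_sample_bgcs, manual_bgcs)` lists the units that only the automated step picks up.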