TNC-NMFO / NWLAND

carbon accounting model

write_caland_inputs "fix.by(by.y, y)" #107

Open sbassett opened 2 years ago

sbassett commented 2 years ago

No idea where this originates:

Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column

sbassett commented 2 years ago

Working through write_caland_inputs line by line. Made it to line 1970, still going without error, so the error hits somewhere in the code chunk between lines 1970 and 2340... Will step through that section line by line. Have made it to line 2204. Looks like the error originates on line 2305 (m = 15, forest_man_ind = 13):

    else { 
        # merge the parameter table from param_df_list with the initial areas and assign to out_table
        out_table = merge(out_scen_df_list[[1]], param_df_list[[in_index]], by = mergeby, all.x = TRUE)
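
For reference, a faster way to land in the failing frame than stepping through manually, assuming write_caland_inputs() is sourced in the session:

    # drop into an interactive browser at the point of failure
    options(error = recover)
    # re-run the runscript; when the merge() errors, choose its frame
    # and inspect the two data frames with names(), str(), head();
    # traceback() afterwards prints the full call stack
    options(error = NULL)   # restore the default handler once done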

sbassett commented 2 years ago
> head(out_scen_df_list[[1]])
  Land_Cat_ID Region Land_Type   Ownership Area_ha
1      100002 C08001     Water         DOD    0.18
2      100003 C08001     Water         FWS   82.08
3      100004 C08001     Water       Local 1004.49
4      100006 C08001     Water     Private 1640.07
5      100007 C08001     Water State_Other    3.60
6      100008 C08001     Water State_Trust   22.05

> head(param_df_list[[in_index]])
  Land_Type    Management SoilCaccum_frac
1 Grassland Low_frequency            0.94
2 Grassland Med_frequency            0.77
3   Savanna Low_frequency            0.94
4   Savanna Med_frequency            0.77
5  Woodland Low_frequency            0.94
6  Woodland Med_frequency            0.77

> mergeby
[1] "Region"    "Land_Type"

Region is "missing" from param_df_list[[in_index]]

sbassett commented 2 years ago

This may be an error introduced in the integration with the WLIC branch. Will want to check if the original branches (CALAND and CALAND+WLIC) have "Region" in them.

This appears to be from the lc_params file.

The WLIC branch appears to have used a new format for the grass_manage sheet in the lc_params file: NWLAND\raw_data\lc_params_2020_12_14.xls
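
As an aside, a small pre-merge guard (sketched with the objects from the traceback above; missing_keys is a made-up name) would make this failure self-explanatory:

    # verify every merge key exists in the parameter table before merging
    missing_keys <- setdiff(mergeby, names(param_df_list[[in_index]]))
    if (length(missing_keys) > 0) {
      stop("param table lacks merge column(s): ",
           paste(missing_keys, collapse = ", "))
    }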

aj1s commented 2 years ago

Our successful test run last summer used the nwland_dev branch, not nwland_dev_wlic, is that right?

aj1s commented 2 years ago

In attempting to write new inputs following fb8d1e948f3ea23a764ef56c728b2a0e24436d07, both runscript_writeInputs_CO.r and runscript_writeInputs_NM.r return this same error:

Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column

@sbassett I'm not tracking the details you wrote above in January. Do you have any time to take a look?

sbassett commented 2 years ago

It looks like the standard deviation column name was altered (see attached screenshot: Screenshot_20220513-212439.png).

sbassett commented 2 years ago

At least two of the stocks CSVs have the swapped column name.

aj1s commented 2 years ago

@sbassett Thank you, hawkeye ... I'm probably moving faster than optimal in preparing these files, but I have 2am in my head. I'll rename these and see if that does it.

aj1s commented 2 years ago

Sorry to report that didn't (in itself) resolve this error.

aj1s commented 2 years ago

Changing the stddev values for Developed_all in down-dead (DDC) and litter (LTC) from minuscule values to zero, just in case their precision is still too high for the model. Also pasting all new stocks (values only) into the corresponding stock CSVs that worked two days ago, just in case I missed something with formatting.

5701ea1e24783384bd283a2ade8a092ac76e9048 did not resolve the error. I'm at a loss. Nothing else appears to have changed since my May 10 commits, after which point runscript_writeInputs_CO.r and runscript_writeInputs_NM.r worked without this hitch.

I'm initiating the nuclear option, as it were. In the meantime I'll review the most recent stock CSVs a fourth time.

sbassett commented 2 years ago

An extra zero at the end of line 2 in stocks_ddc_co.csv seems to add an extra column when the file is parsed as CSV.
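
A quick way to spot this kind of thing, assuming comma-separated files: count.fields() reports the number of fields on each line, so a stray trailing value shows up as a line with one extra field.

    # flag lines whose field count differs from the header's
    n <- count.fields("stocks_ddc_co.csv", sep = ",")
    which(n != n[1])   # here: line 2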

aj1s commented 2 years ago

harpy eagle, maybe .... thanks.

There's also an extra zero in stocks_ltc_co.csv in the same place. The nuclear option is running now for CO, but I'll make these tweaks (and clear the columns after [sum_abs] of anything hidden) and see if they let 'runscript_writeInputs_NM.r' work. If so, I'll restart CO with the new stock data.

aj1s commented 2 years ago

I'll commit the tweaks above, but they didn't resolve this issue (with 'runscript_writeInputs_NM.r').
'runscript_CALAND_CO_solely85_unc.r' is still running on the nuclear option, i.e., inputs from two days ago with Developed_all zeroed out in input_co_rcp85_2022_04_02.csv under (and only under) the ddc_2010 and ltc_2020 tabs.

aj1s commented 2 years ago

The evidence suggests this error doesn't stem from the latest stock data (at least not in and of themselves). I tried running 'runscript_writeInputs_NM.R' again after swapping in the stock data used in the successful run earlier this week, and was met with the same error.

sbassett commented 2 years ago

Tracing this through write_caland_inputs with the current arguments from runscript_writeInputs_co.R.

Error triggered between lines 1574 and 1835.

> i
[1] 7
> p
[1] 3

Error triggers on line 1592.

>   out_table = merge(out_scen_df_list[[1]], in_c_df_list[[val_index]], by.x = c("Land_Cat_ID"), by.y = c("zone"), all.x = TRUE)
Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column

Investigating the dataframes being merged.

> head(out_scen_df_list[[1]])
  Land_Cat_ID Region Land_Type   Ownership Area_ha
1      100002 C08001     Water         DOD    0.18
2      100003 C08001     Water         FWS   82.08
3      100004 C08001     Water       Local 1004.49
4      100006 C08001     Water     Private 1640.07
5      100007 C08001     Water State_Other    3.60
6      100008 C08001     Water State_Trust   22.05
> head(in_c_df_list[[val_index]])
  ï..zone label non_null null_cells      min      max    range     mean mean_of_abs    stddev  variance coeff_var         sum
1  100002    NA        2          0  0.00000 90.72800 90.72800 21.46149    21.46149 17.106820 292.64330  79.70939    42.92297
2  100003    NA      912          0  0.00000 23.05935 23.05935 20.96304    20.96304  6.629096  43.94492  31.62278  3228.30860
3  100004    NA    11161          0  0.00000 23.05935 23.05935 21.55027    21.55027  5.697225  32.45837  26.43691 31355.63615
4  100006    NA    18223          0  0.00000 23.44096 23.44096 19.60714    19.60714  8.032204  64.51630  40.96571 28724.45886
5  100007    NA       40          0 17.98337 17.98337  0.00000 17.98337    17.98337  0.000000   0.00000   0.00000    17.98337
6  100008    NA      245          0  0.00000 23.44096 23.44096 21.07805    21.07805  6.474262  41.91607  30.71566  1222.52701
      sum_abs
1    42.92297
2  3228.30860
3 31355.63615
4 28724.45886
5    17.98337
6  1222.52701

The culprit appears to be the zone field: its name has been mangled to ï..zone.
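
Confirming with the objects above:

    # the merge key is not present under its expected name
    "zone" %in% names(in_c_df_list[[val_index]])   # FALSE
    names(in_c_df_list[[val_index]])[1]            # "ï..zone"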

A web search turns up this: https://stackoverflow.com/questions/24568056/rs-read-csv-prepending-1st-column-name-with-junk-text

It appears to be an encoding artifact: a byte-order mark (BOM) at the start of the file gets folded into the first column name.
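
The usual fix on the read side is the BOM-aware encoding (the filename here is illustrative):

    # read.csv() drops the BOM instead of gluing it onto the header
    x <- read.csv("stocks_ddc_co.csv", fileEncoding = "UTF-8-BOM")
    names(x)[1]   # "zone" rather than "ï..zone"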

aj1s commented 2 years ago

Thank you for the skillful birddogging!

I recall mistakenly exporting as Unicode text at some point instead of Text (Tab delimited). Realizing this, I thought I overwrote the former with the latter, but apparently I didn't. Ugh.

I need to get a few minutes of fresh air, but I'll work on stripping the artifact once I'm back.

sbassett commented 2 years ago

Resaving the file from Excel didn't appear to work. A few other workarounds for changing the encoding of a CSV are here: https://stackoverflow.com/a/52799891
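
One more workaround that stays in R: strip the three UTF-8 BOM bytes (EF BB BF) from the file directly. strip_bom is a hypothetical helper and the filename is illustrative.

    # rewrite the file without a leading byte-order mark, if one exists
    strip_bom <- function(path) {
      raw_bytes <- readBin(path, "raw", n = file.size(path))
      if (length(raw_bytes) >= 3 &&
          identical(raw_bytes[1:3], as.raw(c(0xEF, 0xBB, 0xBF)))) {
        writeBin(raw_bytes[-(1:3)], path)
      }
      invisible(path)
    }
    strip_bom("stocks_ddc_co.csv")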

sbassett commented 2 years ago

Thanks!

aj1s commented 2 years ago

I'm still not sure why this occurred (as noted above): "I tried running 'runscript_writeInputs_NM.R' again after swapping in the stock data used in the successful run earlier this week, and was met with the same error." However, I'll plow ahead after the fresh-air stint, beginning with the Notepad++ suggestion. (Thanks for the SO link.)

aj1s commented 2 years ago

OK, so suspense won over sanity, and ... re-encoding with Notepad++ did the trick! Thank you!! e972203f989bd2a0346e048fef02b2925bdcb442