VisionEval / VisionEval-Dev

Development version of VisionEval framework
https://visioneval.github.io/
Apache License 2.0

Currency data can't be handled as compound data type #19

Open dflynn-volpe opened 5 years ago

dflynn-volpe commented 5 years ago

Originally created by @gregorbj in April 2018:

The framework automatically deflates the values of currency data to specified years or the base year. This only works for currency data that is not specified as a compound data type (e.g. USD/MI). It is imperative that all currency data be specified as currency type and not as compound type in order to ensure that values are deflated appropriately. This is bound to cause some confusion for module developers, so the framework should be modified to deflate currency values that are in a compound data type. Meanwhile the developer documentation needs to explain this.

jrawbits commented 2 years ago

We should look at how the deflation is handled for a singular currency type and perhaps apply that conversion to compound types. But it's unclear where that may be a problem: perhaps only when processing Input files?

If we do a computation within a module that uses a currency type, surely the currency is retrieved from the Datastore in deflated units (independent of the year of input, based on deflators.csv and model Years when building the Datastore) and everything is preserved so that subsequent measures based on currency will always be reported in base year dollars?

So I think it's important for module processes always to work in "Universal Currency Coordinated" units (i.e. deflated base year currency value) and not work in some alternate year.

One of our economist users needs to explain why that won't work. For reporting output metrics, it might be useful to "reflate" currency to a consistent future year basis, but I think people are pretty much used to interpreting charts that refer to "2020 dollars" e.g.

gregorbj commented 2 years ago

It would be a mistake and would be unnecessary to establish universal currency coordinated units. It would be a mistake because: 1) It would force all users and all developers to adopt a standardized base year; 2) It would require the revision of several modules and the datasets they use (e.g. VEHouseholdTravel, VEHouseholdTravelMM, VE2001NHTS, VE2009NHTS) to be consistent with the standard year; 3) It would increase errors if users have to manually deflate all input dollar values to the standard year values and inflate output dollar values to the desired reporting year (i.e. base year) values. It is unnecessary because: 1) The current approach works. The only constraint it imposes is that currency values can't be used in compound units such as USD/MI. 2) Developers can work around the constraint fairly easily. There are examples in existing module code. The value of the suggested modification would be to enable some units to be expressed more naturally. 3) It would be much easier to make this modification than to rewrite the framework code and modules to use a standardized currency year.

Note that there are no future year dollars since VE going back to GreenSTEP days has always worked in real dollars. The purposes of deflators are 1) to convert nominal currency input values to base year values and 2) to convert base year values to the currency year values required by different modules. Each module determines in its 'Get' specifications what currency year the values need to be deflated to. This approach is important in order to accommodate new or revised modules that incorporate models estimated with different year data. For example, the original VE household travel model was estimated from the 2001 NHTS dataset. On the other hand, the MM household travel model was estimated from the 2009 NHTS dataset. Users can use either the original model or the MM model without making any changes to their currency input data. Moreover, no changes need to be made to other modules that use or produce currency data (e.g. the PredictIncome module). Conversion to the correct currency year is handled automatically by the framework based on the 'UNITS' 'Get' specification for the dataset. For example the 'UNITS' specification for 'Income' in the original 'CalculateHouseholdDvmt' module is "USD.2001" whereas the specification in the MM 'CalculateHouseholdDvmt' module is "USD.2009". This approach implements the loose coupling goal of the VE model system. Module developers can work with different model estimation datasets and that doesn't need to concern model users. Module developers only need to specify the year in the appropriate module specifications. Module users only need to supply a set of deflators and the base year.
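As a rough illustration of the point above (one stored value, different currency years per module), here is a minimal R sketch. It is not the framework's code: the deflator values are invented, the base year of 2010 is an assumption, and `fromBaseYear()` is a hypothetical helper. The `Year` and `Value` column names are assumed to resemble deflators.csv.

```r
# Invented deflator values; the base year (2010) is an assumption.
deflators <- data.frame(
  Year  = c(2001, 2009, 2010),
  Value = c(0.80, 0.98, 1.00)
)
baseYear <- 2010

# Hypothetical helper: convert a base year value to another currency year
# by scaling with the ratio of the two deflator values.
fromBaseYear <- function(value, toYear) {
  value * deflators$Value[deflators$Year == toYear] /
    deflators$Value[deflators$Year == baseYear]
}

# One income value held in base year dollars.
storedIncome <- 50000

# A module whose 'Get' spec says "USD.2001" would see 2001 dollars.
incomeForOriginalModel <- fromBaseYear(storedIncome, 2001)

# A module whose 'Get' spec says "USD.2009" would see 2009 dollars.
incomeForMMModel <- fromBaseYear(storedIncome, 2009)
```

The user's input data and the stored value are the same in both cases; only the year named in each module's specification differs.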

I don't think it would take much time to revise the framework code to handle currency in compound units. I just ran out of time previously and could more quickly just work around the problem using non-compound units in order to get modules completed on schedule. I do think it is important to resolve this issue by modifying the framework code to accommodate compound currency units because it will enable developers to express units more naturally and would reduce the potential for conversion errors.

jrawbits commented 2 years ago

Thanks for the input, Brian. Obviously, we can't get to the "one true Base Year" for all time and in all places.

My proposal would be better described as saying that the Datastore will always store currency values deflated to the model's base year (so setting the Base Year for a model run defines what needs to happen to currency values submitted for any other year).

Any module internally that needs to be estimated off different years will need to define (and implement) some adjustment from the model base year to its estimation year (and back again) using deflators provided by the user (with an error if none such exist) so that the currency retrieved from the Datastore is suitably adjusted.

That is all a somewhat separate issue from the compound units, but I'm hanging up on the basic problem of how, in all these cases, we recognize what the "working year" should be (i.e. when and how do we decide that deflating is required). If my understanding of "deflating" is correct, we would just apply the same factors to any "currency per other unit" as we would to the currency itself - and if the other unit is converted, attach it accordingly. I'm still hung up on how we decide what year we need to adjust to.

Probably that means I should study up on modules that use year-based data in ways that need to be deflated.

gregorbj commented 2 years ago

This is what the framework does now. Currency is stored in the datastore in base year values. If a module requires currency to be deflated to a particular year, it specifies that in the appropriate 'Get' specification. The framework does the required deflation knowing the base year and using the user supplied deflators. If a module outputs a currency dataset, it specifies the currency year and the framework does the conversion to base year values.

For example, here is the 'Get' spec for per capita income in the PredictIncome module. Notice the required year of the data. That is because this model was estimated using 2000 census microdata which records 1999 income.

item(
      NAME = items(
        "HHIncomePC",
        "GQIncomePC"),
      TABLE = "Azone",
      GROUP = "Year",
      TYPE = "currency",
      UNITS = "USD.1999",
      PROHIBIT = c("NA", "< 0"),
      ISELEMENTOF = ""
    )

Here is the 'Set' spec for the predicted household income produced by the module:

item(
      NAME = "Income",
      TABLE = "Household",
      GROUP = "Year",
      TYPE = "currency",
      UNITS = "USD.1999",
      NAVALUE = -1,
      PROHIBIT = c("NA", "< 0"),
      ISELEMENTOF = "",
      SIZE = 0,
      DESCRIPTION = "Total annual household (non-qroup & group quarters) income"
    )

The framework automatically does all the required conversions because the specifications identify the year, the user has specified the base year (in run_parameters.json), and the user has provided deflators (in deflators.csv).
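To make the 'Get' and 'Set' directions concrete, here is a small R sketch of the round trip. It only mimics the idea: the specs are plain lists rather than the framework's `item()` calls, `unitsYear()` is a hypothetical helper, and the deflator values, base year, and income figures are invented.

```r
# Invented deflator values keyed by year; the base year is assumed to be 2010.
deflators <- c("1999" = 0.75, "2010" = 1.00)
baseYear <- "2010"

# Hypothetical helper: pull the year out of a UNITS string such as "USD.1999".
unitsYear <- function(units) strsplit(units, ".", fixed = TRUE)[[1]][2]

# 'Get' direction: a base year value from the datastore is converted
# to the year named in the specification.
getSpec <- list(NAME = "HHIncomePC", TYPE = "currency", UNITS = "USD.1999")
storedIncomePC <- 40000
moduleIncomePC <- storedIncomePC *
  deflators[[unitsYear(getSpec$UNITS)]] / deflators[[baseYear]]

# 'Set' direction: a module output in its own currency year is converted
# back to a base year value before it is stored.
setSpec <- list(NAME = "Income", TYPE = "currency", UNITS = "USD.1999")
moduleIncome <- 60000
storedIncome <- moduleIncome *
  deflators[[baseYear]] / deflators[[unitsYear(setSpec$UNITS)]]
```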

All we need to do is make these conversions work with currency expressed in compound units. This is not critical, but could help reduce mistakes resulting from misunderstandings. For example the following is a 'Get' spec from the 'CalculateVehicleOperatingCost' module:

item(
      NAME = items(
        "FuelCost",
        "PowerCost"),
      TABLE = "Azone",
      GROUP = "Year",
      TYPE = "currency",
      UNITS = "USD.2010",
      PROHIBIT = c("NA", "< 0"),
      ISELEMENTOF = ""
    )

The model was estimated using 2010 year data so the 'UNITS' specification is "USD.2010". But notice that the units are "USD" and not "USD/GAL" for FuelCost and not "USD/KWH" for PowerCost. These are what the data in the datastore really represent, but because the automatic conversions can't operate on compound units, it is not possible to express the units in the correct way.

jrawbits commented 2 years ago

@gregorbj Thanks for the clarification! That makes the problem seem considerably less difficult - the deflation can probably ignore the denominator, we just need to recognize it, detach it, and then restore it. Even easier than other compound unit conversions...
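A minimal R sketch of the "recognize, detach, restore" idea follows. It is not the framework's implementation: the `"USD.2018/GAL"` string format, the function names `parseCurrencyUnits()` and `deflateCompound()`, and the deflator values are all assumptions made for illustration.

```r
# Invented deflator values keyed by year.
deflators <- c("2010" = 1.00, "2018" = 1.25)

# Split a compound units string such as "USD.2018/GAL" into the currency
# code, the currency year, and the denominator (left untouched).
parseCurrencyUnits <- function(units) {
  parts <- strsplit(units, "/", fixed = TRUE)[[1]]
  numerator <- strsplit(parts[1], ".", fixed = TRUE)[[1]]
  list(
    Currency = numerator[1],
    Year = as.integer(numerator[2]),
    Denominator = if (length(parts) > 1) {
      paste(parts[-1], collapse = "/")
    } else {
      NA_character_
    }
  )
}

# Deflate only the currency part, then reattach the denominator.
deflateCompound <- function(value, units, toYear, deflators) {
  u <- parseCurrencyUnits(units)
  ratio <- deflators[[as.character(toYear)]] /
    deflators[[as.character(u$Year)]]
  newUnits <- paste0(u$Currency, ".", toYear)
  if (!is.na(u$Denominator)) {
    newUnits <- paste0(newUnits, "/", u$Denominator)
  }
  list(Value = value * ratio, Units = newUnits)
}

# A fuel price in 2018 dollars per gallon, expressed in 2010 dollars per gallon.
result <- deflateCompound(3.75, "USD.2018/GAL", 2010, deflators)
```

The denominator never enters the arithmetic, which is why this case is simpler than a compound conversion where both parts change.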

jrawbits commented 2 years ago

After trying out the solution above (which basically just lets you specify YEAR and MULTIPLIER as part of the UNITS, without writing them separately; not a bad thing, but easily worked around by specifying YEAR and MULTIPLIER explicitly in a specification), I realized (surprise!) that the problem goes deeper.

In logical fact, "deflateCurrency" is just a species of "convertUnits" - we're converting e.g. from USD.2018 to USD.2010 using the deflator rather than fixed definitions of the relationship (as e.g. in converting MI to KM).

So we need to move deflateCurrency inside "convertUnits" by changing fromUnits and toUnits to be structures that include the Units string and the corresponding Year string. The Year becomes another component in the internal From_ls and To_ls structures inside convertUnits. Then when we get inside the function, check for "currency" type around line units.r::126 and if so, do the deflateCurrency block and return.
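The restructuring described above might look roughly like the R sketch below. This is not the actual `convertUnits` signature or body: `convertUnitsSketch()`, the `fixedFactors` table, and the deflator values are hypothetical, and the only point is that the currency branch looks up its factor from deflators while other types use fixed definitions.

```r
# Fixed factors to a common base unit for a hypothetical "distance" type
# (1 mile is defined as exactly 1.609344 km).
fixedFactors <- list(distance = c(MI = 1.609344, KM = 1))

# 'from' and 'to' are small structures carrying the Units string and,
# for currency, the Year string.
convertUnitsSketch <- function(values, type, from, to, deflators = NULL) {
  if (type == "currency") {
    # Currency branch: the conversion factor comes from the deflators,
    # then we return early.
    if (from$Units != to$Units) stop("Currency codes differ")
    ratio <- deflators[[to$Year]] / deflators[[from$Year]]
    return(values * ratio)
  }
  # All other types: fixed definitions of the relationship between units.
  factors <- fixedFactors[[type]]
  values * factors[[from$Units]] / factors[[to$Units]]
}

# Invented deflator values keyed by year.
deflators <- c("2010" = 1.00, "2018" = 1.25)

usd2010 <- convertUnitsSketch(
  125, "currency",
  from = list(Units = "USD", Year = "2018"),
  to = list(Units = "USD", Year = "2010"),
  deflators = deflators
)

km <- convertUnitsSketch(
  10, "distance",
  from = list(Units = "MI"),
  to = list(Units = "KM")
)
```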

Also need to break out "currency" when we separate a "compound" expression.

jrawbits commented 2 years ago

I have a working implementation of compound currency types; I need to add some additional tests to it. It should be pushed either in beta-0.8 or beta-0.9. I'll comment and close when it's done.

jrawbits commented 1 year ago

Double check and ensure this fix is in place for the R 4.2.2 release.