RMI-PACTA / workflow.portfolio.parsing

Other
0 stars 0 forks source link

Add checks for common portfolio issues #23

Open AlexAxthelm opened 9 months ago

AlexAxthelm commented 9 months ago

Add informative checks (that can be raised in warnings, not errors) on the portfolios that can be displayed to the user.

Possible checks:

@cjyetman Do we have any of these checks already? would pacta.data.validation be a better place for this issue to live?

cc @hodie

cjyetman commented 9 months ago
  • Total market value of portfolio is 0

no, I don't consider any valid numerical value being unambiguously wrong see below

  • Portfolio has negative value holdings

again, I don't consider any valid numerical value being unambiguously wrong, but... pacta.portfolio.audit flags < 0 market values as invalid input here https://github.com/RMI-PACTA/pacta.portfolio.audit/blob/main/R/check_valid_input_value.R

There have been many conversations about what to do with negative holdings. Some people want them included somehow, others want to drop them as soon as possible. I think it would be wrong to drop negative values during the portfolio import/parsing step. Let the audit process decide what should be done with them so at least there's that option.

see below

  • Column Types unexpected (numeric is string, or vice versa)

For a CSV, this doesn't make much sense because it can't explicitly set column types. But, pacta.portfolio.import does read in what it determines to be the market_value column as a string to guess the decimal and grouping mark, and then coerces that column to numeric, which means that values that R cannot coerce to numeric (given the decimal and grouping mark) are coerced to NA. This is a bit of a deviation from my general philosophy (see below), but it is necessary because it is very common for portfolios to include non-numeric values in the market_value column, e.g. "Cash" or "-", and it would be counter-productive to force the column to be a character vector to accommodate those values.

The other columns, which should all be strings, are coerced to character vectors, which means if a column only has numeric values, it will retain the proper value but it will be a character, e.g. "1" not 1.

  • Currency code not parsed

pacta.portfolio.import has is_valid_currency_code() but I don't believe it is currently used anywhere other than get_csv_specs()

see below

  • ISIN not in ISIN format (checksum)

pacta.portfolio.import has is_valid_isin() which is also used in pacta.portfolio.audit to set a flag

see below


In general, my philosophy with importing a portfolio is: fix things that can be unambiguously interpreted so that it can be imported correctly, but do not question the content. The content is better assessed at a different point in the analysis where the code can decide how to deal with it and/or notify the user. Otherwise, you end up in situations where the "input" into the analysis is not aligned with what the user uploaded, and that can be very difficult to disentangle. The imported portfolio should genuinely reflect what the user uploaded.

AlexAxthelm commented 9 months ago

Thanks @cjyetman All of those points are useful in the context of pacta.portfolio.import, but what I'm aiming for here is to try and protect end-users from themselves (again, with warnings not errors), in a situation where they are probably not trying to push the limits of what PACTA does.

In that context, I think it's reasonable to inform the user that they have (for example) uploaded a portfolio with no market value, and that it won't generate a PACTA report. If they want to continue, then that's their decision, but we've done the (quick and easy) first checks on "hey, is this actually what you want to do?"

cjyetman commented 9 months ago

The functions I mentioned are available in pacta.portfolio.import which is already being used here, e.g. if you wanted to use them post-import to flag something to the user. The column type of market_value is not possible to be anything but numeric if you're using pacta.portfolio.import to import it, but a post-import check of the column type would be very easy to do, e.g. class(portfolio$market_value). Likewise for the negative or 0 market_values.

cjyetman commented 9 months ago

pacta.data.validation also has some ISIN validating functions. It does not currently have functions to directly validate currency codes (it does have functions for validating our "currencies" dataset, but that's different). It does not currently have a function to validate directly the column type of market_value. Negative or 0 value market_values are not strictly "invalid", though they may be worth a warning to a user, so I would be hesitant to add that to pacta.data.validation.

AlexAxthelm commented 9 months ago

Noted, thanks.

hodie commented 9 months ago

Not sure where you two ended up here. But I'd agree that there are things that are "warning"-shaped and I think they go into the metadata for a CSV but wouldn't make the import fail or throw an error. These are likely to be of interest of users for a sanity check, but might be irrelevant for some users. Something like AUM < 1000 is a good example of this.

And then there are true unambiguous errors. Those I would think should result in the portfolio upload failing/no portfolio resulting from the upload.

Are we saying that only the latter should be in .import and the former would be in .validation? I don't have an opinion on this, but I do have an opinion on the user-facing experience: The user should get feedback on both without having to take additional actions (e.g., click on "view audit").

AlexAxthelm commented 9 months ago

I think the consensus we landed at is: