ManageIQ / manageiq

ManageIQ Open-Source Management Platform
https://manageiq.org
Apache License 2.0
1.34k stars 898 forks source link

Remove normalize methods from the refresh #3307

Closed Ladas closed 9 years ago

Ladas commented 9 years ago

Remove the normalize methods and rather have refresh failed, when there is unexpected format of the value.

used e.g in:

https://github.com/ManageIQ/manageiq/blob/3b3efe05e5382804aa2fe9e57404933481482a8c/vmdb/app/models/ems_refresh/parsers/openstack_infra.rb#L351

matthewd commented 9 years ago

We already have a fair bit of warn-and-continue setup at higher levels, which will prevent an API misadventure from spreading too far. Ultimately, if a disk's size suddenly becomes... say, a Net::HTTPRequest, or a hash... I think it's much safer to just give up, and skip adding that disk. If it's so far outside of its contract, we should quite possibly give up on the entire endpoint / provider.

Guessing what a completely unexpected data value might mean is way beyond the job description of this code, and for good reason: trying to guess at a near-enough-is-good-enough recovery strategy from so far down the stack, with such little context, is a recipe for future bugs, and ongoing debugging pain. (I've already developed Many Opinions Indeed on the existing error recovery mechanisms we have around refresh, and their talent for turning an actual, descriptive, easily-resolved error into "expected count to be 5 but got 0".)

So, mostly:

make sure the refresh will not fail

:-1: -- if the provider is returning invalid data, the refresh absolutely should fail.

Ladas commented 9 years ago

Alright, I will just remove the normalize_methods and let it fail. Although since it's free form metadata mostly, we might tune it up a bit later.

I'll change the description of this issue to removing this stuff. Should be removed in scvmm too, so we have at least some consistence of how the refresh should be handled. :-)

blomquisg commented 9 years ago

So, I get this type of question/request from field engineers, etc., all the time: "Can't we just skip this failing thing and continue with the refresh?"

My knee jerk reaction is typically, "No, we can't." Because we don't really know what it means to skip some piece of data in the refresh. We scoped and implemented the refreshes the way they're setup today because we have a certain expectation of the inventory data.

If we start making changes to our assumptions regarding the data and our plasticity in dealing with changing (or, as usually the question, missing) data, then we have two potential problems:

  1. We have reevaluate our assumptions on the inventory data and what's required at each level of dependence in the data (i.e., Is it OK to skip a Host when we later have to attempt to refresh a VM that lives on that host? Or, is it better to skip all things that are either associated with or entirely dependent on that Host?).
  2. We would have to understand the future implications of our inventory fail-and-continue methodology. With our current assumption of "fail-and-bail", adding new types of inventory data is simple. Because we already have assumptions along the lines, "if the inventory collection got this far, then everything is still OK and I can depend on the previously collected data being present." However, if we adopt fail-and-continue inventory collection logic, then we have to go back and pepper all the inventory collection code with, "it's possible I skipped something in the past, if I skipped something that this new object depends on, should I skip this new object, too?"
Fryguy commented 9 years ago

I merged #3322, so is this closed?

On the discussion here, we sort of already have a bit of play in the vmware provider. We generally allow a single VM to fail without failing the whole refresh. However if some general piece of information is missing (like the whole "config" portion of the Hash), then we fail the whole refresh.

There is also a lot of code there like:

result[:version] = inv["version"] unless inv["version"].blank?

where if the inventory is missing data, we just leave it (notice we don't even nil it out, we just straight up ignore changes to it).

Ladas commented 9 years ago

I guess I can close this.

Although similar normalize methods are present also in scvmm.

The partial missing data should be ok, I think, having just the important part throwing error should be ok. Though I use fetch path, which nil out the value. Keeping the value if it's not there anymore is probably a bug.

I am closing it, let me know if I should reopen it for some reason.