Green-Software-Foundation / if

Impact Framework
https://if.greensoftware.foundation/
MIT License
157 stars 41 forks source link

👍🏼 Update IF to dump logs and errors to yaml file #592

Closed jmcook1186 closed 6 months ago

jmcook1186 commented 7 months ago

Sub of: #629 -> #650

What

Update IF so that errors and logs are saved to file as well as being displayed in the console.

Why

As a core developer I want to be able to run negative tests, which requires errors and logs to be captured in a file we can snapshot against. As a user I want to be able to provide a log file as part of a bug report to help the core team guide me through debugging.

Context

Testing negative flows is important for us to make sure IF behaves as intended across a large set of scenarios. It is hard to do this without dumping errors and logs to files because the alternative is for IF to panic without any persistent result to test against. This can be resolved by capturing the errors and logs to a file.

SoW

Acceptance criteria

GIVEN the user is in a folder WHEN they run this manifest file:

name: basic-error-demo
description:
tags:
initialize:
  plugins:
    teads-curve: 
      path: '@grnsft/if-unofficial-plugins'
      method: TeadsCurve
      global-config:
        interpolation: spline
  outputs:
    - yaml
tree:
  children:
    child-0:
      defaults:
        cpu/thermal-design-power: 100
      pipeline:
        - teads-curve
      inputs:
        - timestamp: 2023-07-06T00:00
          cpu/utilization: 20

THEN the tool logs the error, exits and outputs a manifest file which contains these additional nodes:

name: basic-error-demo
description:
tags:
initialize:
  plugins:
    teads-curve: 
      path: '@grnsft/if-unofficial-plugins'
      method: TeadsCurve
      global-config:
        interpolation: spline
  outputs:
    - yaml
execution:
  status: fail
  error: 'InputValidationError:   "duration" parameter is required. Error code: invalid_type'.
tree:
  children:
    child-0:
      defaults:
        cpu/thermal-design-power: 100
      pipeline:
        - teads-curve
      inputs:
        - timestamp: 2023-07-06T00:00
          cpu/utilization: 20

The errors and logs should still be displayed in the console as usual, because the version int he manifest file will not contain the full stack trace.

QA

jawache commented 7 months ago

@jmcook1186 this one is good to go from me.

narekhovhannisyan commented 7 months ago

@jmcook1186 @jawache it's not clear output, logs and the errors should be stored in different files?

jawache commented 7 months ago

@narekhovhannisyan just anything that caused the manifest to fail and if to exit (critical error) not all the normal info, debug logs.

For negative testing the manifest file should contain the error so we can do both negative and positive testing with the if-verify tool.

Thinking about it the node should just be 'execution.error' instead of log since it will only contain the error that caused it to fail. Stringified exception that caused failure.

zanete commented 6 months ago

@MariamKhalatova in order to complete the acceptance criteria for this issue and get it ready for development, there are a couple of asks from @jawache to you in the description. Can you let me if they are clear to you and you can help get the needed information? This is an urgent issue that is blocking other work in the #650 feature 🙏 cc @narekhovhannisyan

jmcook1186 commented 6 months ago

@zanete FYI I untagged @MariamKhalatova and added a failing manifest and expected output manifest to the ticket description

narekhovhannisyan commented 6 months ago

@jawache looks good, however, this doesn't mean that we are not logging the error in the console yes? since error stack is not shown in manifest

jmcook1186 commented 6 months ago

I think we should also log to the console @narekhovhannisyan

jawache commented 6 months ago

agreed @jmcook1186 and @narekhovhannisyan, also log to the console.

zanete commented 6 months ago

This needs some major restructuring of the index file to properly handle things, currently we just catch the error and print to the console, but we need to output in the manifest, and in order to do that we need to continue the execution of the manifest. Some of the errors may cause some variables to be undefined and that needs to be solved.

jawache commented 6 months ago

@narekhovhannisyan (assume that came from you 😄) isn't this as simple as this?

load Inputmanifestfile Try All our compute steps. . Catch exception which we can't recover from Add exception to inputmanifest, save, exit

zanete commented 6 months ago

@jawache ys sorry for the lack of context, forgot to add that I was taking notes during standup and as @jmcook1186 wasn't there, didn't want to miss important technical points made.

narekhovhannisyan commented 6 months ago

@jawache

const impactEngine = async () => {
  try {
  const options = parseArgs();

  if (!options) {
    return;
  }

  logger.info(DISCLAIMER_MESSAGE);
  const {inputPath, paramPath, outputOptions} = options;

  const {tree, rawContext, parameters} = await load(inputPath!, paramPath);
  const context = await injectEnvironment(rawContext);
  parameterize.combine(context.params, parameters);
  const pluginStorage = await initalize(context.initialize.plugins);
  const computedTree = await compute(tree, {context, pluginStorage});
  const aggregatedTree = aggregate(computedTree, context.aggregation);
  context['if-version'] = packageJson.version;
  exhaust(aggregatedTree, context, outputOptions);

  return;
  } catch (e) {
     // here you have to collect all the data before the error happened.
     // f.ex. if failure happened on aggregation state, you have to save all the job done before that and write in file
     // however in that state, all the variables are inaccessible in catch block

     // or we have to keep only the initial version of the yaml and inject error message
  }
};
jmcook1186 commented 6 months ago

I'm not sure I'm following either - is this complication arising because the manifest is loaded into memory inside the try block? Meaning the catch doesn't have access to it?

So a fix would be to separate the call to load from the try/catch block that computes the manifest?

In either case, the intended outcome is only to catch the error message and insert it into the original manifest file as follows:

execution:
  status: fail
  error: 'InputValidationError:   "duration" parameter is required. Error code: invalid_type'.

I'm not seeing why this needs to access any other system state. Might be worth a call to explain @narekhovhannisyan ?

jawache commented 6 months ago

Yes exactly just have

const {tree, rawContext, parameters} = await load(inputPath!, paramPath);

Outside of the try block, if an exception is thrown loading a Yaml file then can just log and exit, no need to add that exception it to the Yaml file.

narekhovhannisyan commented 6 months ago

@jawache if you don't want to keep yaml file in latest shape when writing down yeah that will work. F.ex. you error occurred in aggregation state, you want to save yaml file in computed state or just the initial version with error?

jawache commented 6 months ago

@narekhovhannisyan just in it's initial state is fine (plus exception & runtime info like versions from #591)

If you want to know hints re: state just prior to failure we can go to the logs and see what we logged (after we finish of #600)

And also, the way we're defining the functionality of the if-diff if there are extra tree nodes from a partial run, it would trigger a difference that is unnecessary to see for testing purposes. So just the initial state and the exception plus runtime is all we need for now.

MariamKhalatova commented 6 months ago

@jawache @jmcook1186 @narekhovhannisyan In the testing scope of this feature, came across with following behaviour: When manifest structure is invalid (f.ex. initalize->plugin->method is missing), then error is thrown in console, without saving output manifest. Is it the behaviour we expect?

jmcook1186 commented 6 months ago

Oh interesting - nice catch @MariamKhalatova . We should capture these errors in a manifest file too. @narekhovhannisyan

narekhovhannisyan commented 6 months ago

@MariamKhalatova @jmcook1186 for handling that we need to separate loading and validation lifecycles, and move the validation under the try catch block

jawache commented 6 months ago

@narekhovhannisyan and @MariamKhalatova generally validation errors that are so bad the file can't even be loaded are not important to include, you might be passing in a jpeg instead of a yaml or something else just as silly. If the input file can't even be parsed as a yaml file, then there is little chance we can add in an exceptions node so we just log to console.

Specifically from your example though @MariamKhalatova we do want to include errors raised when the plugin can't initialize due to bad config, that's a common problem.

@narekhovhannisyan to capture that why can't the try block be moved to just the line below? That's before plugin initialization. (In my mind this whole exercise should be very simple now we have refactored everything in some pure functional architecture, it's mostly a decision as to where we put the try keyword)

const impactEngine = async () => {

  const options = parseArgs();

  if (!options) {
    return;
  }

  logger.info(DISCLAIMER_MESSAGE);
  const {inputPath, paramPath, outputOptions} = options;

  const {tree, rawContext, parameters} = await load(inputPath!, paramPath);

  try {

  const context = await injectEnvironment(rawContext);
  parameterize.combine(context.params, parameters);
  const pluginStorage = await initalize(context.initialize.plugins);
  const computedTree = await compute(tree, {context, pluginStorage});
  const aggregatedTree = aggregate(computedTree, context.aggregation);
  context['if-version'] = packageJson.version;
  exhaust(aggregatedTree, context, outputOptions);

  return;
  } catch (e) {
     // here you have to collect all the data before the error happened.
     // f.ex. if failure happened on aggregation state, you have to save all the job done before that and write in file
     // however in that state, all the variables are inaccessible in catch block

     // or we have to keep only the initial version of the yaml and inject error message
  }
};
narekhovhannisyan commented 6 months ago

@jawache Yeah, we discussed a case where the problem is not because of loading, but validating the structure. currently, load and validation are taking place in one function. So separating them into two different functions will solve the problem and it will be possible to move it to try/catch block

jawache commented 6 months ago

@narekhovhannisyan oh I see, so the raw validation of the manifest file structure is happening in const {tree, rawContext, parameters} = await load(inputPath!, paramPath); along side just loading the file?

narekhovhannisyan commented 6 months ago

@jawache yeah, separating the validation from it will solve the problem

jmcook1186 commented 6 months ago

Docs partially addressed here https://github.com/Green-Software-Foundation/if-docs/pull/71

Also requires a PR to if-docs to update https://if.greensoftware.foundation/major-concepts/manifest-file with a new section titled execution where status and error fields can be explained.

narekhovhannisyan commented 6 months ago

@jmcook1186 https://github.com/Green-Software-Foundation/if-docs/pull/72