OasisLMF / OasisPlatform

Loss modelling platform.

Provide clearer status upon input generation results #974

Open fl-ndaq opened 5 months ago

fl-ndaq commented 5 months ago

Issue Description

Input generation currently reports the generic status INPUTS_GENERATION_ERROR even when it does not actually fail but produces an empty keys.csv, outputting only a message in the log (https://github.com/OasisLMF/OasisLMF/blob/18629544c601fe7a5f73c4ae5664bfd4f3e4e3f5/oasislmf/computation/generate/files.py#L228). A suggested way to handle this better is to add multiple ERROR states that give clearer feedback: keep INPUTS_GENERATION_ERROR as the generic status for "unknown errors" and for cases where input generation actually fails, and introduce more granular statuses for known cases, or for when input generation does NOT fail but produces an empty keys.csv (e.g. handle that error and raise a specific INPUTS_GENERATION_NO_KEYS).

carlfischerjba commented 5 months ago

Thanks for logging this. Here's an example we just hit. The underlying cause was a model configuration error but it took a while to figure out because nobody thought to check keys.csv as it's often something else. I'm not quite sure where the boundaries lie between OasisLMF, Oasis API and NRMC.

This is with OasisLMF: 1.28.5.

In NRMC, the analysis log panel ends with the logs from our model lookup module, but no error. This is the same as the content of the interim file which in this case is called 86ce4d7ed572440299ed70f15c5e83af.txt.

...
2024-02-09 09:05:48,098 lookup@535:link_events [79:139934882095104] [INFO] Linking event set binary files from model into workspace
2024-02-09 09:05:48,099 lookup@336:process_locations [79:139934882095104] [INFO] Lookup complete in 0:14:10.189148

  0%|          | 0/1 [16:44<?, ?it/s]

The NRMC worker logs contain the same as the above plus four extra lines, but still no error and the final line incorrectly says that input generation has succeeded.

...
2024-02-09 09:05:48,098 lookup@535:link_events [79:139934882095104] [INFO] Linking event set binary files from model into workspace
2024-02-09 09:05:48,099 lookup@336:process_locations [79:139934882095104] [INFO] Lookup complete in 0:14:10.189148

  0%|          | 0/1 [16:44<?, ?it/s]

[2024-02-09 09:05:48,758: INFO/ForkPoolWorker-1] Stored S3: /tmp/tmp2l1v0qjn/86ce4d7ed572440299ed70f15c5e83af.txt -> b5267ef24c3f43ad8e19b1d87012778e.txt
[2024-02-09 09:05:49,688: INFO/ForkPoolWorker-1] Stored S3: /tmp/tmp2l1v0qjn/keys-errors.csv -> 21dec6f972da4cd4ab68db5137f8d072.csv
[2024-02-09 09:06:26,165: INFO/ForkPoolWorker-1] Stored S3: /tmp/tmp2l1v0qjn -> 3aece423b73b47c2b80de9e48c5483a1.tar.gz
[2024-02-09 09:06:26,195: INFO/ForkPoolWorker-1] Task generate_input[01ed8ac7-660a-4503-82e3-f38f7a910d9c] succeeded in 1049.0475860689999s: ('oasis/worker/3aece423b73b47c2b80de9e48c5483a1.tar.gz', 'oasis/worker/21dec6f972da4cd4ab68db5137f8d072.csv', None, None, None, 'oasis/worker/b5267ef24c3f43ad8e19b1d87012778e.txt', 1)

However, when running via the MDK, we get a proper error message and it's a lot easier to figure out the problem. The mmap traceback isn't great but at least further up it tells us keys.csv generated with 0 items. The problem is that to see this error we need to take an analysis from NRMC and replicate it in the MDK which can be quite time-consuming to set up and to run.

...
2024-02-09 10:56:19,794 farmcat.oasis.lookup lookup@341:process_locations [1:140552649265600] [INFO] Lookup complete in 0:15:11.470873
COMPLETED: oasislmf.lookup.factory.generate_key_files in 911.68s

Keys successful: /data/workspace/lm2974_ABC/output/run/input/keys.csv generated with 0 items
Keys errors: /data/workspace/lm2974_ABC/output/run/input/keys-errors.csv generated with 896599 items
2024-02-09 10:56:19,861 oasislmf log@128:wrapper [1:140552649265600] [INFO] COMPLETED: oasislmf.lookup.factory.generate_key_files in 911.68s
2024-02-09 10:56:19,861 oasislmf.computation.base keys@166:run [1:140552649265600] [INFO] 
Keys successful: /data/workspace/lm2974_ABC/output/run/input/keys.csv generated with 0 items
2024-02-09 10:56:19,861 oasislmf.computation.base keys@168:run [1:140552649265600] [INFO] Keys errors: /data/workspace/lm2974_ABC/output/run/input/keys-errors.csv generated with 896599 items
  0%|          | 0/3 [15:21<?, ?it/s]
Traceback (most recent call last):
  File "/home/worker/.local/bin/oasislmf", line 8, in <module>
    sys.exit(main())
  File "/home/worker/.local/lib/python3.10/site-packages/oasislmf/cli/root.py", line 54, in main
    sys.exit(RootCmd().run())
  File "/home/worker/.local/lib/python3.10/site-packages/oasislmf/cli/root.py", line 42, in run
    return super(OasisBaseCommand, self).run(args=args)
  File "/home/worker/.local/lib/python3.10/site-packages/argparsetree/cmd.py", line 159, in run
    return cmd_cls(sub_command_name).run(args)
  File "/home/worker/.local/lib/python3.10/site-packages/argparsetree/cmd.py", line 159, in run
    return cmd_cls(sub_command_name).run(args)
  File "/home/worker/.local/lib/python3.10/site-packages/argparsetree/cmd.py", line 161, in run
    return self.action(args) or 0
  File "/home/worker/.local/lib/python3.10/site-packages/oasislmf/cli/command.py", line 142, in action
    manager_method(**_kwargs)
  File "/home/worker/.local/lib/python3.10/site-packages/oasislmf/utils/log.py", line 123, in wrapper
    result = func(*args, **kwargs)
  File "/home/worker/.local/lib/python3.10/site-packages/oasislmf/manager.py", line 90, in interface
    return computation_cls(**kwargs).run()
  File "/home/worker/.local/lib/python3.10/site-packages/oasislmf/computation/run/model.py", line 86, in run
    cmd[0](**cmd[1]).run()
  File "/home/worker/.local/lib/python3.10/site-packages/oasislmf/computation/generate/files.py", line 217, in run
    keys_df = get_dataframe(
  File "/home/worker/.local/lib/python3.10/site-packages/oasislmf/utils/data.py", line 419, in get_dataframe
    df = pd.read_csv(
  File "/home/worker/.local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 948, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/worker/.local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 611, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/worker/.local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1448, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/home/worker/.local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1705, in _make_engine
    self.handles = get_handle(
  File "/home/worker/.local/lib/python3.10/site-packages/pandas/io/common.py", line 731, in get_handle
    handle, memory_map, handles = _maybe_memory_map(handle, memory_map)
  File "/home/worker/.local/lib/python3.10/site-packages/pandas/io/common.py", line 1129, in _maybe_memory_map
    mmap.mmap(
ValueError: cannot mmap an empty file
ERROR: OasisLMF produced no results - see log output.

fl-ndaq commented 5 months ago

@carlfischerjba as per our discussion, the logs are there, especially in the case of keys.csv being empty. However, the order of the log entries in NRMC is simply a side effect of a hack used to output the traceback logs. This might be resolved by https://github.com/OasisLMF/OasisPlatform/issues/784, provided that the same can be done for v1 traceback logs.

sambles commented 2 months ago

Notes on this for oasislmf:

I hit the same mmap error too; the PR for it is here: https://github.com/OasisLMF/OasisLMF/pull/1510
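
For anyone reading along, the mmap failure in the traceback above can be reproduced with the standard library alone: memory-mapping a zero-byte file raises `ValueError`. The sketch below is only an illustration of that behaviour and of a simple size check; it is not taken from the linked PR, and the helper names `try_mmap` and `is_empty_file` are made up for this example.

```python
import mmap
import os


def try_mmap(path):
    """Memory-map `path` read-only.

    With length 0 the map covers the whole file, and Python raises
    ValueError when that file has zero bytes.
    """
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def is_empty_file(path):
    """Hypothetical guard: True when the file exists but holds zero bytes."""
    return os.path.getsize(path) == 0
```

Calling `is_empty_file` on keys.csv before handing it to a memory-mapped reader would let the caller report "no keys" directly instead of surfacing the mmap traceback.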

A better way to do this is to add a check in the key server to make sure there is at least 1 key in keys.csv. We then raise a subclass of OasisException, something like OasisExceptionNoKeysReturn, and catch it here to return the status INPUTS_GENERATION_NO_KEYS: https://github.com/OasisLMF/OasisPlatform/blob/20b0aa391335a420bf4dcdd70caa8e02b771fb60/src/model_execution_worker/tasks.py#L496-L501
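
A rough sketch of that idea, to make the flow concrete. Everything here is an assumption for illustration: `OasisException` is a local stand-in so the snippet runs on its own, `count_keys`, `check_keys` and `input_generation_status` are hypothetical helpers rather than existing oasislmf or platform functions, and `"OK"` is just a placeholder for whatever the platform reports on success.

```python
import csv
import os


class OasisException(Exception):
    """Local stand-in for the real base exception (assumption for this sketch)."""


class OasisExceptionNoKeysReturn(OasisException):
    """Hypothetical: the lookup finished but keys.csv contains no keys."""


def count_keys(keys_path):
    """Count data rows in keys.csv; a missing or zero-byte file counts as 0."""
    if not os.path.exists(keys_path) or os.path.getsize(keys_path) == 0:
        return 0
    with open(keys_path, newline="") as f:
        # DictReader consumes the header row, so only data rows are counted.
        return sum(1 for _ in csv.DictReader(f))


def check_keys(keys_path):
    """Raise the specific exception when there is not at least 1 key."""
    n = count_keys(keys_path)
    if n == 0:
        raise OasisExceptionNoKeysReturn(f"{keys_path} generated with 0 items")
    return n


def input_generation_status(generate):
    """Run generate() and map the outcome to a status string."""
    try:
        generate()
    except OasisExceptionNoKeysReturn:
        return "INPUTS_GENERATION_NO_KEYS"  # the known, specific case
    except Exception:
        return "INPUTS_GENERATION_ERROR"    # generic fallback for unknown errors
    return "OK"  # placeholder success value, not a real platform status
```

The specific exception is caught before the generic one, so the generic INPUTS_GENERATION_ERROR status stays reserved for unknown failures, as suggested in the issue description.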