APSIMInitiative / ApsimX

ApsimX is the next generation of APSIM
http://www.apsim.info
Other
136 stars 164 forks source link

Updating ISummary to DataStore when exception occurs #8997

Closed lie112 closed 4 months ago

lie112 commented 5 months ago

What happened?

It is difficult to determine the timing of datastore actions. I have some parameter and setup validation (using IValidatableObject methods and Validation Attributes) that runs in OnCommencing and StartOfSimulation and tracks all errors and warnings through the ISummary to the datastore table _Messages. Then the controlling model checks this store prior to running the simulation and throws a fatal exception if any critical issues are raised. The problem is that the database is often not updated by the time the database is checked. So, a simulation can appear to run without issue while errors were written to the database after the check. Likewise, errors can be present upon opening a simulation that were never reported to the user. Also, errors can have been addressed, but still reported. This is confusing and makes it really difficult to address suggested issues and track why the model stopped, or not stopping when it should. I have looked deep into DataStore and feel it needs a public method that allows us to force the outstanding writes to happen so I can call this before I check for errors. I vaguely recall such an approach exists somewhere, but I think it's private. Otherwise, I have to wait for some time during the simulation or post-simulation which is too late to trap validation errors. Also, should the _Messages table be cleared at the start of a simulation as I don't believe it is, or the same issue exists and the database has not been updated when I check? The problem is made worse when working with Experiment-Factors across threads as no error checking happens for some runs where even in a very quick model, the code completes before the database is updated. I don't want to throw an exception on any validation issue, as I want the full list of problems to be presented to the user before causing the exception.

What Operating System are you on?

Windows

par456 commented 4 months ago

In the datastore model, there is a property called Writer, which holds the current datastore writing object. With that you can call WaitForIdle() and it will return only once writing to the datastore has been completed.

datastore.Writer.WaitForIdle()

If you put that into your code where you are waiting for the summary messages to finish writing, then I think it should do what you are looking for.

lie112 commented 4 months ago

Thanks. I have implemented this. Does the _Messages table get cleared when a simulation is run? At lease for the current simulation name. And does this also apply to all the simulations in an experiment? I have just done an experiment run and it threw an error which was trapped and displayed. Upon fixing the error I ran again and it threw the error from data in the datastore without throwing in the validation code. I ran the experiment again and the simulation runs so not getting my error, but still displays all the old messages in the status area. Emptying the datastore results in the simulation running as expected with no errors displayed.

par456 commented 4 months ago

It should be clearing the _messages table each time it runs, it's one of 3 tables that are always supposed to get reset when running a simulation.

I wonder if something in the GUI is holding onto those messages rather than the datastore.

Do you potentially have an example or branch I can look at to see what's happening?

lie112 commented 4 months ago

I'm in between two dev versions at present. The flow is a little tricky to follow, but I think I have a better understanding. From further testing, it seems _messages table is only set to empty upon writing the first messages during a simulation as this is where Initialise is called. As all simulations at least end with a "Complete" message this usually always achieved. However, I was throwing missing file exceptions before any message was written and so it appears that when I go to look at the message table before writing a message, it still contains the previous simulation message data. In checking this I realise my exception throwing is a bit clumsy. During validation everything is written to messages and one final check only throws the exception if errors were detected (thus the need for WaitForIdle), thus terminating the simulation which is the desired action. Other areas throw exception when continuing from the problem is not logical. Others such as "cannot find file" should really be written as a message and caught at the same place validation problems are handled. This requires a bit more code in other places to handle the fact that the models were not set up properly, which was more work than just throwing the exception at the place where the missing file was noted. I'm looking to clean up this issue in future code. I could solve the issue by always writing the error message to Summary before throwing an exception. It seems the APSIM exception handling must write something as the exceptions are remembered next time in so must be in the database. This doesn't make sense if these are written to Messages as they should then clear the messages table.

par456 commented 4 months ago

If you are looking to force a clean because it's running prior to the simulation code doing a db clean, you might be able to force it with the Clean method in the datastore Writer.

From what you're saying, it sounds like the problem is that your code wants to run ahead of the simulation so you can check things have been set up correctly. Is there a particular reason it has to run before simulations are set up instead of in a really early event like start of simulation?

lie112 commented 4 months ago

All my setup and validation are performed between StartOfSimulation and FinalInitialise in three CLEMEvents. So, I am running really early in the run as suggested. You should be able to replicate my situation. Have a look at getting the messages table from DataStore and then throwing an exception in a StartOfSimulation method. You should find the messages are empty or those of the previous run as no message should have been written to reset the data table. Then when you run additional simulations, the messages table should contain your exception error before the next one is thrown if I am correct. So, if you had addressed the exception, you'll still get it reported and I belive sometimes the model stops and others it runs but displays all the errors. Then you could add a message before the exception, and this should clear the massages and give you one message and the error.

par456 commented 4 months ago

Cheers, I'll have a go at that and investigate.

par456 commented 4 months ago

I built a small sim to test this, but whenever I checked the datastore during StartOfSimulation, the _messages table existed, but was always completely empty. Nothing from a previous run was still in there.

I also checked the db file itself and it was completely empty. Something strange seems to happening in your code.

lie112 commented 4 months ago

I think I have isolated the issue. If I

The message is never written to _Messages, and thus if the only message, messages is never cleared or updated for that simulation.

The error message is written to the status bar but not to Summary as an Error and the simuation thread exists before the database is updated so my original message to ISummary is never saved.

In the exception trapping section of APSIM could we write the error message to the _messages table and wait for WaitForIdle before handling and returning to the UI. This will solve all my problems and I won't need to use the WaitForIdle in my code, which causes some status message to popup during experiments.

It is really difficult for me to be working with my simulations at present as I don't know what messages are current. I am using DB Browser to have the database and _Messages table open so I can keep refreshing it there to see the real outcome of the last run.

par456 commented 4 months ago

I just checked that on my end, I put a write to summary and an exception throw into the weather class when it goes to read the daily weather.

image

It threw, but in my summary file I could still see (and it was in my db file):

image

Is there a part of your process here that is doing things concurrently?

I also did this with 10000 messages going to the summary and 10 variations in an experiment and all of the messages were into the summary by the time it threw. Just in case it was a threading problem or if a certain number of messages were needed to cause this to happen.

lie112 commented 4 months ago

I'll keep investigating. It's kind of hard to repeat bad yet happens all the time.

public static void ReportErrors(IModel model, IEnumerable<Logging.Message> messages)'
{
    Debug.WriteLine($"Error count at entry: {messages.Count()}");
    // report error and stop
    if (messages.Any())
    {
        Debug.WriteLine($"Error count after any: {messages.Count()}");
        // create combined inner exception
        StringBuilder innerExceptionString = new StringBuilder();
        foreach (var error in messages)
            innerExceptionString.Append($"{error.Text}{Environment.NewLine}");
        Debug.WriteLine($"Error count before exception: {messages.Count()}");
        Exception innerException = new Exception(innerExceptionString.ToString());
        throw new ApsimXException(model, $"{messages.Count()} error{(messages.Count() == 1 ? "" : "s")} occurred during start up.{Environment.NewLine}See CLEM component [{model.GetType().Name}] Messages tab for details{Environment.NewLine}", innerException);
    }
}

That method reports Error count at entry: 2 Error count after any: 2 Error count before exception: 0

The code is only supposed to report if there are error messages. And it says there are two when it checks Sill two before walking through them But a few lines later when it reports them in the exception there are 0 errors, I think the 0 is correct and the 2 is a hangover from a previous run. I can't see that it's an issue with the IEnumerable but sometime during the method the values in the list passed to the method change.

lie112 commented 4 months ago

Sorry, just realised you checked threading in previous post. But here's an update anyway __

Have you tried your test inside an experiment? I wonder if it's something to do with the experiment threading. I have an exp sim output in db including errors in _Messages. I run the experiment again after fixing the issues in the sim setup and they are still reported reported as errors even though the validation/error code does not get run (breakpoints to check)

  1. the errors are again displayed in the status window
  2. the model runs through the progress bar and completes
  3. There is not "Completed" message as the previous errors remain.

If I empty the datastore and run the experiment again all works as expected with no old errors shown and "Experiment complete [15.19 sec]" displayed.

So, there's definitely something holding onto old data and displaying it before the _Messages database is written to and therefore cleared for the simulation.

lie112 commented 4 months ago

CLEM was checking for reported error messages written to the database in a number of early events. One of these (legacy) was in Commencing to catch a setup error before causing exceptions, but this is too early (as noted in the code comments) as ISummary has not been created at this point. So, something related to this was only reporting messages in the database prior the simulation being successfully started. I have been able to remove the issue by getting rid of this additional validation check and will monitor performance.

lie112 commented 4 months ago

I've spent a day to realise that the database writing and experiments in APSIMX might be quite complicated. One fix was found for why my CLEM error and warning messages weren't always shown. I was using Reader.GetTables to see if any tables had been written, but this ignored the tables beginning with '_' and so a new database with a crash before writing any reports was thought to be a "no simulation run" situation. I have addressed this as now as calling an empty or non-existent table returns null and I don't need to ensure the table (or even database) exists. This must be some added functionalty.

lie112 commented 4 months ago

I have identified part of my issue: If I delete the xxxx.db database (I believe this is also the case for an existing database) Run an experiment In one of the factor simulations I write an error message to ISummary I handle checking the _messages and throw an error in the Validation event of CLEM events The simulation then finishes and displays "ExperimentName complete [2.25 sec]" This infers to the user that all was ok. Upon checking my corrected messages display as part of the CLEM presenter, I am shown the validation error that was thrown for the simulations performed. At the end of an experiment can we check for errors in _messages and display an appropriate warning in the status window to indicate that issues were raised rather than merely trapped and forgotten?


Completed with errors! [2.25 sec] Simulation1.Error: Error message #1 Simulation2.Error: Error message #2 or Completed with warnings! [2.25 sec] or Completed with errors and warnings! [2.25 sec]