biosimulators / Biosimulators_utils

Utilities for building standardized command-line interfaces for biosimulation software packages
https://docs.biosimulators.org/Biosimulators_utils
MIT License
4 stars 6 forks source link

Consolidate descriptions of similar errors and warnings #96

Closed luciansmith closed 2 years ago

luciansmith commented 2 years ago

Currently, the utils function validating SED-ML spams my results with pages and pages of "XPath could not be validated." It's not necessary to tell me 500 times that you can't validate an XPath; you could even not mention it at all, since it is telling me something about biosimulators_utils, and nothing about my file. If you must say something, saying it once would be sufficient.

jonrkarr commented 2 years ago

Good point.

jonrkarr commented 2 years ago

The much verbose messages originate from SBML. This could be presented more compactly as well.

  1. Group occurrences errors/warnings by code
  2. Display list of occurrences
luciansmith commented 2 years ago

I actually just spent a little time doing this, just so I could debug things more sanely ;-) Does this seem reasonable for output:

The SED document is potentially incorrect.
  - Model `BIOMD0000000001_url` may be invalid.
    - The model file `BIOMD0000000001_url.xml` generated warnings.
      - 26 SBML warning(s) with id 99505.  Example message:
        'In situations where a mathematical expression contains literal numbers or parameters whose units have not been declared, it is not possible to verify accurately the consistency of the units in the expression. 
         The units of the <kineticLaw> <math> expression 'comp1 * (kf_0 * B - kr_0 * BL)' cannot be fully checked. Unit consistency reported as either no errors or further unit errors related to this object may not be accurate.'
      - 35 SBML warning(s) with id 80701.  Example message:
        'As a principle of best modeling practice, the units of a <parameter> should be declared rather than be left undefined. Doing so improves the ability of software to check the consistency of units and helps make it easier to detect potential errors in models.
         The <parameter> with the id 'kf_0' does not have a 'units' attribute.'

I was thinking of leaving any error messages alone, even if there were multiples of the same type, on the theory that you'd want to have the specific message for each one.

jonrkarr commented 2 years ago

That's what I had in mind. This package doesn't have distinct error codes, but it would be easy to assign unique codes to each error.

luciansmith commented 2 years ago

OK, I have a fix! I've been making changes so far by using the web site to edit single files, which automatically gives me a patch version I can make a pull request from, but this one involves several files. Would it be possible to add me as a contributor so I can commit directly to a branch and make a pull request from there, instead of from a clone?

jonrkarr commented 2 years ago

I added you to the core developer team. You should have write access to this repo and the other core repos now.

jonrkarr commented 2 years ago

Below are the methods that are used to generate reports of errors/warnings. Basically, there's just a 2 methods in which errors/warnings could be grouped into more succinct reports. These methods are used by the command-line program, individual simulation tools (and in turn runBioSimulations), and REST API/web application

This would be helpful because @bilalshaikh42 is hoping to reduce the size of console logs.

Command-line program

The command-line program prints errors and warnings out at https://github.com/biosimulators/Biosimulators_utils/blob/f8370913679828b6a45dad047123c0ab84a6f43d/biosimulators_utils/__main__.py#L205

This uses the following method. This method is used in several places were errors are printed out. https://github.com/biosimulators/Biosimulators_utils/blob/f8370913679828b6a45dad047123c0ab84a6f43d/biosimulators_utils/utils/core.py#L342

REST API

The REST API uses the method below to generate a report of errors and warnings. The implementation can be changed as long as the interface stays the same (data structure for the output). https://github.com/biosimulations/biosimulations/blob/7e9dfa6093bfa8e636cb11a027e8af754059fcb0/apps/combine-api/src/utils.py#L121

luciansmith commented 2 years ago

Ah, thanks! I only consolidated a couple sub-groups of messages so far, but a general approach seems useful as well. I'll see what I can do.

jonrkarr commented 2 years ago

I suggested this because it avoids dropping information until the end when a human-readable report needs to be generated, plus the compaction can be applied to SED-ML, metadata, manifests, and other model formats as well.

Grouping could be made easier by collecting error codes everywhere. Since this appears in many places, this might be easier for me to do.

luciansmith commented 2 years ago

That makes sense. Many of the error codes could be the SED-ML validation numbers. Libsbml reserves a chunk of numbers >99000 for warnings/errors that are not in the spec, and uses spec numbers elsewhere (just as a possible design).

I do think that 'we don't validate ModelChange XPaths' should only be a single warning from the get-go, though.

jonrkarr commented 2 years ago

That makes sense. Many of the error codes could be the SED-ML validation numbers. Libsbml reserves a chunk of numbers >99000 for warnings/errors that are not in the spec, and uses spec numbers elsewhere (just as a possible design).

Luckily, I don't need to know what the specific codes mean to group them. I just need tuples of the packages which generate them and the package-specific codes to be unique, which they are.

I do think that 'we don't validate ModelChange XPaths' should only be a single warning from the get-go, though.

Agreed. Once you pointed this out, its obvious. This didn't occur to me.

luciansmith commented 2 years ago

Oh, I meant for validation of SED-ML itself, which libsedml doesn't do (yet?) so you validate everything by hand. When adding those errors, you might want to give them IDs too, and if so, I would suggest using spec validation numbers where possible.

jonrkarr commented 2 years ago

Good point. Ideally we'd do that.