MoseleyBioinformaticsLab / mwtab

The mwtab package is a Python library that facilitates reading and writing files in mwTab format used by the Metabolomics Workbench for archival of Mass Spectrometry (MS) and Nuclear Magnetic Resonance (NMR) experimental data.
http://mwtab.readthedocs.io
BSD 3-Clause Clear License
12 stars 2 forks source link

Validation Code Error #7

Open ptth222 opened 1 year ago

ptth222 commented 1 year ago

I am using the following code to validate a manually built mwTab JSON: validated_file, errors = mwtab.validator.validate_file(mwtabfile, verbose=True)

It produced an error like:

Error Log:
SCHEMA: Section "COLLECTION" does not match the allowed schema. dictionary changed size during iteration

This is obviously not a proper schema error and is a code error. I tracked it down to the validate_section_schema function which is trying to edit a dictionary while looping over it which results in the "RuntimeError: dictionary changed size during iteration".

This is an obvious problem that needs to be fixed, but there is also another question. Should "validate" be editing the object? In my opinion "validation" should not edit whatever is being validated, it should simply report all problems.

hunter-moseley commented 1 year ago

This is a serious issue, but I want to look at it more closely before fixing it. I roughly remember Christian mentioning this and we discussed it.

On Mon, Nov 14, 2022 at 3:44 PM ptth222 @.***> wrote:

I am using the following code to validate a manually built mwTab JSON: validated_file, errors = mwtab.validator.validate_file(mwtabfile, verbose=True)

It produced an error like:

Error Log: SCHEMA: Section "COLLECTION" does not match the allowed schema. dictionary changed size during iteration

This is obviously not a proper schema error and is a code error. I tracked it down to the validate_section_schema function which is trying to edit a dictionary while looping over it which results in the "RuntimeError: dictionary changed size during iteration".

This is an obvious problem that needs to be fixed, but there is also another question. Should "validate" be editing the object? In my opinion "validation" should not edit whatever is being validated, it should simply report all problems.

— Reply to this email directly, view it on GitHub https://github.com/MoseleyBioinformaticsLab/mwtab/issues/7, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADEP7B4BDIBEKANM5OL44KLWIKQBNANCNFSM6AAAAAASAHA3KI . You are receiving this because you are subscribed to this thread.Message ID: @.***>

-- Hunter Moseley, Ph.D. -- Univ. of Kentucky Professor, Dept. of Molec. & Cell. Biochemistry / Markey Cancer Center / Institute for Biomedical Informatics / UK Superfund Research Center Not just a scientist, but a fencer as well. My foil is sharp, but my mind sharper still.

Email: @. (work) @. (personal) Phone: 859-218-2964 (office) 859-218-2965 (lab) 859-257-7715 (fax) Web: http://bioinformatics.cesb.uky.edu/ Address: CC434 Roach Building, 800 Rose Street, Lexington, KY 40536-0093

ptth222 commented 1 year ago

I meant to do this right after lab meeting and forgot, but the offending function is in validator.py at line 227 the del line causing the error is 245. https://github.com/MoseleyBioinformaticsLab/mwtab/blob/master/mwtab/validator.py

ptth222 commented 1 year ago

After some discussion we decided that there was supposed to be 2 modes in validation, one that doesn't clean and one that does. Currently, the 2 different modes aren't supported, but the offending del line was to enable some cleaning. If we want to add a cleaning and non-cleaning validation it needs to be propagated through more code than just this function. A quick fix would be to add a "cleaning" parameter and if True then do cleaning. I have the proposed fix below. Since cleaning will have a default value of False we don't even need to worry about replacing everywhere it is called.

def validate_section_schema(section, schema, section_key, cleaning=False):
    """Validate section of ``mwTab`` formatted file.
    :param section: Section of :class:`~mwtab.mwtab.MWTabFile`.
    :type section: :py:class:`collections.OrderedDict`
    :param schema: Schema definition.
    :type schema: :py:class:`~schema.schema`
    :param str section_key: Section key.
    :return: Validated section.
    :rtype: :py:class:`collections.OrderedDict`
    """
    schema_errors = list()

    keys_to_delete = []
    if section_key in ITEM_SECTIONS:
        for key in section.keys():
            if not section[key]:
                schema_errors.append("{}: Contains item \"{}\" with null value.".format(section_key, key))
                keys_to_delete.append(key)

    if cleaning:
        for key in keys_to_delete:
            del section[key]

    return schema.validate(section), schema_errors