fedora-modularity / libmodulemd

C Library for manipulating module metadata files
MIT License
31 stars 52 forks source link

Empty profile is parsed as a package with an empty name #593

Closed ppisar closed 2 years ago

ppisar commented 2 years ago

Having this document (empty profiles are legal https://docs.fedoraproject.org/en-US/modularity/policies/packaging-guidelines/#_profiles):

document: modulemd
version: 2
data:
    name: test
    stream: A
    summary: Test.
    description: >
        Test.
    license:
        module: [ MIT ]
    profiles:
        profileA:
          rpms:

reading and writing results into:

document: modulemd
version: 2
data:
  name: test
  stream: "A"
  summary: Test.
  description: >
    Test.
  license:
    module:
    - MIT
  profiles:
    profileA:
      rpms:
      -

This is a bug on parser side:

$ cat /tmp/test.py 
#!/usr/bin/python3
import gi
gi.require_version('Modulemd', '2.0')
from gi.repository import Modulemd

module = Modulemd.read_packager_file('/tmp/out')
print("name={}".format(module.get_module_name()))
profile = module.get_profile('profileA')
print("profile={}".format(profile.get_name()))
packages = profile.get_rpms()
for package in packages:
    print("<{}>".format(package))
[test@fedora-37 libmodulemd-devel]$ GI_TYPELIB_PATH=/tmp/build/modulemd LD_LIBRARY_PATH=/tmp/build/modulemd PYTHONPATH=$PWD python3 /tmp/test.py 
name=test
profile=profileA
<>

Possibly all rpms lists are affected.

ppisar commented 2 years ago

The cause is that libyaml returns YAML_SCALAR_EVENT for "" instead of YAML_MAPPING_END_EVENT:

(modulemd-validator:4969): libmodulemd-DEBUG: 16:52:57.284: Parser event: YAML_SCALAR_EVENT: profileA
(modulemd-validator:4969): libmodulemd-DEBUG: 16:52:57.284: TRACE: Entering modulemd_profile_parse_yaml
(modulemd-validator:4969): libmodulemd-DEBUG: 16:52:57.284: Parser event: YAML_MAPPING_START_EVENT
(modulemd-validator:4969): libmodulemd-DEBUG: 16:52:57.284: Parser event: YAML_SCALAR_EVENT: rpms
(modulemd-validator:4969): libmodulemd-DEBUG: 16:52:57.284: Parser event: YAML_SCALAR_EVENT:
(modulemd-validator:4969): libmodulemd-DEBUG: 16:52:57.284: Parsing scalar:   
(modulemd-validator:4969): libmodulemd-DEBUG: 16:52:57.284: Parser event: YAML_MAPPING_END_EVENT
(modulemd-validator:4969): libmodulemd-DEBUG: 16:52:57.284: TRACE: Exiting modulemd_profile_parse_yaml

That's probably in line with YAML. I think that YAML represents empty/undefined/missing mapping value node for a mapping key by an empty scalar. I will have to check YAML specification. Otherwise, it's a bug in libyaml.

Then there is a special handling for this in modulemd_yaml_parse_string_set():

        case YAML_SCALAR_EVENT:
          g_debug ("Parsing scalar: %s",
                   (const gchar *)event.data.scalar.value);
          g_hash_table_add (result,
                            g_strdup ((const gchar *)event.data.scalar.value));

          if (!in_list)
            {
              /* We got a scalar instead of a sequence. Treat it as a list with
               * a single entry
               */
              done = TRUE;
            }
          break;

which is responsible for the bogus empty package name.

sgallagher commented 2 years ago

Your original document is faulty.

  profiles:
    profileA:
      rpms:
      -

Literally says that there is one entry under rpms: and it's an empty scalar.

What you actually want is:

  profiles:
    profileA:
      rpms: []

This will give you a zero-length array, which will behave as you expect. The docs may need to clarify this, but the way it's behaving is as expected.

ppisar commented 2 years ago

My original document is without the trailing hyphen:

  profiles:
    profileA:
      rpms:
sgallagher commented 2 years ago

That's still incorrect, because we expect a YAML_SEQUENCE_START_EVENT here and we get nothing. The libyaml is probably just trying to treat the lack of content here as a zero-length scalar, but that's pretty much just trying to avoid aborting on an invalid syntax.

ppisar commented 2 years ago

I'm not sure whether it's invalid syntax. E.g. Perl YAML::XS is fine with it and reports the missing value as undef:

$ perl -MYAML::XS -MData::Dumper -e 'print Data::Dumper::Dumper(YAML::XS::LoadFile(q{/tmp/out}))'
$VAR1 = {
          'document' => 'modulemd',
          'data' => {
                      'summary' => 'Test.',
                      'stream' => 'A',
                      'license' => {
                                     'module' => [
                                                   'MIT'
                                                 ]
                                   },
                      'profiles' => {
                                      'profileA' => {
                                                      'rpms' => undef
                                                    }
                                    },
                      'description' => 'Test.
',
                      'name' => 'test'
                    },
          'version' => 2
        };
sgallagher commented 2 years ago

OK, https://yaml.org/spec/1.2.2/#72-empty-nodes says that they are technically valid and are to be interpreted as a zero-length scalar, which is commonly treated as synonymous with a "null" value.

It's still a violation of the libmodulemd specification which requires a sequence here, but I suppose we can be overly-cautious and catch this unusual case. If we get a zero-length scalar here, just treat it as a zero-length array of rpm names.