aboutcode-org / purldb

Tools to create and expose a database of purls (Package URLs). This project is sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase/ and nexB for https://www.aboutcode.org/ Chat is at https://gitter.im/aboutcode-org/discuss
https://purldb.readthedocs.io/

Fixing DebianDirectoryIndexVisitor in minecode/visitors/debian #23

Closed 35C4n0r closed 1 year ago

35C4n0r commented 1 year ago

To get the name, version, and architecture from a Debian file name we use the get_nva function ( https://github.com/nexB/debian-inspector/blob/main/src/debian_inspector/package.py#L113 ). This leads to three problems:

  1. The version returned from get_nva is a Version object, and we cannot pass it directly as the PackageURL version; see https://github.com/nexB/purldb/blob/main/minecode/visitors/debian.py#L117
  2. The IndexVisitor also tries to generate purls for description (.dsc) files. This leads to an error because get_nva does not handle .dsc files; generating a URI is enough for them.
  3. The documentation at https://docs.python.org/3/library/gzip.html#gzip.open states that the mode argument can be any of 'r', 'rb', 'a', 'ab', 'w', 'wb', 'x' or 'xb' for binary mode, or 'rt', 'at', 'wt', or 'xt' for text mode. The default is 'rb', which returns bytes. Since we later need to operate on this content ( see https://github.com/nexB/purldb/blob/main/minecode/visitors/debian.py#L89 ), we need to either use 'rt' or call decode("utf-8").
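For point 3, the difference between the two modes can be sketched with the standard library alone. The listing line used as payload here is made up for illustration and is not taken from the visitor code.

```python
import gzip
import io

# Build a small gzip payload in memory; the listing line is a made-up example.
raw = gzip.compress("pool/main/z/zlib/zlib1g_1.2.13_amd64.deb\n".encode("utf-8"))

# The default mode is 'rb', so read() returns bytes.
with gzip.open(io.BytesIO(raw)) as f:
    as_bytes = f.read()

# Mode 'rt' wraps the stream in a text layer, so read() returns str.
with gzip.open(io.BytesIO(raw), "rt", encoding="utf-8") as f:
    as_text = f.read()

# Decoding the bytes by hand gives the same string as text mode.
assert isinstance(as_bytes, bytes)
assert as_text == as_bytes.decode("utf-8")
```

Either option works; 'rt' keeps the decoding in one place, at the point where the file is opened.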
35C4n0r commented 1 year ago

@pombredanne @JonoYang any suggestions?

JonoYang commented 1 year ago

@35C4n0r

  1. I would try passing the string representation of the version object when creating the PackageURL (version=str(version)).
  2. @pombredanne Should we skip processing .dsc files?
  3. We can update the code to pass 'rt' as the mode argument to gzip.open().
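A minimal sketch of suggestion 1, assuming the version object defines a sensible `__str__`. `StandInVersion` and `purl_fields` are hypothetical names used only for illustration; they are not the debian-inspector or packageurl APIs.

```python
from dataclasses import dataclass


@dataclass
class StandInVersion:
    """Hypothetical stand-in for the Version object returned by get_nva."""
    upstream: str
    revision: str = ""

    def __str__(self):
        # Join upstream and revision the way a Debian version is usually written.
        if self.revision:
            return f"{self.upstream}-{self.revision}"
        return self.upstream


def purl_fields(name, version, arch):
    """Hypothetical helper: collect purl fields, forcing the version to str."""
    return {
        "type": "deb",
        "name": name,
        "version": str(version),  # works for both str and version objects
        "qualifiers": {"arch": arch},
    }


fields = purl_fields("zlib1g", StandInVersion("1.2.13", "1"), "amd64")
assert fields["version"] == "1.2.13-1"
```

Calling str() is harmless when the value is already a string, so the conversion can be applied unconditionally.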
35C4n0r commented 1 year ago

@JonoYang Instead of skipping them, how about we don't generate purls for them and just return their URI objects without a purl?

35C4n0r commented 1 year ago

@JonoYang, the problem is not only with .dsc files but with all files that are collectible ( https://github.com/nexB/purldb/blob/main/minecode/visitors/debian.py#L101 ) but are not recognized by debutils ( https://github.com/nexB/debian-inspector/blob/main/src/debian_inspector/package.py#L132 ). Also, should the isCollectible function recognize .xz files and InRelease files?
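The "URI without a purl" idea could look roughly like the sketch below. `parse_nva` and `uri_record` are hypothetical helpers, the URLs are examples, and the purl string is built by simple formatting with no percent-encoding; none of this is the actual minecode or debian-inspector code.

```python
def parse_nva(file_name):
    """Hypothetical parser: split 'name_version_arch.deb' into its three parts."""
    stem, _, ext = file_name.rpartition(".")
    parts = stem.split("_")
    if ext not in ("deb", "udeb") or len(parts) != 3:
        raise ValueError(f"not a binary package file name: {file_name}")
    return tuple(parts)


def uri_record(uri):
    """Return a record for every URI; attach a purl only when the name parses."""
    file_name = uri.rsplit("/", 1)[-1]
    try:
        name, version, arch = parse_nva(file_name)
    except ValueError:
        # Unrecognized file (.dsc, .xz, InRelease, ...): keep the URI, no purl.
        return {"uri": uri, "package_url": None}
    # Simplified purl formatting for illustration only.
    purl = f"pkg:deb/debian/{name}@{version}?arch={arch}"
    return {"uri": uri, "package_url": purl}


base = "http://ftp.debian.org/debian/pool/main/z/zlib/"
deb = uri_record(base + "zlib1g_1.2.13_amd64.deb")
dsc = uri_record(base + "zlib_1.2.13.dfsg-1.dsc")
assert deb["package_url"] == "pkg:deb/debian/zlib1g@1.2.13?arch=amd64"
assert dsc["package_url"] is None
```

With this shape, every collectible file still yields a URI, and only the files whose names parse cleanly get a purl.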