aboutcode-org / scancode-toolkit

:mag: ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and others generous sponsors!
https://github.com/aboutcode-org/scancode-toolkit/releases/
2.08k stars 539 forks source link

Allow a plugin to provide more than one option #787

Closed pombredanne closed 6 years ago

pombredanne commented 6 years ago

We should have a way for a plugin to provide not one, but several CLI options Also a plugin should be able to provide fully configured click Options as options and not be constrained in what the options args are.

This is needed whenever more than one option is required or when more advanced settings need to be provided for a given option. This would be needed for scan plugins where several have multiple options.

@yashdsaraf ping :)

yashdsaraf commented 6 years ago

@pombredanne Currently each pre-scan plugin works in a way that if an option such as --ignore abc is passed, any plugin in the pre-scan entrypoint with the entry name ignore is initialized with the user input abc. If a plugin provides more than one option, what do you think would be a better approach?

yashdsaraf commented 6 years ago

Also, should we add a common plugin prefix to the options? say if ignore plugin defines 2 options glob and regex, do we add the plugin name such as --ignore-glob and --ignore-regex or we let the plugins practice complete autonomy in the naming scheme?

pombredanne commented 6 years ago

@yashdsaraf Thanks... here is my thinking on this:

  1. We should generalize the notion of plugin to decouple the UI (e.g. the CLI option name) from the actual declaration (as an entry point). This would mean that every plugin would have the same structure and could contribute or more options as they please and define these options entirely.

  2. With this generalization a plugin becomes something that receives an iterable of resources and returns an iterable of resources. We should therefore better specify this resource object. In the future a plugin could express interest only in some resource that match certain criteria too.

With this, the only thing that a plugin would declare in the entry point is its "name" and its plugin callable for a given stage of the scan (pre, scan, post, output). All options would be provided by the plugin class itself. And all plugins would have the same interface: act on a stream of Resources.

This would mean reworking the way all the plugins work today to a single common way. When they are invoked is the only thing that would be special to a given plugin. This would IMHO pave nicely the support for scan plugins (and later the support for plugins that could be written in another language than Python)

Does this make sense?

yashdsaraf commented 6 years ago

Okay, this is what I got from that

  1. All the plugins will have a similar structure, i.e
    • They all take an iterable of resources as input and return the same after processing
    • They all define their own click option(s)
    • Only structural difference would be -- at what stage of a scan is a plugin called
  2. The Resource class will need some modifications to be handled by all the plugins

I think it's better to start restructuring the plugins to take in iterable of resource, rest points will follow automatically.

pombredanne commented 6 years ago

perfect!

yashdsaraf commented 6 years ago

@pombredanne Like you said, ultimately all the plugins would return an iterable of resources. In this case, how would the output plugins work?

pombredanne commented 6 years ago

Good point: the output plugins could eitherr:

The later is likely cleaner

yashdsaraf commented 6 years ago

I'm sorry I didn't get that, could you give an example of what should happen if a user calls scancode cli with multiple format options.

pombredanne commented 6 years ago

See the discussion in #789

yashdsaraf commented 6 years ago

Ah, got it. Each option supplied by the format plugins would take the output file path as input. Also, what happens if a path is not given? Do we print the output to stdout or simply abort due to the missing param?

pombredanne commented 6 years ago

@yashdsaraf no path given would mean stdout output which would be messy if there are more than one ; ) but that's OK, we cannot prevent folks to shoot themselves in the foot

pombredanne commented 6 years ago

@yashdsaraf I think what would be most useful for @haikoschol is to have this available for post-scan plugins right now. Do you think you can have a look into this? Or shall I take a stab?

pombredanne commented 6 years ago

Shoot, in working to encapsulate Resources with PyFilesystem, I did hit https://github.com/PyFilesystem/pyfilesystem2/issues/120 and #688 .... sigh. I need to beef things up to make things work :|

pombredanne commented 6 years ago

FYI I submitted a PR to https://github.com/PyFilesystem/pyfilesystem2/pull/121 in order to have proper handling of non-fs decoable bytes-only paths. Based on this I am going forward with the refactoring of the Resource and scan pipeline around PyFilesystem2 which should provide a clean abstraction and simplify vastly creating new plugins

pombredanne commented 6 years ago

Merged in develop with #885 !