broadinstitute / barclay

Command line argument parser and online documentation generation utilities for java command line programs.
BSD 3-Clause "New" or "Revised" License

Plugin descriptors with several instances with different arguments #138

Open magicDGS opened 6 years ago

magicDGS commented 6 years ago

The normal use case for plugin descriptors in GATK is to have a single instance of each class applied to the data, and providing the same class twice is not allowed (e.g., read/variant filters). Nevertheless, in some cases it makes sense to apply two different instances of the same class to the data with different arguments, and I cannot find a way to implement this with the current system.

As an example, let's say that we would like to implement a ReadFilter plugin separate from GATK's, and a filter called IntegerTagReadFilter with two arguments: --int-filter-tag (String) and --int-filter-tag-value (Integer). The user should then be able to provide the following command line: --myReadFilter IntegerTagReadFilter --int-filter-tag NM --int-filter-tag-value 2 --myReadFilter IntegerTagReadFilter --int-filter-tag AS --int-filter-tag-value 3.

This can be implemented as a single filter whose arguments are specified as List<String> and List<Integer>, following the same implementation as GATK's ReadFilter. Nevertheless, this is not desirable in this case because we would like to keep each instance separate so that we can count the number of reads in each filter.

This is just a toy example, but it does not look like the plugin system allows this kind of implementation. Maybe I am missing something about it...

cmnbroad commented 6 years ago

@magicDGS In addition to plugin framework changes, I think this would require parser changes. Currently the parser assumes all command line argument names are unique and independent (in the sense that command line order doesn't matter). This would require some kind of name qualification or grouping mechanism so that --int-filter-tag NM --int-filter-tag-value 2 would target one instance of a class, and --int-filter-tag AS --int-filter-tag-value 3 would target a different instance.
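To make the grouping idea concrete, here is a minimal sketch in plain Java of one possible order-sensitive scheme, in which every token following an occurrence of the plugin argument is attributed to that occurrence until the next one appears. It is only an illustration under stated assumptions: `PluginArgumentGrouper` and `Group` are hypothetical names, nothing here is part of Barclay's API, and the sketch does not validate the grouped arguments in any way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: NOT part of Barclay. It only illustrates grouping
// command line tokens by the plugin occurrence that precedes them.
final class PluginArgumentGrouper {

    /** The tokens attributed to one occurrence of the plugin argument. */
    static final class Group {
        final String pluginName;
        final List<String> args = new ArrayList<>();

        Group(String pluginName) {
            this.pluginName = pluginName;
        }
    }

    /**
     * Splits argv into one group per occurrence of pluginFlag.
     * Tokens that appear before the first occurrence are ignored here;
     * a real parser would have to treat them as ordinary arguments.
     */
    static List<Group> group(String pluginFlag, String[] argv) {
        List<Group> groups = new ArrayList<>();
        Group current = null;
        for (int i = 0; i < argv.length; i++) {
            if (argv[i].equals(pluginFlag)) {
                if (i + 1 >= argv.length) {
                    throw new IllegalArgumentException(pluginFlag + " requires a value");
                }
                // The token after the flag names the plugin class.
                current = new Group(argv[++i]);
                groups.add(current);
            } else if (current != null) {
                current.args.add(argv[i]);
            }
        }
        return groups;
    }

    public static void main(String[] unused) {
        String[] argv = {
            "--myReadFilter", "IntegerTagReadFilter",
            "--int-filter-tag", "NM", "--int-filter-tag-value", "2",
            "--myReadFilter", "IntegerTagReadFilter",
            "--int-filter-tag", "AS", "--int-filter-tag-value", "3"
        };
        for (Group g : group("--myReadFilter", argv)) {
            System.out.println(g.pluginName + " " + g.args);
        }
    }
}
```

Note that this makes command line order significant, which is exactly the assumption the current parser does not make.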

magicDGS commented 6 years ago

I see the problem with that, and I don't have a concrete solution. One idea is to treat plugins as tagged arguments or something similar. For example, --myReadFilter IntegerTagReadFilter:NM=2 --myReadFilter IntegerTagReadFilter:AS=3. This might require changing how plugins are handled and providing a way to populate @Argument fields from tag-like strings and to show them that way in the CLI help.
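As a rough illustration of the tag-like syntax, the following plain-Java sketch splits a value such as IntegerTagReadFilter:NM=2 into a plugin name and its key/value pairs. The class name `TaggedPluginSpec` is hypothetical, support for several comma-separated pairs is an assumption that goes beyond the example above, and none of this reflects how Barclay handles plugins or tagged arguments today.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: NOT part of Barclay. Parses "Name:key=value[,key=value...]".
final class TaggedPluginSpec {
    final String pluginName;
    final Map<String, String> attributes;

    private TaggedPluginSpec(String pluginName, Map<String, String> attributes) {
        this.pluginName = pluginName;
        this.attributes = attributes;
    }

    static TaggedPluginSpec parse(String spec) {
        int colon = spec.indexOf(':');
        String name = colon < 0 ? spec : spec.substring(0, colon);
        if (name.isEmpty()) {
            throw new IllegalArgumentException("Missing plugin name in: " + spec);
        }
        // LinkedHashMap keeps the pairs in the order the user typed them.
        Map<String, String> attrs = new LinkedHashMap<>();
        if (colon >= 0) {
            // Assumption: multiple pairs are separated by commas.
            for (String pair : spec.substring(colon + 1).split(",")) {
                int eq = pair.indexOf('=');
                if (eq <= 0) {
                    throw new IllegalArgumentException("Expected key=value but found: '" + pair + "'");
                }
                attrs.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return new TaggedPluginSpec(name, attrs);
    }

    public static void main(String[] unused) {
        for (String spec : new String[] {"IntegerTagReadFilter:NM=2", "IntegerTagReadFilter:AS=3"}) {
            TaggedPluginSpec parsed = parse(spec);
            System.out.println(parsed.pluginName + " " + parsed.attributes);
        }
    }
}
```

Each parsed spec could then back one plugin instance, but mapping the pairs onto @Argument fields is the part that would still need real support in the parser.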

cmnbroad commented 6 years ago

If it's mostly about the summary counts, it might be easier to explore ways to allow read filters to have custom counting and summary display behavior.

magicDGS commented 6 years ago

@cmnbroad - I gave that as an example, but I am implementing other plugins to compute statistics from reads. As a simple example, counting separately the number of reads with NM=2, NM=3 and AS=4 per window; as I said, I can always implement this with List arguments, but I would rather have one instance for each of them and a common implementation of a simple "counter" for a tag-value pair.
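The kind of per-instance counter I have in mind is roughly the following. This is only a sketch: `TagValueCounter` is a hypothetical class, and a plain Map<String, Integer> stands in for the integer tags of a read so that the example does not depend on any GATK or htsjdk type.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one instance per tag-value pair, each with its own count.
final class TagValueCounter {
    private final String tag;
    private final int value;
    private long count = 0;

    TagValueCounter(String tag, int value) {
        this.tag = tag;
        this.value = value;
    }

    /**
     * Returns true and increments the count if the read carries the tag with
     * the expected value. readTags is a stand-in for a read's integer tags.
     */
    boolean accept(Map<String, Integer> readTags) {
        Integer found = readTags.get(tag);
        boolean matches = found != null && found == value;
        if (matches) {
            count++;
        }
        return matches;
    }

    long getCount() {
        return count;
    }

    String summary() {
        return tag + "=" + value + ": " + count;
    }

    public static void main(String[] unused) {
        TagValueCounter nm2 = new TagValueCounter("NM", 2);
        TagValueCounter as4 = new TagValueCounter("AS", 4);

        Map<String, Integer> read = new HashMap<>();
        read.put("NM", 2);
        read.put("AS", 3);

        // The same read is offered to every instance; each keeps its own count.
        nm2.accept(read);
        as4.accept(read);
        System.out.println(nm2.summary());
        System.out.println(as4.summary());
    }
}
```

With the List-based approach all of these counts would have to live inside a single instance, which is what I would like to avoid.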