broadinstitute / picard

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
https://broadinstitute.github.io/picard/
MIT License
958 stars 366 forks source link

Next phase of command line parser transition #1120

Open cmnbroad opened 6 years ago

cmnbroad commented 6 years ago

The next phase of transitioning Picard to use Barclay, posix-style command line syntax (as previously discussed in https://github.com/broadinstitute/picard/issues/786) is to switch Picard to use the new style by default, leaving the legacy parser in place as opt-in.

Six months has elapsed since the initial switch to the dependency on the parser(s) in Barclay. The proposal is to move to the next phase (Barclay by default, legacy as opt-in) on 4/5, about 6 weeks from today. Comments/questions/discussions welcome.

tfenne commented 6 years ago

@cmnbroad I don't see any issue with this. It would be helpful to:

  1. Understand what the command line syntax will look like for swtiching old/new. E.g. will it be an arg that comes before the program name (java -jar picard.jar [--posix|--legacy] SortSam). Or will it be a build-time choice?
  2. Assuming it's a runtime choice, it would be nice to roll out the mechanism for choosing before switching the default, to give users time to lock in their short-term choice
  3. Understand what Barclay support

For (3) it would be really nice to have a page like the one for sopt or clap to give two examples. I imagine this would be helpful for GATK users too!

cmnbroad commented 6 years ago

@tfenne Some of this is described in https://github.com/broadinstitute/picard/wiki/Barclay-Transition-Notes, though thats focused on the developer transition. A more user-focused one is a good idea.

The new syntax is available now by opt-in, and has been since 2.10.8. Its is being used by GATK 4.0 users to run Picard tools from GATK. Its currently set via a system property:

java -jar -Dpicard.useLegacyParser=false picard.jar ...

The 4/2 change would be to switch the default, and require the property to be set to true to use the old syntax.

splaisan commented 5 years ago

First of all THANKS for the great tools, I cannot work without them anymore.

Said that, are we going somewhere with this? I am very frustrated to have to review my code periodically in order to match the changing way of providing arguments to Picard (and recently to GATK too).

Please come to a consensus between both Broad groups and give to your users a unified syntax (whichever I do not care) allowing writing commands in similar ways wether we run a Picard or a GATK command. We use both and have to constantly avoid making errors and re-reading the help.

eg:

Personally, I have no problem being verbose with --input instead of all other short alternatives I= or -I if this can make it nicer to parse for you.

If you want what your end-users wish and comply, you may consider opening a poll!!

Thanks in advance

PS: I agree with tfenne, a simple option string would be nicer than the super-long typo-prone '-Dpicard.useLegacyParser=false'

cmnbroad commented 5 years ago

@splasian Sorry this is causing you problems - I do understand the interim situation is confusing. Having said that, we do have agreement on a unified syntax for Picard and GATK (it's the syntax used currently by GATK4). The current Picard syntax that uses I= is deprecated, and will be removed in an upcoming release.

The best thing to do now is to completely switch over to using the unified syntax. Its the default in GATK, but requires the property mentioned above for opt-in when using Picard (edit: this actually can't be set in the shell as an environment variable). Once you do that, then you can always use - and -- interchangeably, and you should no longer have to worry about making code work with both syntaxes. When Picard no longer supports the legacy syntax (this a taking longer than I'd hoped, but I expect it will be soon), the upgrade will be transparent to you. This was the whole reason for having an interim time period where both syntaxes were supported.

The issue of the UPPERCASE argument names vs snake-case (Picard uses the former, GATK the latter) is another confusing issue, but one that is independent of the parser being used.

Hope this helps - if you do have any problems with using the new syntax, please file a ticker with the specific issue and I'll try to address it as quickly as possible.

splaisan commented 5 years ago

Thanks a lot @cmnbroad, I look forward to have nice and readable scripts with both toolboxes under the same sky. How do you do (that can also be set as an environment variable) ? with some alias? (could you share an example)

Best Stephane

cmnbroad commented 5 years ago

@splasisan GATK4 includes all of the Picard tools out of the box (try ./gatk --list) - you don't need to do anything to opt in. When using a Picard tool from the GATK4 installation, you use the same command line syntax that GATK uses (-I arg instead of I=arg). The argument names for the Picard tools are still the same as in Picard though (all caps, etc).

splaisan commented 5 years ago

This is what confused me, I ran sometimes the Picard and sometimes the GATK version and mixed up the argument syntax. Thanks @cmnbroad @splaisan ;-)

kbergin commented 5 years ago

@cmnbroad Is this work complete? Doing some ticket cleanup

cmnbroad commented 5 years ago

@kbergin No, this ticket needs to remain open as we still haven't made this change. I've communicated at length with @jkaneria about what needs to happen to get this resolved.

cmnbroad commented 5 years ago

Updating with some more issues that would be resolved by moving forward with this (some of these are closed even though they're not fully resolved):

https://github.com/broadinstitute/picard/issues/1211 https://github.com/broadinstitute/picard/issues/1009 https://github.com/broadinstitute/picard/issues/950 https://github.com/broadinstitute/picard/issues/1196

splaisan commented 4 years ago

Are their plans to do the long awaited transition any soon? Picard commands keep giving recommandations about the 'new' syntax but that syntax does not work (for me with Picard version: 2.21.2-SNAPSHOT)

eg

java -jar $PICARD/picard.jar \ > SortSam \ > I=${outfolder}/${outpfx}_rawmappings.bam \ > O=${outfolder}/${outpfx}_rawmappings_qrysrt.bam \ > SO=queryname \ > MAX_RECORDS_IN_RAM=10000000 \ > TMP_DIR=./ INFO 2019-12-10 16:17:31 SortSam ** NOTE: Picard's command line syntax is changing. ** ** For more information, please see: ** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition) ** ** The command line looks like this in the new syntax: ** ** SortSam -I bwa_mappings/HG001_chr22_rawmappings.bam -O bwa_mappings/HG001_chr22_rawmappings_qrysrt.bam -SO queryname -MAX_RECORDS_IN_RAM 10000000 -TMP_DIR ./ **

cmnbroad commented 4 years ago

@splaisan There is a PR for the next step, which is a change to make the parser auto-detect which syntax you're using here. It is just waiting for additional testing, and will hopefully be available in a release soon.

kbergin commented 4 years ago

Hi all,

We expect to work on the production testing early in the new year. Apologies for the long wait.

gbggrant commented 3 years ago

Hi @cmnbroad we believe that the work for this ticket is de facto done, since dual parser support kind of overrides this. Do you think the default should be changed elsewhere?