galaxy-iuc / standards

Documentation for standards and best practices from the Galaxy IUC
http://galaxy-iuc-standards.readthedocs.io/en/latest/

Tool IDs should have some kind of standard #10

Closed: bgruening closed this 8 years ago

bgruening commented 9 years ago

Tool IDs should be unique and meaningful. We need them to reference tools in several places. There are tools with whitespace inside their IDs, such as EMBOSS, and tools with case-sensitive IDs.

We should define a more standard convention for tool IDs.
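For illustration, here is a minimal Python sketch of one uniqueness check this implies: finding IDs that would collide if IDs were compared case-insensitively. The function name `case_collisions` and the sample IDs are hypothetical; this is not an existing Galaxy or Planemo API.

```python
from collections import defaultdict

def case_collisions(tool_ids):
    """Group tool IDs that differ only in letter case (hypothetical helper).

    Returns a dict mapping each lowercased ID to the sorted list of its
    distinct original spellings, keeping only groups with more than one.
    """
    groups = defaultdict(set)
    for tool_id in tool_ids:
        groups[tool_id.lower()].add(tool_id)
    return {key: sorted(ids) for key, ids in groups.items() if len(ids) > 1}
```

For example, `case_collisions(["BedToWig", "bedtowig", "fasta_stats"])` returns `{"bedtowig": ["BedToWig", "bedtowig"]}` (the lowercase variant is made up for the example).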

peterjc commented 9 years ago

That all sounds reasonable. Is there an easy way to dump all the tool IDs currently in the Tool Shed?

bgruening commented 9 years ago

@peterjc please go to the Tool Shed and click on Tools on the left side.

nsoranzo commented 9 years ago

Ping @jmchilton about the tool id validation in Planemo.

peterjc commented 9 years ago

If only lowercase letters, underscores, and the digits 0 to 9 are allowed, there are currently 600 "bad" IDs on the main Tool Shed:

['ALaGiFer_1', 'ARTS', 'ARTSscore', 'Add_a_column1', 'Annotation visualization', 'Annotation_Expr', 'Annotation_Profiler_0', 'Annotation_RefSeq', 'Annovar', 'AnnovarShed', 'Autocorrelation', 'Autocovariance', 'BAMTools_bamToFastX', 'BAM_Editor', 'BCF Tools Cat', 'BCF Tools Index', 'BECorrelation', 'BED File Converter1', 'BLaGiFer_2', 'BaseAlignCounts', 'BedToWig', 'BestSubsetsRegression1', 'CAPS2gff', 'CAPS_Marker_Design_2', 'CLaGiFer_3', 'CONVERTER_SMILES_to_MOL', 'CONVERTER_SMILES_to_MOL2', ..., 'writeResToHTML']

Also allowing uppercase letters drops this to 66 bad IDs:

['Annotation visualization', 'BCF Tools Cat', 'BCF Tools Index', 'BED File Converter1', 'Condense characters1', 'Convert characters1', 'Extract genomic DNA 1', 'Featured datasets4', 'Fetch Taxonomic Ranks', 'Find peaks', 'GVF Features Extracter_1', 'PR Curve', 'ROC Curve', 'ROC-PR Curve', 'Remove beginning1', 'Remove ending', 'Rotating object', 'Show beginning1', 'Show tail1', 'SnpEff-cds-report', 'abyss-pe', 'chip-cluster', 'com.github.lindenb.jvarkit.tools.groupbygene.GroupByGene', 'com.github.lindenb.jvarkit.tools.misc.VcfFilterSequenceOntology', 'com.github.lindenb.jvarkit.tools.misc.VcfHead', 'com.github.lindenb.jvarkit.tools.misc.VcfTail', 'com.github.lindenb.jvarkit.tools.vcffilterjs.VCFFilterJS', 'com.github.lindenb.jvarkit.tools.vcftrios.VCFTrios', 'common unique', 'compute_p-values_correlation_coefficients_featureA_featureB_occurrences_between_two_datasets_using_discrete_wavelet_transfom', 'compute_p-values_correlation_coefficients_feature_occurrences_between_two_datasets_using_discrete_wavelet_transfom', 'compute_p-values_max_variances_feature_occurrences_in_one_dataset_using_discrete_wavelet_transfom', 'compute_p-values_second_moments_feature_occurrences_between_two_datasets_using_discrete_wavelet_transfom', 'ctb_np-likeness-calculator', 'deseq-hts', 'deseq2-hts', 'dexseq-hts', 'edu.tamu.cpt.gsaf.download', 'fa-extract-sequence', 'fasta-stats', 'flexbar2.3', 'gap-rem', 'gfap_r1.0_allvar_genomic_annotater', 'gfap_r1.0_cdsvar_functional_annotater', 'gfap_r1.0_known_var_finder', 'gfap_r1.0_samvcf_data_parser', 'gff to bed wiggle', 'glimmer_build-icm', 'glimmer_knowlegde-based', 'glimmer_not-knowlegde-based', 'hammock_1.0', 'htseq-count', 'iReport-dev', 'mgescan-ltr', 'mgescan-nonltr', 'ngs-tools_merge_fna_qual', 'ngs-tools_sample', 'ngs-tools_split_by_barcode', 'ngs.plot', 'ngs.plot_intro', 'nupop_0.1', 'rdiff-web', 'samtools-filter', 'samtools-merge', 'sparql query executor', 'tagres-train']
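The two counts above correspond to two simple regular expressions. A minimal sketch, assuming the ID list has already been fetched from the Tool Shed; `STRICT`, `RELAXED`, and `bad_ids` are hypothetical names, and `sample` is just a handful of IDs copied from the lists above.

```python
import re

# Strict rule: lowercase letters, digits 0-9, and underscore only.
STRICT = re.compile(r"[a-z0-9_]+")
# Relaxed rule: uppercase letters are allowed as well.
RELAXED = re.compile(r"[A-Za-z0-9_]+")

def bad_ids(tool_ids, pattern):
    """Return, sorted, the IDs that do not match the pattern in full."""
    return sorted(t for t in tool_ids if not pattern.fullmatch(t))

sample = ["ARTS", "Add_a_column1", "writeResToHTML",
          "BCF Tools Cat", "htseq-count", "ngs.plot"]
```

With this sample, `bad_ids(sample, STRICT)` flags all six IDs, while `bad_ids(sample, RELAXED)` flags only `BCF Tools Cat`, `htseq-count`, and `ngs.plot`. Using `fullmatch` rather than `match` matters: `match` would accept `htseq-count` because its prefix `htseq` is valid.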

Quite a lot of hyphen/minus signs, a few dots/periods, and a few spaces too :(
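A quick way to see which characters cause the problems is to count them per ID. A minimal sketch with a hypothetical helper `offending_chars`, treating anything outside `[A-Za-z0-9_]` as offending (the same as the relaxed rule above).

```python
import re
from collections import Counter

def offending_chars(tool_ids):
    """Count how many IDs contain each character outside [A-Za-z0-9_]."""
    counts = Counter()
    for tool_id in tool_ids:
        # set() so an ID with two spaces still counts once for ' '
        counts.update(set(re.findall(r"[^A-Za-z0-9_]", tool_id)))
    return counts
```

Applied to six IDs from the list (`htseq-count`, `deseq2-hts`, `ngs.plot`, `ngs.plot_intro`, `PR Curve`, `BCF Tools Cat`), it reports two IDs each with a hyphen, a dot, and a space.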

hexylena commented 9 years ago

I've always been fond of Java package-style tool names, but I'd be fine with this proposal.

peterjc commented 9 years ago

At a minimum, IDs with spaces really should be discouraged (perhaps even as a Planemo lint error?)

hexylena commented 9 years ago

I'd say, at a bare minimum:

And maybe an INFO/lower-level stylistic warning for using underscores.
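A sketch of how these severities could be expressed, assuming whitespace is an error (as suggested above) and underscores only get an INFO-level note. `lint_tool_id` is hypothetical and is not how Planemo's linter is actually structured.

```python
import re

def lint_tool_id(tool_id):
    """Return (level, message) pairs for a tool ID (hypothetical levels)."""
    messages = []
    if not tool_id:
        messages.append(("error", "tool ID is empty"))
        return messages
    # Any whitespace (space, tab, ...) is treated as a hard error.
    if re.search(r"\s", tool_id):
        messages.append(("error", "tool ID contains whitespace"))
    # Underscores are only a stylistic note, per the INFO-level suggestion.
    if "_" in tool_id:
        messages.append(("info", "tool ID uses underscores (stylistic)"))
    return messages
```

For example, `lint_tool_id("BCF Tools Cat")` yields one error, `lint_tool_id("Add_a_column1")` yields one info note, and `lint_tool_id("htseq-count")` yields nothing under these particular rules.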

jmchilton commented 9 years ago

Linting for spaces is in https://github.com/galaxyproject/planemo/pull/190; spaces are a usability problem in some contexts.

I'll add more once we agree.

bgruening commented 9 years ago

We are still planning and developing a prototype of gsh, a Galaxy Shell. We need good tool names to run tools from it, so we would like to see more underscores (_) and fewer dots (.).

hexylena commented 9 years ago

I don't see dots as a problem in shell command names. Sure, they're non-traditional, but that's just my opinion. If underscores make it easier for you, I'm happy to be stricter.
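One way a shell front end could cope with existing IDs is to map them to conservative command names. A minimal sketch with a hypothetical `shell_name` helper; it is not how gsh actually works.

```python
import re

def shell_name(tool_id):
    """Map a tool ID to a conservative command name (hypothetical).

    Each run of characters outside [A-Za-z0-9_] (dots, hyphens, spaces)
    is replaced by a single underscore.
    """
    return re.sub(r"[^A-Za-z0-9_]+", "_", tool_id)
```

For example, `shell_name("edu.tamu.cpt.gsaf.download")` gives `edu_tamu_cpt_gsaf_download`. The mapping is lossy: `ngs.plot` and a hypothetical `ngs-plot` would both become `ngs_plot`. This is one argument for choosing underscore-only IDs in the first place rather than normalising them later.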

My rationale for using Java-style names is that I can give my tools very organised IDs.

E.g.

edu.tamu.cpt.gbk.merge
edu.tamu.cpt.fasta.2paircat
edu.tamu.cpt.fasta.alignment.logo
...
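Dotted IDs like these can be grouped by namespace by splitting on the last dot. A minimal sketch with a hypothetical `group_by_namespace` helper, using the three example IDs above. Note that a segment such as `2paircat` starts with a digit, which Java would not accept as an identifier, so "Java style" here is a naming convention rather than Java syntax.

```python
from collections import defaultdict

def group_by_namespace(tool_ids):
    """Group dotted IDs by everything before the last dot (hypothetical).

    IDs without a dot end up under the empty-string namespace.
    """
    groups = defaultdict(list)
    for tool_id in tool_ids:
        namespace, _, name = tool_id.rpartition(".")
        groups[namespace].append(name)
    return dict(groups)
```

On the three IDs above this yields the namespaces `edu.tamu.cpt.gbk`, `edu.tamu.cpt.fasta`, and `edu.tamu.cpt.fasta.alignment`, each holding one tool name.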