biocompute-objects / BCO_Documentation

Repository for documentation to support the IEEE 2791-2020 standard. Please see our home page for communications/publications:
http://biocomputeobject.org/
BSD 3-Clause "New" or "Revised" License

Refine SCM extension keys #39

Closed HadleyKing closed 5 years ago

HadleyKing commented 6 years ago

When opening issue #21 @stain said:

Given that these are URLs I think the extension should support any source control repository, not just github.com, perhaps something like:

"extension_domain":{
  "scm_extension": {
    "scm_repository": "https://github.com/example/repo1",
    "scm_type": "git",
    "scm_branch": "c9ffea0b60fa3bcf8e138af7c99ca141a6b8fb21",
    "scm_path": "workflow/hive-viral-mutation-detection.cwl",
    "scm_preview": "https://github.com/example/repo1/blob/c9ffea0b60fa3bcf8e138af7c99ca141a6b8fb21/workflow/hive-viral-mutation-detection.cwl"
  }
}

Here's how Maven defines its scm metadata.

As we have now changed the extension to scm_extension, I felt the discussion should be continued on another thread. The wording in extension-scm.md has been updated, and antiviral_resistance_detectionTypeDef.json has been as well, but only on the most superficial level: each of the fields is simply described as a string.

Should we have a more comprehensive definition here?

corburn commented 6 years ago

Specify commit or revision instead of branch.

I would avoid referring to source code exclusively by branch. Both a branch and a tag are pointers to a commit hash. In old GitHub issues, for example, links to source code will sometimes point to segments unrelated to the issue: the code has changed, and the branch refers to the current state. A commit will always refer to a specific state.

https://github.com/biocompute-objects/BCO_Specification/blob/9a49d708d57cd436f6ab009faca64ef71d83b649/HCV1a.json#L2
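A minimal sketch of how a tool could tell a pinned commit from a moving branch name, assuming a Git repository that uses 40-character SHA-1 object IDs (repositories using SHA-256 would need a different length). The function name `is_full_commit_sha` is hypothetical and not part of any BCO tooling:

```python
import re

# Assumption: a full Git SHA-1 commit ID is exactly 40 hexadecimal characters.
_FULL_SHA1 = re.compile(r"[0-9a-fA-F]{40}")


def is_full_commit_sha(ref: str) -> bool:
    """Return True if ref looks like a full SHA-1 commit ID rather than a branch or tag name."""
    return _FULL_SHA1.fullmatch(ref) is not None


if __name__ == "__main__":
    for ref in ("9a49d708d57cd436f6ab009faca64ef71d83b649", "master"):
        kind = "pinned commit" if is_full_commit_sha(ref) else "branch, tag, or abbreviated ID"
        print(f"{ref}: {kind}")
```

This only checks the shape of the string; it does not confirm that the commit exists in the repository.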

stain commented 6 years ago

The proposed text looks good, except that it does not explain what the scm_* keys mean.

I think it's important to say that scm_path should NOT start with /.
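As an illustration of why a relative scm_path is convenient, here is a rough sketch that rebuilds the scm_preview value from the other keys. It assumes GitHub's `<repository>/blob/<ref>/<path>` URL layout, as seen in the example above; other hosts lay out their URLs differently. The function name `github_preview_url` is hypothetical:

```python
def github_preview_url(scm_repository: str, ref: str, scm_path: str) -> str:
    """Build a GitHub-style preview URL from repository URL, commit/branch ref, and relative path."""
    # Reject absolute-looking paths: scm_path is meant to be relative to the repository root.
    if scm_path.startswith("/"):
        raise ValueError("scm_path should not start with '/'")
    # Assumption: GitHub's /blob/<ref>/<path> layout; strip a trailing slash to avoid '//'.
    return f"{scm_repository.rstrip('/')}/blob/{ref}/{scm_path}"


if __name__ == "__main__":
    print(
        github_preview_url(
            "https://github.com/example/repo1",
            "c9ffea0b60fa3bcf8e138af7c99ca141a6b8fb21",
            "workflow/hive-viral-mutation-detection.cwl",
        )
    )
```

With a relative path the pieces join with a single separator, whereas a leading / would leave it ambiguous whether the path is relative to the repository root or to the host.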

As @corburn points out, instead of a branch we want to encourage pointing to some kind of commit. However, I would not restrict it to always be a commit ID, as that would make it impossible for a BCO to refer to resources inside its own repository, but we can have that as a recommendation. So scm_branch (or scm_commit if you want) should be named to support both.

Perhaps:

I would define that predefined values for scm_type include git (Git, including GitHub/GitLab), svn (Subversion), and hg (Mercurial), but that third-party SCM types can also be used. Those prefixes would then happen to match https://maven.apache.org/scm/scms-overview.html, but we should not deep-link there, as they have some 'weird ones' as well that could get confusing.
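To make the "predefined but open" idea concrete, here is a small sketch of how a consumer might treat scm_type. The three predefined values are the ones listed above; the mapping name `PREDEFINED_SCM_TYPES` and the function `describe_scm_type` are hypothetical, not taken from the specification:

```python
# Predefined values as proposed in this thread; any other value is treated as third-party.
PREDEFINED_SCM_TYPES = {
    "git": "Git (including GitHub/GitLab)",
    "svn": "Subversion",
    "hg": "Mercurial",
}


def describe_scm_type(scm_type: str) -> str:
    """Return a readable label for a predefined scm_type, or mark it as third-party."""
    if scm_type in PREDEFINED_SCM_TYPES:
        return PREDEFINED_SCM_TYPES[scm_type]
    # Third-party types are allowed, so this is a label rather than an error.
    return f"third-party SCM type: {scm_type}"


if __name__ == "__main__":
    for value in ("git", "svn", "hg", "bazaar"):
        print(f"{value}: {describe_scm_type(value)}")
```

Unknown values are accepted rather than rejected, which matches the suggestion that third-party SCM types remain usable.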