HadleyKing closed 2 years ago
Using BioCompute's pre-defined fields and standards, knowledgebases can generate a BioCompute Object (BCO) to document the metadata, quality-control, and integration pipelines developed for different workflows. BCOs can be generated via a user-friendly instance of a BCO editor and can be maintained and shared through versioned, stable IDs stored under a single domain of that knowledgebase. BCOs not only provide complete transparency to their data submitters (authors, curators, other databases, etc.), collaborators, and users, but also provide an efficient mechanism to reproduce the complete workflow through the information stored in different domains (such as description, execution, I/O, error, etc.) in machine- and human-readable formats.
@HadleyKing Let me know if you need more text.
@Rahi13 this should be a good start. I want to leave it open for now though so others can comment if they have ideas
@HadleyKing You can also add a link to an example, one of the BCOs generated by GlyGen: https://data.glygen.org/DSBCO_000038/v-1.4.5
Jonathon to write a markdown document on using knowledgebase BCOs in this repo.
Use the following as a guide: https://github.com/biocompute-objects/extension_domain/tree/main/dataset
Rahi's text is fantastic.
Adding a few minor tweaks and building on it:
Using BioCompute's pre-defined fields and standards, knowledgebases can generate a BioCompute Object (BCO) to document the metadata, quality-control, and integration pipelines developed for different workflows. BCOs can be used to document each release. The structured data in a BCO makes it very easy to identify changes between releases (including changes to the curation/data processing pipeline, attribution to curators, or datasets processed), or revert to previous releases.
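As a rough illustration of why the structured format makes release-to-release comparison easy, the Python sketch below flattens two BCO JSON documents into path/value pairs and reports every path whose value differs. The field names follow the BioCompute (IEEE 2791) layout as we understand it (`provenance_domain`, `io_domain`, and so on), but the two releases, their values, and the helper names `flatten` and `diff_releases` are hypothetical, made up for this example.

```python
from typing import Any, Dict, List, Tuple

ABSENT = "<absent>"  # marker for a path that exists in only one of the two releases


def flatten(node: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts/lists into {"a.b[0].c": value} pairs."""
    items: Dict[str, Any] = {}
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            items.update(flatten(value, path))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            items.update(flatten(value, f"{prefix}[{index}]"))
    else:
        items[prefix] = node  # leaf value (string, number, bool, None)
    return items


def diff_releases(old: Dict[str, Any], new: Dict[str, Any]) -> List[Tuple[str, Any, Any]]:
    """Return (path, old_value, new_value) for every leaf that changed."""
    old_flat, new_flat = flatten(old), flatten(new)
    changes = []
    for path in sorted(old_flat.keys() | new_flat.keys()):
        before = old_flat.get(path, ABSENT)
        after = new_flat.get(path, ABSENT)
        if before != after:
            changes.append((path, before, after))
    return changes


# HYPOTHETICAL releases of the same BCO: a version bump, a new curator, a new input file.
release_1 = {
    "provenance_domain": {
        "name": "Example curation pipeline",
        "version": "1.4.4",
        "contributors": [{"name": "Curator A", "contribution": ["curatedBy"]}],
    },
    "io_domain": {
        "input_subdomain": [{"uri": {"uri": "https://example.org/data_v1.csv"}}],
    },
}

release_2 = {
    "provenance_domain": {
        "name": "Example curation pipeline",
        "version": "1.4.5",
        "contributors": [
            {"name": "Curator A", "contribution": ["curatedBy"]},
            {"name": "Curator B", "contribution": ["curatedBy"]},
        ],
    },
    "io_domain": {
        "input_subdomain": [{"uri": {"uri": "https://example.org/data_v2.csv"}}],
    },
}

if __name__ == "__main__":
    for path, before, after in diff_releases(release_1, release_2):
        print(f"{path}: {before!r} -> {after!r}")
```

A generic flatten-and-compare keeps the sketch short, but it treats reordered list items as changes; a production comparison might key contributors or input files by name or URI instead.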
BCOs can be generated via a user-friendly instance of a BCO editor and can be maintained and shared through versioned, stable IDs stored under a single domain of that knowledgebase. BCOs not only provide complete transparency to their data submitters (authors, curators, other databases, etc.), collaborators, and users, but also provide an efficient mechanism to reproduce the complete workflow through the information stored in different domains (such as description, execution, io, error, etc.) in machine- and human-readable formats.
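To illustrate the machine-readable side, here is a minimal Python sketch that parses a BCO from JSON and prints a human-readable summary: the pipeline name and version from `provenance_domain`, which top-level domains are present, and the steps listed in `description_domain`. The domain key names are taken from the BioCompute standard (IEEE 2791-2020) to the best of our knowledge; the example object, `DOMAIN_KEYS`, and `describe_bco` are illustrative assumptions, and a real BCO carries many more required fields.

```python
import json

# Top-level BCO domain keys (per IEEE 2791-2020, as we understand it), in display order.
DOMAIN_KEYS = [
    "provenance_domain", "usability_domain", "extension_domain",
    "description_domain", "execution_domain", "parametric_domain",
    "io_domain", "error_domain",
]


def describe_bco(bco_json: str) -> str:
    """Render a short human-readable summary of a BCO given as a JSON string."""
    bco = json.loads(bco_json)
    provenance = bco.get("provenance_domain", {})
    lines = [f"{provenance.get('name', '(unnamed)')} v{provenance.get('version', '?')}"]
    # List only the domains this object actually contains.
    lines.append("Domains: " + ", ".join(key for key in DOMAIN_KEYS if key in bco))
    for step in bco.get("description_domain", {}).get("pipeline_steps", []):
        lines.append(f"  Step {step.get('step_number')}: {step.get('name')}")
    return "\n".join(lines)


# HYPOTHETICAL, heavily trimmed BCO used only to exercise describe_bco.
example_bco = json.dumps({
    "provenance_domain": {"name": "Example curation pipeline", "version": "1.0"},
    "description_domain": {
        "keywords": [],
        "pipeline_steps": [
            {"step_number": 1, "name": "Download source data"},
            {"step_number": 2, "name": "Map identifiers"},
        ],
    },
    "io_domain": {"input_subdomain": [], "output_subdomain": []},
})

if __name__ == "__main__":
    print(describe_bco(example_bco))
```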
The most common way of adapting BCOs for use in knowledgebases is by leveraging the Extension Domain. In this example, the Extension Domain is used to define fields based on column headers. Note that the Extension Domain identifies its own schema, which defines the column headers and marks them as required where appropriate. Because the JSON format of a BCO is human- and machine-readable (and can be further adapted for any manner of display or editing through a user interface), BCOs are amenable to either manual or automatic curation processes, such as the curation process that populates those fields in the above example.
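One way an automated curation step might populate such fields is sketched below in Python: it reads the header row of a CSV file, checks it against the `required` list of an extension schema, and emits an `extension_domain` entry that points at that schema. The schema URL, the column names, and the `dataset_columns` field are hypothetical placeholders; the actual dataset extension schema lives in the extension_domain repository linked earlier in this thread and may define different fields.

```python
import csv
import io
from typing import Any, Dict, List

# HYPOTHETICAL extension schema: the $id URL and column names are placeholders,
# written in the JSON Schema "properties"/"required" style.
EXAMPLE_SCHEMA: Dict[str, Any] = {
    "$id": "https://example.org/extension_domain/example_dataset_schema.json",
    "properties": {
        "protein_accession": {"type": "string"},
        "glycan_id": {"type": "string"},
        "evidence": {"type": "string"},
    },
    "required": ["protein_accession", "glycan_id"],
}


def missing_required_columns(header: List[str], schema: Dict[str, Any]) -> List[str]:
    """Return the schema's required column names that are absent from the header."""
    return [column for column in schema.get("required", []) if column not in header]


def build_extension_entry(csv_text: str, schema: Dict[str, Any]) -> Dict[str, Any]:
    """Build one extension_domain entry describing the CSV's column headers."""
    header = next(csv.reader(io.StringIO(csv_text)), [])  # first row, or [] if empty
    missing = missing_required_columns(header, schema)
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    required = set(schema.get("required", []))
    return {
        # Each extension_domain entry carries the URL of the schema it follows.
        "extension_schema": schema["$id"],
        # HYPOTHETICAL field: one record per column header, flagged if required.
        "dataset_columns": [
            {"header": column, "required": column in required} for column in header
        ],
    }


if __name__ == "__main__":
    sample = "protein_accession,glycan_id,evidence\nP12345,G00001,PMID:1\n"  # made-up row
    print(build_extension_entry(sample, EXAMPLE_SCHEMA))
```

Only the required-column check is implemented here; full validation against a published schema would normally use a JSON Schema validator rather than this hand-rolled test.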
reviewed. approved.
@HadleyKing we're set to push this.
Need to add to the FAQ pages for now
Include text in the documentation about how knowledgebases can use BioCompute.