MetabolicAtlas / standard-GEM

The standard for open-source GEMs on GitHub
https://www.biorxiv.org/content/10.1101/2023.03.21.512712
Creative Commons Attribution 4.0 International

Encourage the use of COMBINE archives as exchange format for the model and its execution #18

Open draeger opened 4 years ago

draeger commented 4 years ago

Description of the issue:

SBML is a declarative file format that specifies the components of a model, their structure, and their interactions. It does not, however, specify how to run that model or how to reproduce the figures in a scientific paper from it. Depending on which solver is used to run a model, or in which framework the model is interpreted, the results may diverge.

By using the additional format SED-ML (Simulation Experiment Description Markup Language), it becomes possible to specify how to interpret and run a model, including the typical steps in a simulation life cycle.

To make the use of two separate files less cumbersome for the user, the COMBINE archive format allows wrapping both in a ZIP-based archive together with a manifest file that specifies the relationship between model and SED-ML script. Further data can be added to that archive, e.g., annotation glossaries, original publications, image files with pathways, or SBGNML files for defining pathway maps.
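As a rough illustration of what such a manifest could look like, here is a minimal Python sketch that uses only the standard library. The file names (`model/model.xml`, `experiment/simulation.sedml`) are hypothetical, and marking the SED-ML file as `master` is an assumption about which file a tool should open first; the format URIs follow the `identifiers.org/combine.specifications` convention.

```python
import xml.etree.ElementTree as ET

OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"
SPEC = "http://identifiers.org/combine.specifications/"


def build_manifest(entries):
    """Return manifest XML for a list of (location, format_uri, is_master)."""
    # Serialise the manifest namespace as the default (unprefixed) namespace.
    ET.register_namespace("", OMEX_NS)
    root = ET.Element(f"{{{OMEX_NS}}}omexManifest")
    # The archive itself is listed as the first entry.
    ET.SubElement(
        root,
        f"{{{OMEX_NS}}}content",
        {"location": ".", "format": SPEC + "omex"},
    )
    for location, fmt, is_master in entries:
        attrs = {"location": location, "format": fmt}
        if is_master:
            attrs["master"] = "true"
        ET.SubElement(root, f"{{{OMEX_NS}}}content", attrs)
    return ET.tostring(root, encoding="unicode")


# Hypothetical file names, chosen only for illustration.
manifest_xml = build_manifest(
    [
        ("./model/model.xml", SPEC + "sbml", False),
        ("./experiment/simulation.sedml", SPEC + "sed-ml", True),
    ]
)
print(manifest_xml)
```

Each `content` element ties one file in the ZIP to a format, which is how a reader of the archive can tell the model apart from the simulation description.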

Expected feature/value/output:

Instead of SBML, the exchange format of COBRA tools would become a COMBINE archive file (typically with extension OMEX). It would contain the SBML file with the model, possibly annotations in a separate file, a SED-ML file that specifies how to execute the model, and perhaps more.

Current feature/value/output:

The steps to run the model would be encoded in the SED-ML file allowing third-party software to execute the same steps, hence improving the interoperability of various software and reproducibility of the results.

Reproducing these results:

There are implementations available in Python and other languages to access content within COMBINE archives and to read/write the manifest file.
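Even without a dedicated library, the archive can be inspected with a few lines of standard-library Python, since it is a ZIP file with a `manifest.xml` at its root. The sketch below is an assumption-laden illustration rather than a reference implementation; the helper names `list_archive_contents` and `read_entry` are made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

OMEX_NS = "{http://identifiers.org/combine.specifications/omex-manifest}"


def list_archive_contents(archive):
    """Return (location, format, is_master) for every manifest entry.

    `archive` is a path or a binary file-like object holding an OMEX file.
    """
    with zipfile.ZipFile(archive) as zf:
        root = ET.fromstring(zf.read("manifest.xml"))
    return [
        (c.get("location"), c.get("format"), c.get("master") == "true")
        for c in root.iter(OMEX_NS + "content")
    ]


def read_entry(archive, location):
    """Return the raw bytes of one entry, given its manifest location."""
    # Manifest locations are written relative to the archive root ("./...").
    name = location[2:] if location.startswith("./") else location
    with zipfile.ZipFile(archive) as zf:
        return zf.read(name)
```

A typical use would be to call `list_archive_contents("model.omex")`, pick the entry whose format URI ends in `sbml`, and pass its location to `read_entry` before handing the bytes to an SBML parser (`model.omex` being a placeholder file name).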

mihai-sysbio commented 4 years ago

Interesting idea @draeger.

As a concept, a COMBINE archive is a great step forward in solving problems in modelling. However, being a ZIP file limits what it can achieve compared to versioning (git) and infrastructure (GitHub). I see some advantages if there were a way to combine (no pun intended) the two approaches.

For situations like these, I default to the 6 thinking hats method. It's easier in person, but in my experience it works well in writing too.

White hat - facts:

Red hat - emotion:

Black hat - judgement:

Contributions are needed; it would be great if you could label ideas with a hat color, too.

Midnighter commented 4 years ago

You can create additional artefacts that can become part of a release. I could envision each release (tag that is also on Zenodo then) to provide the following separately:

mihai-sysbio commented 4 years ago

Green hat - possibilities (building on what @Midnighter described above):

mihai-sysbio commented 3 years ago

Looking at the contents of the COMBINE archive (section 3.3), Table 1 in the showcase and the example repository, the archive consists of:

  1. manifest.xml: This file contains essentially a listing of the file tree with the file formats. standard-GEM imposes a requirement regarding the main directories, extensions and some file names. Adopting a similar manifest in standard-GEM would be redundant.
  2. authorship information: In any git-based versioning system, this information is provided by author or committer, and is deeply embedded on platforms such as GitHub. Moreover, as models are curated over time, a list of authors/contributors would not be rich enough to be linked to actual contributions (commits).
  3. fixed file tree: There is some overlap here, and we should aim to increase the compatibility if possible. The directories specified by COMBINE are:
     3.1. documentation/ : files that describe and document the model and/or experiment. In standard-GEM, documentation is provided more closely with the element it documents, i.e. within data/ and code/ folders.
     3.2. model/ : files that encode and visualise the biological system. Essentially the same approach here.
     3.3. experiment/ : files that encode the in silico setup of the experiment
     3.4. result/ : files that result from running the experiment

As mentioned in the previous post, I think something should be done regarding 3.3 and 3.4. @yahanma has taken a similar approach by creating an analysis/ directory over at vna-GEM.

As a follow-up on the idea of automatically creating COMBINE archives: it feels like work in this direction has already started with CombineArchiveWeb, where, instead of uploading file by file, one could point directly to a repository that follows standard-GEM.

mihai-sysbio commented 3 years ago

Following up on the CombineArchiveWeb idea, it looks like it is possible to create archives from a Git repository:

[Two images]

Here is what I think would need to be done in order to close the issue:

@draeger what else would you recommend so this issue can be resolved? Are there any thoughts from the watchers of this issue?

draeger commented 3 years ago

I think this is very nice. Have you tried it out? Alternatively, a build script could wrap a bunch of files in an archive and write the manifest file during a local execution. A web service can certainly do the same (note: it will require data transmission).
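A local build script of that kind could look roughly like the following Python sketch. It walks a repository checkout, copies every file into a ZIP, and writes `manifest.xml` alongside, so nothing has to be uploaded anywhere. Several things here are assumptions made for illustration: the function name `build_omex`, the rule that every `.xml` file is SBML, and the `purl.org/NET/mediatypes` fallback for all other file types.

```python
import mimetypes
import zipfile
from pathlib import Path
from xml.sax.saxutils import quoteattr

SPEC = "http://identifiers.org/combine.specifications/"
# Assumed extension-to-format mapping; a real repository may need more rules.
FORMATS = {".xml": SPEC + "sbml", ".sedml": SPEC + "sed-ml"}
MEDIATYPES = "http://purl.org/NET/mediatypes/"


def guess_format(path):
    """Pick a format URI from the file extension (illustrative heuristic)."""
    if path.suffix.lower() in FORMATS:
        return FORMATS[path.suffix.lower()]
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    return MEDIATYPES + mime


def build_omex(repo_dir, omex_path, master=None):
    """Wrap all files under repo_dir into an OMEX file; return the locations.

    `master` is an optional path relative to repo_dir, e.g. "model/model.xml".
    """
    repo, target = Path(repo_dir), Path(omex_path).resolve()
    files = sorted(
        p for p in repo.rglob("*")
        if p.is_file()
        and ".git" not in p.relative_to(repo).parts  # skip git internals
        and p.resolve() != target                    # skip an old archive
        and p.relative_to(repo).as_posix() != "manifest.xml"
    )
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<omexManifest xmlns="{SPEC}omex-manifest">',
        f'  <content location="." format="{SPEC}omex"/>',
    ]
    locations = []
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            rel = path.relative_to(repo).as_posix()
            zf.write(path, rel)
            flag = ' master="true"' if rel == master else ""
            lines.append(
                f"  <content location={quoteattr('./' + rel)} "
                f"format={quoteattr(guess_format(path))}{flag}/>"
            )
            locations.append("./" + rel)
        lines.append("</omexManifest>")
        zf.writestr("manifest.xml", "\n".join(lines) + "\n")
    return locations
```

Run from a release workflow, something like `build_omex(".", "model.omex", master="model/model.xml")` would produce an archive that can be attached to the release as an extra artefact (both file names are placeholders).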