SED-ML / sed-ml

Simulation Experiment Description Markup Language (SED-ML)
http://sed-ml.org

Handle complex resources like archive/nested files and multiple assets per file (input & output of resources) #46

Open matthiaskoenig opened 7 years ago

matthiaskoenig commented 7 years ago

Issue

We need to be able to reference files, content, and resources inside complex or nested files, as well as single assets in files that contain multiple assets. These could be CombineArchive entries, file content in an HDF5 file, or, more generally, compressed or packaged formats such as tar, gz, ... This also affects NuML files, which can contain multiple assets. The problem is recursive, because one may want to reference a resultComponent in a NuML file that is itself inside a CombineArchive.
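To illustrate the recursive nature of the problem, here is a minimal Python sketch that follows a path through nested zip containers. It only uses the standard library; the helper names `make_zip` and `read_nested` and the entry names are made up for this example, and real CombineArchives, HDF5, or tar files would each need their own reader.

```python
import io
import zipfile
from typing import Dict, Sequence


def make_zip(entries: Dict[str, bytes]) -> bytes:
    """Build an in-memory zip archive from a {name: content} mapping (demo helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in entries.items():
            archive.writestr(name, content)
    return buffer.getvalue()


def read_nested(container: bytes, path: Sequence[str]) -> bytes:
    """Follow `path` through nested zip archives and return the last entry's bytes.

    Every element of `path` except the last must itself name a zip archive.
    An empty path returns the container unchanged.
    """
    data = container
    for entry in path:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            data = archive.read(entry)
    return data


# A NuML file inside an archive that is itself inside another archive.
inner = make_zip({"results.xml": b"<numl/>"})
outer = make_zip({"data/inner.zip": inner})
content = read_nested(outer, ["data/inner.zip", "results.xml"])
```

Each step of the path plays the role of one "location within a container", which is why a single flat URI is not enough to address the innermost asset.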

We need such a Resource

Examples

An issue related to this is: Additional things are important for sources

The encryption attribute is important so that tools understand that the source is password protected. More generally, online resources should be retrieved in a secure manner, i.e., encrypted during transport. For instance, I have combine archives with sensitive patient data and want to reference data within these. I would not make these files directly available, but would package them in password-protected archives, and I have to ensure encryption during transport of the files. The same holds for sensitive models, e.g., individualized models which could leak patient information.

Proposals

Add an additional class Resource and a list listOfResources to describe complex file resources.

Extend the source attribute to allow either an anyURI, as now, or a SIdRef to a new Resource class that covers such complex cases. This stays backwards compatible while covering many more use cases.

Resource(SED-Base):
    id: SId
    location: string
    resource: anyURI | resource SIdRef (location within the source, like combine archive, zip, HDF5)
    format: 
    md5: string {use: optional}
    compression: string {use: optional} (zip, gz, tar, ...)
    encryption: string {use: optional} (no passwords here, just an indicator that the file is protected)

and

Model(SED-Base):
    ...
    source: anyURI | resource SIdRef

DataDescription(SED-Base):
    ...
    source: anyURI | resource SIdRef
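As a rough sketch of how a tool might interpret such a source, the following Python code walks a chain of Resource references until it reaches a plain URI and collects the locations along the way. This is only one possible reading of the proposal: the `resolve` function, the rule that an id lookup takes precedence over treating the value as a URI, and the example ids and URL are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Resource:
    """Hypothetical in-memory form of the proposed Resource class."""
    id: str
    resource: str                      # a URI, or the id of another Resource
    location: Optional[str] = None     # position inside the referenced container
    md5: Optional[str] = None
    compression: Optional[str] = None
    encryption: Optional[str] = None


def resolve(source: str, resources: Dict[str, Resource]) -> Tuple[str, List[str]]:
    """Return (root URI, locations ordered from outermost to innermost).

    Assumption: if `source` matches a Resource id it is treated as a reference,
    otherwise it is treated as a plain URI, as in the current spec.
    """
    locations: List[str] = []
    seen = set()
    current = source
    while current in resources:
        if current in seen:
            raise ValueError(f"circular resource reference: {current}")
        seen.add(current)
        res = resources[current]
        if res.location is not None:
            locations.append(res.location)
        current = res.resource
    locations.reverse()  # collected innermost first, so flip the order
    return current, locations


resources = {
    r.id: r
    for r in [
        Resource("archive", "https://example.org/study.omex"),
        Resource("numl_file", "archive", location="./data/results.xml"),
        Resource("fluxes", "numl_file", location="component2"),
    ]
}
uri, path = resolve("fluxes", resources)
```

A source that is not a known Resource id falls through unchanged, which is how the backwards compatibility of the proposal would show up in practice.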

See also https://github.com/SED-ML/sed-ml/issues/51

https://docs.google.com/drawings/d/1uM1AqG_MLVRcOw2DuP4HguL2B7iF-quFRa0eaFvtni4/edit

matthiaskoenig commented 7 years ago

This just became an L1V3 issue and needs a solution!

NuML is already such a complex data type and can contain multiple <resultComponent> definitions. Only one of the possibly many <resultComponent>s holds the data referenced in SED-ML, so giving only a source is not sufficient to specify which NuML data is meant. With the current spec there is no way to define which <resultComponent> of the NuML file is meant.

Extending the source as proposed above would solve this. The location would then be the NuML id of the resultComponent that is meant.
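A minimal sketch of what using the location as a resultComponent id could look like on the tool side, using Python's standard `xml.etree.ElementTree`. The function name `find_result_component` and the trimmed sample document are assumptions for illustration; only the namespace and element names are taken from NuML L1V1.

```python
import xml.etree.ElementTree as ET

NUML_NS = {"numl": "http://www.numl.org/numl/level1/version1"}

# Trimmed sample with two result components (contents omitted).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<numl xmlns="http://www.numl.org/numl/level1/version1" level="1" version="1">
  <resultComponents>
    <resultComponent id="component1"/>
    <resultComponent id="component2"/>
  </resultComponents>
</numl>
"""


def find_result_component(numl_xml: bytes, component_id: str) -> ET.Element:
    """Return the resultComponent whose id equals `component_id`.

    Raises KeyError if the NuML document has no such component.
    """
    root = ET.fromstring(numl_xml)
    for component in root.iterfind(".//numl:resultComponent", NUML_NS):
        if component.get("id") == component_id:
            return component
    raise KeyError(f"no resultComponent with id {component_id!r}")


fluxes = find_result_component(SAMPLE, "component2")
```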

Mentioning to get attention: @luciansmith @nickerso @dagwa @bgoli

matthiaskoenig commented 7 years ago

Here is a NuML example containing concentrations and fluxes. It must be possible to reference the individual parts.

<?xml version="1.0" encoding="UTF-8"?>
<numl xmlns="http://www.numl.org/numl/level1/version1" level="1" version="1">   
<ontologyTerms>
    <ontologyTerm id="term1" term="Enhanced Newton"
        sourceTermId="SBRML:00003" ontologyURI="urn:sbrml:ontologyterms" />
    <ontologyTerm id="term2" term="Steady State" sourceTermId="TEDDY_0000011"
        ontologyURI="http://teddyontology.sourceforge.net/teddy/rel-2007-09-03/ontology/teddy.owl" />
    <ontologyTerm id="term3" term="concentration"
        sourceTermId="SBO:0000196" ontologyURI="http://www.ebi.ac.uk/sbo/" />   
    <ontologyTerm id="term5" term="flux" sourceTermId="C2348693" ontologyURI="http://www.nlm.nih.gov/research/umls" />  
    </ontologyTerms>
    <resultComponents>
    <resultComponent id="component1">
        <dimensionDescription>
            <compositeDescription id="Species" name="Species" indexType="string">
                <tupleDescription>
                    <atomicDescription id="Concentration" name="Concentration" ontologyTerm="term3" valueType="double" />
                </tupleDescription>
            </compositeDescription>
        </dimensionDescription>
        <dimension>
            <compositeValue indexValue="s_glu">             
                    <atomicValue>0.0094541</atomicValue>                    
            </compositeValue>
            <compositeValue indexValue="s_pyr">             
                    <atomicValue>2.34994e-05</atomicValue>                  
            </compositeValue>
            <compositeValue indexValue="s_acetate">             
                    <atomicValue>6.41826e-13</atomicValue>                  
            </compositeValue>
            <compositeValue indexValue="s_acetald">             
                    <atomicValue>4.70649e-15</atomicValue>                  
            </compositeValue>
            <compositeValue indexValue="s_EtOH">                
                    <atomicValue>3.57624e-13</atomicValue>                  
            </compositeValue>
            <compositeValue indexValue="x">             
                    <atomicValue>7.43762</atomicValue>                  
            </compositeValue>
        </dimension>
    </resultComponent>
    <resultComponent id="component2">
        <dimensionDescription>
            <compositeDescription id="Reaction" name="Reaction" indexType="string">
                    <atomicDescription id="Flux" name="Flux" ontologyTerm="term5" valueType="double" />
            </compositeDescription>
        </dimensionDescription>
        <dimension>
            <compositeValue indexValue="r1">                
                    <atomicValue>0.482986</atomicValue>             
            </compositeValue>
            <compositeValue indexValue="r2">                
                    <atomicValue>0.472358</atomicValue>                 
            </compositeValue>
            <compositeValue indexValue="r3">
                <tuple>
                    <atomicValue>6.19024e-12</atomicValue>
                    <atomicValue>3.72785e+12</atomicValue>
                </tuple>
            </compositeValue>
            <compositeValue indexValue="r4">                
                    <atomicValue>3.06043e-12</atomicValue>                  
            </compositeValue>
            <compositeValue indexValue="r5">                
                    <atomicValue>2.82728e-12</atomicValue>                  
            </compositeValue>
            <compositeValue indexValue="r6">            
                    <atomicValue>3.42224e-14</atomicValue>                  
            </compositeValue>
            <compositeValue indexValue="r7">            
                    <atomicValue>1.01607</atomicValue>                  
            </compositeValue>
            <compositeValue indexValue="r8">            
                    <atomicValue>1.2799e-12</atomicValue>                   
            </compositeValue>
            <compositeValue indexValue="r9">                
                    <atomicValue>0.0182721</atomicValue>                    
            </compositeValue>
            <compositeValue indexValue="r10">               
                    <atomicValue>0.550797</atomicValue>                 
            </compositeValue>
            <compositeValue indexValue="r11">               
                    <atomicValue>0.00304534</atomicValue>                   
            </compositeValue>
            <compositeValue indexValue="s_glu_in">              
                    <atomicValue>1.5</atomicValue>                  
            </compositeValue>
            <compositeValue indexValue="s_glu_out">         
                    <atomicValue>0.00094541</atomicValue>               
            </compositeValue>
            <compositeValue indexValue="s_pyr_out">             
                    <atomicValue>2.34994e-06</atomicValue>                  
            </compositeValue>
            <compositeValue indexValue="s_acetate_out">             
                    <atomicValue>6.41826e-14</atomicValue>              
            </compositeValue>
            <compositeValue indexValue="s_acetald_out">             
                    <atomicValue>4.70649e-16</atomicValue>                  
            </compositeValue>
            <compositeValue indexValue="s_EtOH_out">            
                    <atomicValue>3.57624e-14</atomicValue>                  
            </compositeValue>
            <compositeValue indexValue="a_out">             
                    <atomicValue>0.174693</atomicValue>                 
            </compositeValue>
            <compositeValue indexValue="x_out">             
                    <atomicValue>0.743762</atomicValue>                 
            </compositeValue>
            <compositeValue indexValue="AcDH_out">              
                    <atomicValue>0.0152267</atomicValue>                    
            </compositeValue>
        </dimension>
    </resultComponent>
    </resultComponents>
</numl>

matthiaskoenig commented 7 years ago

@fbergmann Could you comment on this?

I currently see two solutions for L1V3 to handle multiple NuML results:

  1. Extend the source attribute as proposed above. This would not be hard to implement and would provide a lot of additional functionality.
  2. State for L1V3: in case of NuML files with multiple resultComponents, the first resultComponent is used. L1V4 will then include the extension of the source attribute.

I am now leaning toward option 2. Then L1V3 would be finished.

fbergmann commented 7 years ago

@matthiaskoenig I'm happy going with option 2 for now.

matthiaskoenig commented 7 years ago

Option 2 is now part of the spec. Added sentences to clarify. The more complex cases have to be solved in L1V4.
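For tool authors, the option 2 rule could be sketched in Python roughly as follows. This is an illustration of the rule as stated in this thread, not the spec's normative wording; the function name `first_result_component` and the trimmed sample document are made up for the example.

```python
import xml.etree.ElementTree as ET

NUML_NS = {"numl": "http://www.numl.org/numl/level1/version1"}

SAMPLE = b"""<numl xmlns="http://www.numl.org/numl/level1/version1" level="1" version="1">
  <resultComponents>
    <resultComponent id="component1"/>
    <resultComponent id="component2"/>
  </resultComponents>
</numl>
"""


def first_result_component(numl_xml: bytes) -> ET.Element:
    """Return the first resultComponent in document order (the option 2 rule)."""
    root = ET.fromstring(numl_xml)
    # Element.find returns the first match in document order, or None.
    first = root.find(".//numl:resultComponent", NUML_NS)
    if first is None:
        raise ValueError("NuML document contains no resultComponent")
    return first


selected = first_result_component(SAMPLE)
```

With the example from this thread, such a rule would pick the concentrations in component1, and the fluxes in component2 would stay unreachable until the source extension lands.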