SynBioDex / SBOL-specification

The Synthetic Biology Open Language (SBOL)
http://sbolstandard.org

Explicitly specify whether collections are expected to be ordered. #30

Closed mikebissell closed 8 years ago

mikebissell commented 9 years ago

The specification of Collection does not explicitly say whether a Collection is an ordered set or an unordered set. There is room for interpretation because a Collection's serialization, being serial, has a logical order. The question is, must an SBOL-compliant tool preserve the order of an incoming Collection when that set is deserialized? Must it manage the Collection as an ordered set? Must it regenerate the Collection in the same order upon re-serialization? If so, this requirement must be stated. If not, we should definitely say so.

For discussion: At IWBDA, in a conversation between Mike (Amyris), Barbara (Zymergen), and Swapnil, we noted how those of us who code automated systems for managing physical samples almost always deal with those samples as (spatially mapped) ordered sets because we're running stacks of microtiter plates. SBOL itself doesn't give us a way to encode the ordered 2D and 3D arrays of components that are our bread and butter, and of course it doesn't give us a way to encode standard spatial mappings (e.g. plate geometries), even though this data is normally required before we can generate the robot instructions for assembling our components. If we're going to build interoperable systems, somebody should standardize this layer of the interface.

An unordered Collection is often all we need when designing parts and strains, but it's not terribly useful when it comes time to implement a fab.

eoberortner commented 9 years ago

That’s a very interesting topic!

in general, the order of the samples on a plate determines the order of the elements in a collection. also, samples can contain multiple clones. And if needed, then plates can be stacked.

one quick fix of the "2D request" could be to introduce a notion of "Plate" into SBOL, either in core or in a specific extension (could be up for discussion), and "Plate" could be a sub-type/extension of Collection. not sure how to solve the "3D request" yet.

anyway, we at the JGI would welcome specifying the order of collection elements from the perspective of plates too!

what does the community think? are there more requests for plate-ordered collections?

Thanks, Ernst

cjmyers commented 9 years ago

Collections are definitely unordered. Indeed, nothing should ever be inferred from the order of things in the serialization. Order is not guaranteed to be preserved.

The best way to get an order now is to add a custom annotation that assigns an ordering number to members of a collection. As with any annotation, if it proves useful to many users, then we can migrate into the standard itself.
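As a rough illustration of that workaround, here is a minimal Python sketch. It assumes the members are held as an unordered set of URIs and that each member carries a custom integer annotation; the key `ORDER_KEY`, the `example.org` URIs, and the helper `ordered_members` are all hypothetical and not part of SBOL or any SBOL library.

```python
from typing import Dict, FrozenSet, List

# Hypothetical annotation key; NOT defined by the SBOL standard.
ORDER_KEY = "http://example.org/annotations#orderIndex"


def ordered_members(members: FrozenSet[str],
                    annotations: Dict[str, Dict[str, int]]) -> List[str]:
    """Return member URIs sorted by their explicit order annotation.

    The order comes only from the annotation values, never from the
    iteration order of the (unordered) member set.
    """
    missing = [m for m in members if ORDER_KEY not in annotations.get(m, {})]
    if missing:
        raise ValueError(f"members without an order annotation: {sorted(missing)}")
    return sorted(members, key=lambda m: annotations[m][ORDER_KEY])


# Hypothetical member URIs, stored as an unordered set.
members = frozenset({
    "http://example.org/partC",
    "http://example.org/partA",
    "http://example.org/partB",
})

# One custom ordering number per member.
annotations = {
    "http://example.org/partA": {ORDER_KEY: 2},
    "http://example.org/partB": {ORDER_KEY: 0},
    "http://example.org/partC": {ORDER_KEY: 1},
}

result = ordered_members(members, annotations)
```

Because the order is recovered from the annotation alone, it survives any reshuffling of the serialization, which is the property the comment above asks tools not to rely on.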

Chris

eoberortner commented 9 years ago

good idea to annotate the components of a collection with, for example, their location on a plate.

Best, Ernst

mikebissell commented 9 years ago

Regarding this ticket's proposal:

I definitely don't want to see any "quick fixes" here, except for adding some explanatory text to the spec.

At the very least we should explicitly restate the clarification from Chris: collections are unordered (non-order-preserving) sets.

We might also take the opportunity to emphasize that they are sets, i.e. every element is distinct (with respect to the unique id in its about= URI), and that means that Collection is not the correct data type for storing plates, which commonly contain many replicates of the same Component. (Even adding a custom index# annotation to Component can't turn this class into an Array that can store multiple replicates of the same design.)
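To make the set-versus-array point concrete, here is a small Python sketch. It assumes a Collection behaves like a set of member URIs and models a plate as a mapping from well position to design URI; the URI and the well names are hypothetical, and neither structure is taken from an SBOL library.

```python
# Hypothetical design URI, standing in for a Component's about= URI.
design = "http://example.org/design/strain42"

# A set keyed on the URI keeps one entry no matter how often it is added.
collection = set()
for _ in range(3):
    collection.add(design)

# A plate keyed on well position can hold the same design in several wells.
plate = {"A1": design, "A2": design, "A3": design}
replicates = [well for well, uri in plate.items() if uri == design]

# A single per-member index annotation cannot express three positions,
# because there is only one member to attach it to.
index_annotation = {uri: 0 for uri in collection}
```

The position has to live on the container side (the well), not on the design, for replicates to be representable.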

Continuing the discussion about the feature ideas that surfaced above (not part of this issue):

As for describing physical arrays of samples, we could accomplish that using either a sidecar schema or a new layer in the main standard document. Internally, we all need to describe arrays of samples in order to compile protocols. Robots, for example, work on physically organized samples. Externally, any two peer organizations in a supply chain likewise need to be able to communicate about physically organized samples (stacks of barcoded 96w plates fulfilling an oligo order, for example).

Think about how things stand now. They're messy. How many of us have had to submit IDT primer orders using tabbed Excel sheets, where each tab's label stores the crucial barcode? ...As if humans should be generating and consuming these things? How many of us have had to write specialized parsers for consuming a particular supplier's special platemaps?

In order to facilitate the efficient exchange of platemaps across modular service boundaries, there should be a simple, standard data model and serialization format. Since there's plenty of variation within the industry, the format should be flexible enough to support multiple container geometries, multiple coordinate spaces, tightly ordered collections (96w plate), loose collections (racks of vials, stacks of plates), automatic identifiers (barcodes), and recursive collections... without being so complex that suppliers will just ignore it. (I have noticed that certain popular suppliers do not budget for software engineering, so whatever we propose needs to cost little more to implement than a CSV dump.)
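One possible shape for such a model, sketched in Python purely for discussion. Everything here is an assumption rather than an existing standard: the `Plate` and `Stack` classes, the single-letter row naming (which limits it to 26 rows), and the CSV column layout are invented for illustration.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple, Union


@dataclass
class Plate:
    """A tightly ordered container: a barcoded rows x cols grid of wells."""
    barcode: str
    rows: int
    cols: int
    wells: Dict[str, str] = field(default_factory=dict)  # "A1" -> content id

    def place(self, well: str, content: str) -> None:
        # Single-letter rows only (A..Z); a simplification for this sketch.
        row = ord(well[0].upper()) - ord("A")
        col = int(well[1:]) - 1
        if not (0 <= row < self.rows and 0 <= col < self.cols):
            raise ValueError(f"well {well} is outside a {self.rows}x{self.cols} plate")
        self.wells[well.upper()] = content


@dataclass
class Stack:
    """A loose, possibly nested collection of plates or other stacks."""
    label: str
    items: List[Union[Plate, "Stack"]] = field(default_factory=list)


def to_rows(node: Union[Plate, Stack], path: Tuple[str, ...] = ()) -> Iterator[List[str]]:
    """Flatten the recursive structure into one row per filled well."""
    if isinstance(node, Plate):
        for well, content in node.wells.items():
            yield ["/".join(path), node.barcode, well, content]
    else:
        for child in node.items:
            yield from to_rows(child, path + (node.label,))


def dump_csv(node: Union[Plate, Stack]) -> str:
    """Serialize to CSV, roughly as cheap to produce as a plain CSV dump."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["path", "barcode", "well", "content"])
    writer.writerows(to_rows(node))
    return buf.getvalue()


# Hypothetical oligo order: one 96-well plate with a replicated oligo.
plate = Plate(barcode="BC0001", rows=8, cols=12)
plate.place("A1", "oligo_001")
plate.place("A2", "oligo_001")
order = Stack(label="order-123", items=[plate])
text = dump_csv(order)
```

The barcode is an ordinary field rather than a spreadsheet tab label, and nesting `Stack` objects covers racks and stacks of plates without changing the flat output format.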

Obviously this is not a computational problem. It's just a matter of building consensus, writing docs and examples, and driving adoption. If we successfully promote a developer-friendly, streamlined format, then someday we'll all no longer have to waste time generating, transmitting, receiving, parsing, and interpreting custom Excel sheets and CSVs every time we communicate with 3rd parties about supply chain transactions. That'll make it cheaper to build the kind of robust, automated, high-volume B2B connections we'll require as our operations scale up.

Ideally, one file would contain both the SBOL-structural descriptions and the physical organization descriptions.

Is there already something out there? Or have we all just rolled our own proprietary solutions to this simple problem?

jakebeal commented 8 years ago

Duplicate of #37