SynBioDex / SBOL-specification

The Synthetic Biology Open Language (SBOL)
http://sbolstandard.org

Need a rule to disallow annotations in SBOL namespace #41

Closed cjmyers closed 8 years ago

cjmyers commented 8 years ago

Annotation tags should never be in the SBOL namespace. May need a rule that says this explicitly.

cjmyers commented 8 years ago

Same goes for generic top levels.

bbartley commented 8 years ago

This may not be simple or maybe not even enforceable in the C++ library at a fundamental level. The C++ library was designed with extensibility as a priority. It has a more RDF-centric approach than the Java library. Any class derived from an SBOL core class is treated like an extension element and there is no way to put a restriction on it. Furthermore, the parser automatically ignores any RDF/XML element in the SBOL namespace that is not explicitly defined in either the user's core or extension packages. So if a user is lacking a package and encounters elements in the SBOL namespace it doesn't recognize, the parser will simply ignore them. These features may have important implications for extension development. One of these implications is that people can develop extensions without any official blessing from the central SBOL authority. This is an issue that many in our community will have a philosophical opinion about, so it should be discussed through the SEP process.

cjmyers commented 8 years ago

The reason that this rule is needed is to ease development of support for SBOL. The behavior you are describing is very problematic, and we had countless problems before we put this rule in because we did something similar. In our case, anything in the SBOL namespace not recognized was turned into an annotation. This meant when we had typos in the sbol tags, they were turned into annotations and we had no idea the typos were there. This made it extremely difficult to find bugs. Simply dropping content is even worse, as it means if there is a typo and you read a file then write it out, it would silently drop content! Wouldn’t it be better to throw an exception to indicate that content is being dropped, so the user could go and figure out why?
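The three behaviors being compared here (silently dropping an unrecognized SBOL-namespace tag, turning it into an annotation, or raising an error) can be sketched in a few lines of Python. This is only an illustration, not code from libSBOLj or the C++ library: `parse_properties`, `UnknownTagPolicy`, and the three-entry `KNOWN_SBOL_TAGS` set are made up for the example, and `displayID` stands in for a typo of `displayId`.

```python
from enum import Enum

SBOL_NS = "http://sbols.org/v2#"
# Tiny illustrative subset of defined SBOL terms, not the full data model.
KNOWN_SBOL_TAGS = {"displayId", "name", "description"}


class UnknownTagPolicy(Enum):
    DROP = "drop"          # silently ignore the tag
    ANNOTATE = "annotate"  # keep the tag as a generic annotation
    ERROR = "error"        # refuse to parse


def parse_properties(properties, policy):
    """Split (namespace, tag, value) triples into recognized fields and annotations."""
    fields, annotations = {}, {}
    for namespace, tag, value in properties:
        if namespace == SBOL_NS and tag in KNOWN_SBOL_TAGS:
            fields[tag] = value
        elif namespace != SBOL_NS:
            # Tags from other namespaces are always kept as annotations.
            annotations[(namespace, tag)] = value
        elif policy is UnknownTagPolicy.ANNOTATE:
            annotations[(namespace, tag)] = value
        elif policy is UnknownTagPolicy.ERROR:
            raise ValueError(f"unrecognized tag in SBOL namespace: {tag}")
        # UnknownTagPolicy.DROP: fall through, so the value is lost.
    return fields, annotations


# "displayID" is a typo for "displayId".
record = [(SBOL_NS, "displayID", "pLac"), (SBOL_NS, "name", "lac promoter")]
```

With `DROP` the mistyped value vanishes, with `ANNOTATE` it survives but looks like intentional custom data, and only `ERROR` tells the author that something is wrong.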

If you want your library to support extensions, this is not the way to do it. If you put extensions into the SBOL namespace and then the library drops that content, then anyone using your library cannot experiment with extensions. They would need to modify your library. This is not good for development. In the approach where extensions are put into different namespaces, these extensions will come through as annotations/genericTopLevels, and any developer using your library will be able to experiment with extensions with zero changes to your library. They can do precisely what you say, which is experiment with extensions with ZERO blessing from any central SBOL authority. Once they convince the community that the extension is good and should be core, then and only then will the library need modification to support the new elements being migrated into the SBOL namespace.

I argue that allowing annotations in the SBOL namespace would require an SEP, not the reverse. All examples in the specification of objects that are not defined are put into their own namespace and not the SBOL namespace. Nowhere does it state that it is permissible to do this. The namespace is the way you know what authority has defined a tag, so you can go look up what that tag means. If you allow tags that are not defined in the SBOL specification to be in the SBOL namespace, this would be very confusing: where do you find their definition? Putting extensions into their own namespace would allow you to indicate who is responsible for these new tags. It would give a clue as to where to look for their definition.

If we write and vote on an SEP for this, I would argue that what we are saying is that SBOL 2.0 allowed undefined tags in the SBOL namespace. In that case, libraries will need to support this in 2.0 files. I believe the editors should decide if the specification allows this or not. If they decide the specification allows this, we need an SEP to disallow it, in my opinion, but we also need to make sure all libraries allow it too (and not by silently dropping but by putting into annotations). If the editors decide the specification does not allow it, then someone else could write an SEP to allow it. Before ANY SEP is written, we first have to know what the specification says.

bbartley commented 8 years ago

Hi Chris

"If you want your library to support extensions, this is not the way to do it. If you put extensions into the SBOL namespace and then the library drops that content, then anyone using your library cannot experiment with extensions. They would need to modify your library. This is not good for development."

No. This is exactly the opposite of the truth and is not an accurate description of how the C++ library works. One can define an extension to the C++ library without modifying the core. The parser and the serializer are completely agnostic about namespaces and tags they do not recognize. So if an extension developer chooses to serialize new properties into the SBOL namespace, and I don't have that extension installed, then my library will ignore these elements when parsing. If I then go ahead and install the extension, then my library will now recognize extension tags, without any need to modify the core. It is maximally open world for extension developers and requires no restriction on namespaces. I'd argue that this is closer to what RDF enthusiasts mean by "extensibility".

You raise a couple of reasonable points, but I think they might best be discussed through an SEP. However, let's see what the other editors think...

cjmyers commented 8 years ago

Maybe I’m misunderstanding what you mean by “ignore these elements when parsing”. My interpretation is that those elements are dropped during parsing. In other words, assume you read a file with elements in the SBOL namespace that you do not recognize and then immediately write it out again. Would they be dropped? If so, this is a big problem, since you would potentially be dropping content important to someone else. A read/write operation should not modify the contents of an SBOL 2.0 file other than perhaps the order of elements.
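The read-then-write property described above can be checked mechanically. The following Python sketch uses only the standard library; `read_write_dropping_unknown` is a hypothetical stand-in for a parser that ignores unrecognized SBOL-namespace elements (it is not taken from any real SBOL library), `KNOWN` is a deliberately tiny subset of the data model, and `futureField` is an invented tag.

```python
import xml.etree.ElementTree as ET

SBOL_NS = "http://sbols.org/v2#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Illustrative subset of recognized SBOL elements, not the full data model.
KNOWN = {f"{{{SBOL_NS}}}{t}" for t in ("ComponentDefinition", "displayId")}

DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:sbol="{SBOL_NS}">
  <sbol:ComponentDefinition rdf:about="http://example.org/cd0">
    <sbol:displayId>cd0</sbol:displayId>
    <sbol:futureField>kept?</sbol:futureField>
  </sbol:ComponentDefinition>
</rdf:RDF>"""


def read_write_dropping_unknown(xml_text):
    """Hypothetical lossy round trip: unknown SBOL-namespace elements are dropped."""
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag.startswith(f"{{{SBOL_NS}}}") and child.tag not in KNOWN:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


def tag_and_text(xml_text):
    """Order-insensitive summary of a document: sorted (tag, text) pairs."""
    return sorted((e.tag, (e.text or "").strip()) for e in ET.fromstring(xml_text).iter())


def lost_on_round_trip(xml_text, round_trip):
    """Return the (tag, text) pairs present before the round trip but not after."""
    before = tag_and_text(xml_text)
    after = tag_and_text(round_trip(xml_text))
    return [item for item in before if item not in after]
```

Calling `lost_on_round_trip(DOC, read_write_dropping_unknown)` reports the `futureField` element as lost, while a round trip that preserves unknown content reports nothing. The comparison ignores element order, matching the allowance for reordering.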

In libSBOLj, we could also ignore sbol tags outside of the ones we recognize, but this would cause the problem I just mentioned. If instead we parse tags we don’t recognize as annotations, then we could potentially miss bugs in tags from other developers. Consider the case where someone writes their own SBOL creation code, and they accidentally use a misspelled tag rather than the correct one. If this is not marked as a validation error, we would parse this in as an annotation, and the user of, say, our validator would never know about their typo. Indeed, we experienced a lot of these types of errors early on until we added this validation check. Namely, we accidentally had tag mismatches between read and write, and we ended up with things being mis-parsed.

To experiment with my proposed process for SEP generation, I would like to see three editors concur that all tags in the SBOL namespace must be parsed without any errors before creating an SEP. So, if the rest of the editors can comment on their opinions, we can move forward on this.

Chris

drdozer commented 8 years ago

This is how my Scala library works also. If a predicate is not bound to a class property, it is treated as an annotation, regardless of namespace. The library client can supply different classes with different bound predicates and properties at compile time, and that is what they will be given.

cjmyers commented 8 years ago

Ok, I guess a compromise would be for the validator to have a “strict” mode that gives errors when tags in the SBOL namespace are not recognized. This would mean adding best practice validation rules that say you should not include tags in the sbol namespace other than the defined ones. This would allow validation to catch silly typos in tags, while keeping the ability to put things in the sbol namespace before official adoption into the core standard. Would this satisfy everyone?

Since these are best practices and NOT data model changes, I think this could go in SBOL 2.0.1 without needing a formal SEP process.
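A rough Python sketch of what such a two-mode check might look like follows. It is an assumption-laden illustration rather than the libSBOLj validator: `validate_tags`, `is_valid`, `Finding`, and the small `DEFINED_SBOL_TAGS` set are all invented names, and `dislpayId` is a deliberate typo.

```python
from dataclasses import dataclass

SBOL_NS = "http://sbols.org/v2#"
# Illustrative subset of the tags the specification defines.
DEFINED_SBOL_TAGS = frozenset({"displayId", "name", "description"})


@dataclass(frozen=True)
class Finding:
    severity: str  # "error" or "warning"
    message: str


def validate_tags(tags, strict=False):
    """Report (namespace, tag) pairs in the SBOL namespace that are not defined."""
    severity = "error" if strict else "warning"
    return [
        Finding(severity, f"undefined tag in SBOL namespace: {tag}")
        for namespace, tag in tags
        if namespace == SBOL_NS and tag not in DEFINED_SBOL_TAGS
    ]


def is_valid(tags, strict=False):
    """A document is valid when no finding has severity 'error'."""
    return not any(f.severity == "error" for f in validate_tags(tags, strict))


tags = [
    (SBOL_NS, "displayId"),
    (SBOL_NS, "dislpayId"),                 # typo, caught in both modes
    ("http://example.org/ext#", "anything"),  # other namespaces are never flagged
]
```

In the default mode the typo produces a warning and the document still counts as valid; with `strict=True` the same finding becomes an error.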

drdozer commented 8 years ago

I think there are 2 issues here. There's a politeness one. It is very rude for someone to create predicates in the SBOL namespace without going through the SBOL development channels, and our tooling should certainly not be expected to do anything sensible with these beyond not dropping them on the floor. Then there's a maintenance one. The spec will change over time, and during development of the spec, there will be a need for documents with various variations on the terminology supported, to test out what works well and what doesn't. In those situations it is better if tooling gracefully handles things they don't fully understand or expect. Otherwise we have to develop the library fully before we can test a data model change, which makes the cost prohibitively high for testing out multiple alternatives.

cjmyers commented 8 years ago

Ok, this is where I actually get confused. It seems to me the best balance of the 2 issues you raise is:

1) Disallow invalid tags in the SBOL namespace.
2) Use a new namespace for new tags for experimental enhancements to the SBOL data standard.

Both of these are easy to support, and this is what libSBOLj does now. It is very easy to experiment with new data content. The annotation scheme in SBOL is very powerful, since you can create ANY content you want either as new GenericTopLevel objects OR custom annotations on existing TopLevel objects. For example, let us consider the current proposal of adding a “role” field to component. If one wants to try this now with libSBOLj, one simply does the following:

1) Adds a new namespace to the document, let’s call it “sbolExp=http://experimental.sbols.org/”.
2) Creates an annotation on the component with the tag sbolExp:role.
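For readers who want to see what those two steps produce at the RDF/XML level, here is a small Python sketch using only the standard library. It does not use libSBOLj, and it is simplified: the `Component` element is placed directly under `rdf:RDF` for brevity, the `http://example.org/c0` URI is a placeholder, and the Sequence Ontology promoter term is just an example role value.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SBOL_NS = "http://sbols.org/v2#"
SBOL_EXP_NS = "http://experimental.sbols.org/"  # the experimental namespace named above

# Step 1: declare the namespaces, including the new experimental prefix.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("sbol", SBOL_NS)
ET.register_namespace("sbolExp", SBOL_EXP_NS)

root = ET.Element(f"{{{RDF_NS}}}RDF")
component = ET.SubElement(
    root,
    f"{{{SBOL_NS}}}Component",
    {f"{{{RDF_NS}}}about": "http://example.org/c0"},  # placeholder URI
)

# Step 2: attach the experimental property as an annotation, outside the SBOL namespace.
ET.SubElement(
    component,
    f"{{{SBOL_EXP_NS}}}role",
    {f"{{{RDF_NS}}}resource": "http://identifiers.org/so/SO:0000167"},  # SO term for promoter
)

xml_text = ET.tostring(root, encoding="unicode")
```

The serialized document contains an `sbolExp:role` element and no new `sbol:` tags, so a parser that is strict about the SBOL namespace can still carry the experimental property through as an annotation.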

In fact, we could even create a new “standard” experimental namespace for proposed SBOL content which is completely open while being strict within the current sbol namespace.

This not only stops people from being “rude” and using our namespace without going through proper channels, but it also allows for completely open experimentation with new ideas for content.

jakebeal commented 8 years ago

Consensus from discussion at the SBOL14 workshop: do your experiments in another namespace. This is a 2.0.1 change, which will add a validation rule.

cjmyers commented 8 years ago

This is now handled in modified rules that indicate what fields are allowed for each class.