SynBioDex / pySBOL3

Native python implementation of SBOL 3.0 specification
MIT License
37 stars 16 forks source link

Make implicit list wrapping for singleton initial values universal #301

Closed jakebeal closed 2 years ago

jakebeal commented 3 years ago

Some of our constructors allow list-valued attributes to be specified with either a list or a singleton value that is then implicitly wrapped into a list. Others do not.

For example, with the constructor for Component, the types argument has type Union[list[str], str], but with LocalSubComponent, the types argument has type list[str].

Can we make the implicit wrapping universal without it being too much of a pain?

bbartley commented 3 years ago

👍

tcmitchell commented 3 years ago

🤮

I dislike this suggestion with some passion. I dislike the wrapping we already do, and I'd rather not do any wrapping at all. I am absolutely and without question opposed to doing more wrapping.

Why? Because it is counterintuitive to set Component.types to a string and then get back a list:

>>> c = sbol3.Component('http://example.com/issue301/c1', types=sbol3.SBO_DNA)
>>> c.types
['https://identifiers.org/SBO:0000251']

I think it is appropriate to treat these properties as the lists that they are:

>>> c.roles = [sbol3.SO_PROMOTER]
>>> c.roles
['https://identifiers.org/SO:0000167']

Hooray for TypeError:

>>> c.roles = sbol3.SO_PROMOTER
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tmitchel/Projects/sd2/pySBOL3/sbol3/object.py", line 22, in __setattr__
    self.__dict__[name].set(value)
  File "/Users/tmitchel/Projects/sd2/pySBOL3/sbol3/property_base.py", line 177, in set
    raise TypeError(msg)
TypeError: roles requires one or more values packed in an iterable

If the authors of the SBOL 3 specification wanted these to be singletons (i.e. strings), they would have defined these properties to be of type string. They did not do that though. In their infinite wisdom, they defined these to unbounded properties such that they can have more than one value.

jakebeal commented 3 years ago

One way or another, I think that it should be consistent, whether that means universal wrapping or universal not-wrapping.

jakebeal commented 3 years ago

Look at #324 for potential analog in solution.

tcmitchell commented 3 years ago

We talked about pushing the auto-wrapping down into the handling of initial_value in the various property constructors.

Care must be taken with strings because they are iterable and not really a collection for our purposes. There are a limited number of instances of ListProperty so this may be manageable. Typing may be a pain, we'll have to explore how we might make that work well.

tcmitchell commented 3 years ago

@bbartley @jakebeal should the typing of the constructor arguments indicate that they can be passed a singleton, or is it ok for them to declare that they want a list, and accept a singleton if a singleton is passed (in violation of the typing declaration).

For example, the generated_by property is a List of ReferencedObjects. It's new declaration will be:

generated_by: list[Union[Identified, str]] = None,

That indicates that the user can pass a list of items, each of which are either an Identified or a str. A user can still pass a single Identified or a single str and it will be marshaled into a list. But the declaration says it only wants a list.

Does that make sense, and is it ok with you?

jakebeal commented 3 years ago

I think this depends on whether we consider passing a singleton a "MAY" action or a "SHOULD NOT" action.

If it's a "MAY" action, then we'd want to have the typing indicate they can be passed as a singleton, so that one isn't being pushed away from it by warnings in the IDE.

If it's a "SHOULD" action, then we'd want to have the typing declare lists only, so that the user is chivvied away from the disliked behavior.

My personal preference is for "MAY", because there are a number of properties that allow a list, but for which the typical use case is a singleton.

bbartley commented 3 years ago

If I'm reading between the lines, then I think the reason that @tcmitchell is asking is because the typing looks really hideous otherwise:

generated_by: Union[Union[Identified, str], list[Union[Identified, str]] = None

But here's an idea that might make it more palatable. Can we use Python's typing module to define a custom type? Such as:

generated_by: SingletonOrList[Union[Identified, str]] = None
tcmitchell commented 3 years ago

It's not really about the typing, although I agree that it can get ugly. I also agree that there are ways to make it more palatable.

My concern is really the think I objected to [originally(https://github.com/SynBioDex/pySBOL3/issues/301#issuecomment-924141962). If we spread this far and wide I think it will be confusing to users. An example is useful:

>>> import sbol3
>>> sbol3.set_namespace('https://github.com/SynBioDex/pySBOL3/issues/301')
>>> seq = sbol3.Sequence('seq1', elements='GCAT')
>>> comp = sbol3.Component('c1', types=sbol3.SBO_DNA, sequences=seq)
>>> comp.types == sbol3.SBO_DNA
False
>>> comp.types
['https://identifiers.org/SBO:0000251']
>>> comp.sequences == seq
False
>>> comp.sequences
['https://github.com/SynBioDex/pySBOL3/issues/301/seq1']

In the above example, types is set to a value, and sequences is set to a value. Yet when I compare the values afterwards, they do not match. Instead they are of a different type, a list instead of a singleton. I find that confusing behavior.

I understand that for some frequently used constructions, like Component.types it is desirable to advertise that a singleton is acceptable. I do not think we should spread this far and wide though. I'm fine with the code accepting these values, but advertising them is a bridge too far for me at the moment. I view this as SHOULD NOT. The programmer MAY set singleton values, and SHOULD NOT set singleton values.

If you can enumerate a set of values that should be advertised as allowing a singleton or a list, I would implement that. I am reluctant to advertise it everywhere. I think it will be too easy to abuse and to confuse.

>>> import sbol3
>>> sbol3.set_namespace('https://github.com/SynBioDex/pySBOL3/issues/301')
>>> seq = sbol3.Sequence('seq1', elements='GCAT')
>>> comp = sbol3.Component('c1', types=[sbol3.SBO_DNA], sequences=[seq])
>>> sbol3.SBO_DNA in comp.types
True
>>> comp.types == [sbol3.SBO_DNA]
True
>>> seq in comp.sequences
True
>>> 
>>> # Surprising!
>>> comp.sequences == [seq]
False
>>> 
>>> comp.sequences == [seq.identity]
True
jakebeal commented 3 years ago

I can see your point, @tcmitchell .

What do you think should be the distinction between arguments where we want to advertise flexibility, like Component.types and ones where we definitely do not want to advertise flexibility, like Component.features?

One reasonable possibility would be to just have me read through all of the properties and give a list of the ones I think we should advertise singletons for. I think that wouldn't actually be too big.

But maybe you have a more principled thought in mind?

tcmitchell commented 3 years ago

I don't have a more principled thought in mind.

And upon further reflection it also occurred to me that if it is allowed, it should be typed as allowed. In other words, make the typing reflect the code. Why hide a feature we have implemented?

Sigh.

bbartley commented 3 years ago

Another approach which is perhaps more principled is to introduce some specialized classes at the library level (as opposed to the spec level). So for example, we might have a Component (singleton) vs MultisequenceComponent (list); and Interaction (singleton) vs MultitypeInteraction (list). I actually rather like this, except for the fact that it is a slippery slope that ends up at MultisequenceMultitypeComponent.

jakebeal commented 3 years ago

@bbartley I dislike your proposal.

@tcmitchell : I looked through and found the following multi-value properties are ones that I expect will have a singleton common case during construction:

These multi-value properties, on the other hand, will generally not have singletons for their constructor calls:

I would like to see singletons in the typing for the first list, as otherwise I will see many warnings in my code. I don't have a strong opinion about the second list.

tcmitchell commented 2 years ago

Note to self: the following remain to be done on branch 301-list-wrapping: