Open bruno-f-cruz opened 11 months ago
Something along these lines:
https://github.com/pydantic/pydantic/issues/726#issuecomment-519276108
Relatedly, any specific reason for the name field being optional?
This creates a situation where even if I wanted to create the dictionary post-hoc, using the array of objects, not only do I have to deal with potential duplicated keys (which it would still be important to solve), but I also have to deal with objects without a name field.
I guess this my be related to #620
yep, we already have a ticket to fix this.
If you have a chance, can you also comment on the original issue? And why Lists > Dict at this point? Just want to understand if it is something we want to think about, or simply drop it. Thanks!
we'll discuss it when we've finished the v2 update
Can we revisit this discussion now that v2 has been merged? I am at a point where having this implemented would save us a lot of headaches down the road when instantiating hardware from the schema. Thanks!
I made a quick test of what this could look like here:
https://github.com/AllenNeuralDynamics/aind-data-schema/pull/704
I used additional_devices a test case. The proposed solution is backward compatible I think
I'm uncomfortable with making the name implicit as a key, and then having to replicate it inside the field itself, but only sometimes. Is the right solution just to make name required? @bruno-f-cruz @saskiad
To clarify, I don't think the name must be the same as the key. However, for backward compatibility, I decided to use the name as a key when constructing the dictionary so people could migrate from List->Dict painlessly. If people define it as a Key:Value pair directly, I don't see why name must be the key. I would be happy to consider other default conversations from List->Dict
After a bit of playing around with the schema, I really think that moving to dictionaries is going to make a lot of the schema grammar much simpler.
For instance, currently in order to pass a DaqChannel
to the Wheel
class, one must know the index of that channel in the device. This creates a huge problem as the order of devices in the array is not stable and can easily change if a channel is deleted. Moreover, it feels a bit clunky to have to define the device name in each channel since, if a channel belongs to a device, the name of the device should be a given.
import os
import aind_data_schema.models.devices as d
from aind_data_schema.models.manufacturers import Manufacturer
wheel_encoder_channel = d.DAQChannel(
device_name='BehaviorBoard',
channel_name='DI0', #This should be an array
channel_type=d.DaqChannelType.DI,
event_based_sampling=True,
)
wheel_break_channel = d.DAQChannel(
device_name='BehaviorBoard',
channel_name='DO0',
channel_type=d.DaqChannelType.DO,
event_based_sampling=True,
)
wheel_torque_channel = d.DAQChannel(
device_name='BehaviorBoard',
channel_name='AI0',
channel_type=d.DaqChannelType.AI,
event_based_sampling=True,
)
harp_board = d.HarpDevice(
name='BehaviorBoard',
serial_number=None,
# manufacturer=Manufacturer.CHAMPALIMAUD, # this crashes
data_interface=d.DataInterface.USB,
computer_name=os.getenv('COMPUTERNAME'),
channels=[wheel_encoder_channel, wheel_break_channel, wheel_torque_channel],
harp_device_type=d.HarpDeviceType.BEHAVIOR,
is_clock_generator=False,
port_index='COM3')
wheel = d.Wheel(
name='ThisWheel',
encoder=harp_board,
encoder_output=harp_board.channels[0],
magnetic_brake=harp_board,
brake_output=harp_board.channels[1],
torque_sensor=harp_board,
torque_output=harp_board.channels[2],
radius=0,
width=0,
pulse_per_revolution=0,
)
My suggestion for a new pattern includes:
Wheel
, since the DAQ channel already has this valid reference.The aforementioned pattern would thus become:
import os
from typing import Dict, Literal, Union, List
import aind_data_schema.models.devices as d
from pydantic import Field
from aind_data_schema.models.manufacturers import Manufacturer
from aind_data_schema.models.units import SizeUnit
class NewHarpDevice(d.HarpDevice):
channels: Dict[str, d.DAQChannel] = Field(default_factory=dict)
def add_channel(self, channel: Union[d.DAQChannel, List[d.DAQChannel]]):
if isinstance(channel, list):
for c in channel:
self.add_channel(c)
else:
channel = self._validate_daq_channel(channel)
self.channels[channel.channel_name] = channel
def remove_channel(self, channel_name: str):
if channel_name in self.channels:
del self.channels[channel_name]
else:
raise ValueError(f"Channel {channel_name} does not exist")
def _validate_daq_channel(self, channel: d.DAQChannel) -> d.DAQChannel:
if not channel.device_name:
channel.device_name = self.name
if channel.device_name != self.name:
raise ValueError(f"Device name {channel.device_name} does not match {self.name}")
if channel.channel_name in self.channels:
raise ValueError(f"Channel {channel.channel_name} already exists")
return channel
class NewWheel(d.MousePlatform):
device_type: Literal["Wheel"] = "Wheel"
radius: float = Field(..., title="Radius (mm)")
width: float = Field(..., title="Width (mm)")
size_unit: SizeUnit = Field(SizeUnit.MM, title="Size unit")
encoder_output: d.DAQChannel = Field(None, title="Encoder DAQ channel")
pulse_per_revolution: int = Field(..., title="Pulse per revolution")
brake_output: d.DAQChannel = Field(None, title="Brake DAQ channel")
torque_output: d.DAQChannel = Field(None, title="Torque DAQ channel")
wheel_encoder_channel = d.DAQChannel(
device_name='',
channel_name='DI0', #This should be an array
channel_type=d.DaqChannelType.DI,
event_based_sampling=True,
)
wheel_break_channel = d.DAQChannel(
device_name='',
channel_name='DO0',
channel_type=d.DaqChannelType.DO,
event_based_sampling=True,
)
wheel_torque_channel = d.DAQChannel(
device_name='',
channel_name='AI0',
channel_type=d.DaqChannelType.AI,
event_based_sampling=True,
)
harp_board = NewHarpDevice(
name='BehaviorBoard',
serial_number=None,
# manufacturer=Manufacturer.CHAMPALIMAUD, # this crashes
data_interface=d.DataInterface.USB,
computer_name=os.getenv('COMPUTERNAME'),
harp_device_type=d.HarpDeviceType.BEHAVIOR,
is_clock_generator=False,
port_index='COM3')
harp_board.add_channel(wheel_encoder_channel) # Notice that once this method runs, it modifies the original object with the name of the DaqDevice.
harp_board.add_channel(wheel_break_channel)
harp_board.add_channel(wheel_torque_channel)
new_wheel = NewWheel(
name='ThisWheel',
radius=0,
width=0,
pulse_per_revolution=0,
encoder_output=wheel_encoder_channel, # just to show that the object is modified and can be used as a reference
brake_output=harp_board.channels['DO0'],
torque_output=harp_board.channels['AI0'],
)
Finally, I would strongly advise to adopt a name for all these objects. It is a bit weird that some have name
, while others have an identical property but called channel_name
or assembly_name
.
A common pattern in the current schema is to use a
List
of generic objects to keep track of such objects. For instance, cameras can be expressed as follows:https://github.com/AllenNeuralDynamics/aind-data-schema/blob/c8aca0341d86efc82aa32ba0255b1f66287a4a3c/src/aind_data_schema/rig.py#L64C1
While easy to use,
List
s make grabbing a specific device somewhat difficult without parsing all objects since they must be indexed by the order (which can easily change across days) or the experimenter must generate somesort of keyed collection type with an unique key (e.g. device name).I wonder if rather than using generic unsorted lists, it would be beneficial to simply use some sort of hashed dictionary structure in the form of
Dict[str, CameraAssembly]
instead ofList[CameraAssembly]
. This would make it much easier to access the objects after deserialization.