OSVR / OSVR-Core

The core libraries, applications, and plugins of the OSVR software platform.
Apache License 2.0

VRPN limits #262

Open janoc opened 8 years ago

janoc commented 8 years ago

VRPN has plenty of hardwired limits, such as max 1024 tracker sensors, 128 analogs, etc. I am hitting them when trying to send joint tracking quality information using analogs, because the osvrPoseState (and the underlying VRPN message) don't have provisions to do so. I need 150 analogs for a system that supports 6 users and 25 tracked joints on each, exceeding the limit.

These hardwired limits are going to pose problems for larger installations, especially when multiple plugins/devices are served by the same server.

rpavlik commented 8 years ago

Where not arbitrary, they can probably be lifted (like number of tracker sensors - they're not maintained in some static array). The analog limit is probably because all values are sent with each message.

One way of dealing with this would be to create a new device (device name and token) for some reasonable subset of the data, since I think the sensor limits are just per-device, rather than server-wide.
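A minimal sketch of that workaround, assuming the limit really is per device. The `AnalogDevice` struct, the `splitChannels` helper and the device names are hypothetical and not part of VRPN or OSVR; the code only shows how 150 channels could be partitioned so that no device exceeds a given limit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one analog device exposed by the server.
struct AnalogDevice {
    std::string name;          // illustrative name, e.g. "JointQuality0"
    std::size_t firstChannel;  // index of the first global channel it carries
    std::size_t numChannels;   // number of channels it carries
};

// Split totalChannels into consecutive devices of at most perDeviceLimit channels.
std::vector<AnalogDevice> splitChannels(std::size_t totalChannels,
                                        std::size_t perDeviceLimit,
                                        const std::string& baseName) {
    assert(perDeviceLimit > 0);
    std::vector<AnalogDevice> devices;
    for (std::size_t first = 0; first < totalChannels; first += perDeviceLimit) {
        const std::size_t count = std::min(perDeviceLimit, totalChannels - first);
        const std::string name = baseName + std::to_string(devices.size());
        devices.push_back(AnalogDevice{name, first, count});
    }
    return devices;
}
```

With a chunk size of 25 (one device per user), `splitChannels(150, 25, "JointQuality")` yields six devices. Filling each device up to a limit of 128 yields two devices carrying 128 and 22 channels, which is the kind of arbitrary split the next comment objects to.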

janoc commented 8 years ago

The analog limit is certainly a statically allocated array in a few places - e.g. vrpn_print_devices crashed on me with memory corruption when I increased the limit on the server but not on the client.

The idea with creating multiple devices is probably a workable workaround for the time being, but it is a kludge at best to have to split the data in an arbitrary manner like that. It would be better to have a proper fix long term. I feel that we are pushing VRPN to do things it was never designed to do here.

russell-taylor commented 8 years ago

The normal approach in VRPN when dealing with a new class of device is to come up with a new device type to handle it. Running against limits on the behavior of an Analog may be because we're treating it as too general of an object.

I thought I had turned the number of trackers into dynamically allocated on the client side to avoid running into limits, but maybe I just bumped up the limits.

Maybe it is time for VRPN version 5, which takes limits off all or most of the things in it. It will be a breaking change, but that will make it clear that you shouldn't use the old clients with the new servers.

JeroMiya commented 8 years ago

I think the removal of hard coded arbitrary limits is a good enough reason to make a breaking change to a protocol. This is especially true when there is already an application of the protocol that runs up against those limits, as is the case here. Does VRPN have a protocol version as part of the protocol?

Regarding new device types, I would agree as long as we don't have multiple device types for the exact same data structure. We don't need multiple device types for single analog value streams, for example.

janoc commented 8 years ago

Re new device types - I would be rather careful with that. E.g. having several incompatible Tracker types that differ only in one or two members of the data structure but otherwise are identical because the same class was extended in different ways would be counterproductive. We don't want to end up in a situation like with USB cables - Murphy ensures that whenever you need to connect a device to something else you never have the cable with the right plugs on hand.

I am not sure how to do it, but it would be very good for, e.g., the tracker info messages to be subsets of one another, so that an older client can work with a newer server (within reason). Of course, it will not see the added info - such as the tracking quality data - but it should be able to continue to use the basic tracker without having to modify its code to switch to another device type only because a new field was added to some struct.

russell-taylor commented 8 years ago

Thanks for your words of caution, which sound like they come from experience. The way this is handled for trackers now is that there are different message types from the same device, with the additional info in each. Extending by adding new message types is backwards-compatible.

The limits being removed are on the number of sensors or number of values of a particular type, not a type change.

The type comment I made above referred to adding another VRPN base device, rather than an analog, to report things that are more specific. For instance, you could (but should not) report tracker values as an analog with seven values. The same goes for an image (more values). By having types specific to the function, as opposed to the data protocol, it is more likely that a client will know how to interpret them correctly.

janoc commented 8 years ago

> Thanks for your words of caution, which sound like they come from experience. The way this is handled for trackers now is that there are different message types from the same device, with the additional info in each. Extending by adding new message types is backwards-compatible.

You mean that e.g. if I wanted to have tracking quality info, it is sent over a different message type and thus requires a separate callback/handler?

That works from an engineering standpoint. On the other hand, it is impractical for certain types of data, like the quality information, which is typically per sensor/mocap joint. Having this info separate from the regular position/quat to which it applies makes it very painful to handle, because the data need to be merged, could arrive at different times, etc.

Another use case where this approach to extending the existing functionality is not going to work is the concept of a data frame. E.g. OptiTrack from Natural Point works in such a way that you get an entire frame of data every time over their proprietary NatNet protocol. A frame is understood as a collection of all rigid body states (pos/quat/status - tracked/not tracked/potentially quality info) and also all tracked markers which aren't associated with any rigid bodies (plus a few other things). The key thing is that all these are temporally correlated - it is a snapshot of the scene in time. This extra piece of information is extremely valuable when one is trying to do things like motion capture or calculate the motion of objects from several rigid bodies/markers. VRPN currently doesn't have a sensible way to represent this - we can only fudge with the time stamps (which are more often than not incorrect) and hope that the client will collect all the messages within a time window and somehow match the timestamps to identify what belongs to a single frame (and thus makes sense to use in the subsequent computation). Having an explicit frame ID within the message and/or a way to send all the data as a single message would make things a lot easier.

So a logical approach would be extending the existing tracker message by adding the extra field(s) (or defining a new device type derived from tracker). However, that breaks the protocol compatibility and potentially could be done multiple times and in incompatible ways - which is what I had in mind when speaking about those USB plugs ...

russell-taylor commented 8 years ago

Regarding data frames, this is like what happens with the vrpn_Imager. An imager server can send a bunch of regions, and it groups them together with beginFrame/endFrame messages. It does not have to fill in the whole screen between these, but it lets the receiver know when it should allocate its data (begin) and when it is done with it and it is ready to be displayed (end). This requires sending everything over the Reliable channel.

This could be added as an additional message type on top of a tracker. I'd imagine it being a TrackerCollection object or something like that. It might be more generic, including analog readings and such from a single device, so perhaps a beginEpoch/endEpoch pair of messages. Again, you'd need to make sure that every message that is supposed to be correlated by these epochs is sent reliably. One way to do this is by asking for a tcp:// connection to VRPN, which does not set up the unreliable UDP channel. Since the person writing the Epoch may also be writing the device driver, they could just send all messages from that device reliably.
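A minimal client-side sketch of the beginEpoch/endEpoch idea, assuming all messages arrive in order over the reliable channel. `PoseReport` and `EpochAssembler` are hypothetical names for illustration; nothing here is an existing VRPN class or message type.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-sensor report; the field layout is illustrative only.
struct PoseReport {
    int sensor;
    double pos[3];
    double quat[4];
};

// Collects the reports that arrive between a begin marker and an end marker.
class EpochAssembler {
public:
    // Begin marker: start a fresh epoch and drop anything left over.
    void onBegin() {
        current_.clear();
        open_ = true;
    }

    // Report: kept only while an epoch is open; stray reports are ignored here.
    void onReport(const PoseReport& report) {
        assert(report.sensor >= 0);
        if (open_) {
            current_.push_back(report);
        }
    }

    // End marker: hands the complete epoch to the caller.
    // Returns false if no epoch was open.
    bool onEnd(std::vector<PoseReport>& out) {
        if (!open_) {
            return false;
        }
        open_ = false;
        out.swap(current_);
        current_.clear();
        return true;
    }

private:
    bool open_ = false;
    std::vector<PoseReport> current_;
};
```

The receiver never has to guess whether an epoch is complete: `onEnd` is the only point at which a set of reports is released for use.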

Regarding VRPN time stamps, these are completely up to the driver writer for each device. For the Polhemus drivers I wrote back when, it timed them based on the first character across the serial line for a given report, back-dated by the baud rate. For the predictive orientation tracker, it pushes the timestamp into the future w.r.t. the measurement it used to predict from. The definition of what should be there is clear: it should be the best estimate of when the actual measurement was taken. I agree that not all drivers are careful about what they use, and many use the default ("now" when the message is packed).
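One possible reading of "back-dated by the baud rate" is sketched below: subtract the time the first character spent on the serial line from the moment its arrival was noticed. This is an interpretation, not the actual Polhemus driver code. The function names are hypothetical, and the 10 bits per character (8N1 framing) is an assumption.

```cpp
#include <cassert>
#include <chrono>

using Micros = std::chrono::microseconds;

// Time needed to move one character across the serial line.
// Assumes 10 bits on the wire per character (8N1); pass another value otherwise.
Micros charTransmitTime(long baud, int bitsPerChar = 10) {
    assert(baud > 0);
    return Micros((1000000LL * bitsPerChar) / baud);
}

// Estimate when the first character of a report left the device:
// the time its arrival was noticed minus the time it spent on the wire.
Micros backDate(Micros firstCharArrival, long baud) {
    return firstCharArrival - charTransmitTime(baud);
}
```

At 115200 baud the correction is only 86 microseconds, while at 9600 baud it is about 1 millisecond, so the correction matters most on slow links.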

janoc commented 8 years ago

> Regarding data frames, this is like what happens with the vrpn_Imager. An imager server can send a bunch of regions, and it groups them together with beginFrame/endFrame messages. It does not have to fill in the whole screen between these, but it lets the receiver know when it should allocate its data (begin) and when it is done with it and it is ready to be displayed (end). This requires sending everything over the Reliable channel.

Ah oops. I didn't think about that one. That could be an issue, because streaming tons of data from a motion capture system is not the best use case for TCP - one dropped packet could introduce unacceptable latency. On the other hand, what would happen if the data went out over unreliable (UDP) channel? Only the frame boundary markers need to be somewhat reliable - that could be ensured either by sending them over TCP or by sending them by UDP and requiring an explicit ACK from the receiver, otherwise they get re-sent.

I see two possibilities - either the message disappears completely and then I will simply miss the data in the frame, as if the marker was not tracked, for example. The application has to expect that sort of thing anyway because tracking could be lost for various other reasons as well.

The other possibility is that the message shows up after the end frame has arrived, either due to packet reordering or because the message was retransmitted. If the client can figure out that the message that has arrived should have been part of a frame which is finished and no longer valid (e.g. because the messages carry a tag marking them as part of a certain frame) then I can discard it safely, knowing it is incorrect data. That is acceptable in my book.
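A minimal sketch of that discard rule, assuming every report carries a frame ID tag, that IDs increase monotonically without wrapping, and that end-of-frame markers arrive in order because they are sent reliably. `FrameGate` is a hypothetical helper, not an existing VRPN facility.

```cpp
#include <cassert>
#include <cstdint>

// Decides whether a report tagged with a frame ID is still usable.
class FrameGate {
public:
    // Called when the end-of-frame marker for frameId arrives.
    void onEndFrame(std::uint32_t frameId) {
        // End markers are assumed to arrive in increasing order.
        assert(!haveClosed_ || frameId > lastClosed_);
        lastClosed_ = frameId;
        haveClosed_ = true;
    }

    // True if the report's frame is still in progress.
    // False if that frame has already ended, so the late report is discarded.
    bool accept(std::uint32_t frameId) const {
        return !haveClosed_ || frameId > lastClosed_;
    }

private:
    bool haveClosed_ = false;
    std::uint32_t lastClosed_ = 0;
};
```

A report that is lost entirely never reaches `accept`, so it simply shows up as a missing sensor in its frame, which matches the first possibility described above.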

Otherwise this idea with the TrackerCollection could well address this issue, where a frame of tracking data is sent along with some extra information - like tracking quality or simply some device-specific info in an opaque blob of data so that ad-hoc extensions can be made (sometimes dirty hacks are needed and not general enough to be generalized and pushed upstream).

Re time stamps - yes, I am aware of that. I didn't mean it as blaming VRPN for the incorrect time stamps. However, I have seen a lot of servers (commercial included) where the authors simply didn't bother with filling the time stamp :(

russell-taylor commented 8 years ago

It used to be the case that packets got dropped fairly frequently, and it is still the case over wireless networks and the Internet. Since almost all local traffic goes over switches, and they queue, lost packets on an intranet are now quite rare. I think that switching to TCP and marking front and end frames should work well in this environment and would also be backwards and forwards compatible.

I'd strongly push back against sending blobs for data that might be produced by more than one device. One of VRPN's strengths is its factoring of devices into functions and then providing interfaces for those functions (and its corresponding weakness is its lack of atomic per-device operations).

Regarding atomicity: On a device driver that you control, you can use time as an epoch to group multiple entries. Your client can then rely, for that device, on it behaving as it should. If messages are sent reliably, then ordering can also be used to tag a tracker reading with the features that apply to it by sending the features first and the reading last. None of this requires modifying the existing protocol except by adding new message types that can be safely ignored by apps that don't care about them.
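A minimal sketch of the "features first, reading last" ordering on the client side, assuming reliable, in-order delivery. `Reading`, `TaggedReading` and `FeatureTagger` are hypothetical types for illustration and do not correspond to existing VRPN messages.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical tracker reading; the layout is illustrative only.
struct Reading {
    int sensor;
    double pos[3];
};

// A reading together with the feature values that preceded it.
struct TaggedReading {
    Reading reading;
    std::vector<double> features;
};

class FeatureTagger {
public:
    // Feature messages (e.g. tracking quality) arrive first and are held back.
    void onFeature(int sensor, double value) {
        assert(sensor >= 0);
        pending_[sensor].push_back(value);
    }

    // The reading arrives last and closes the group for its sensor.
    TaggedReading onReading(const Reading& reading) {
        TaggedReading out;
        out.reading = reading;
        std::map<int, std::vector<double> >::iterator it = pending_.find(reading.sensor);
        if (it != pending_.end()) {
            out.features.swap(it->second);
            pending_.erase(it);
        }
        return out;
    }

private:
    std::map<int, std::vector<double> > pending_;
};
```

A client that does not know about the feature messages ignores them and still receives the plain readings, which is the backwards-compatibility property described above.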

janoc commented 8 years ago

> It used to be the case that packets got dropped fairly frequently, and it is still the case over wireless networks and the Internet. Since almost all local traffic goes over switches, and they queue, lost packets on an intranet are now quite rare. I think that switching to TCP and marking front and end frames should work well in this environment and would also be backwards and forwards compatible.

Streaming over TCP when running over Wi-Fi is quite a problem - it takes one lost packet to introduce a long delay until the packet times out. And while Wi-Fi isn't common for the tracking systems themselves (they usually sit on Ethernet), it is starting to be very common for things like HMDs (e.g. GearVR) - there you simply don't have another option. I wouldn't advise moving the entire data stream to TCP.

> I'd strongly push back against sending blobs for data that might be produced by more than one device. One of VRPN's strengths is its factoring of devices into functions and then providing interfaces for those functions (and its corresponding weakness is its lack of atomic per-device operations).

I am not advocating having such drivers in core VRPN. That would, indeed, be bad design. I was thinking more about providing something like a "user data" option for cases where someone needs to send application-specific information within the data frame.

> Regarding atomicity: On a device driver that you control, you can use time as an epoch to group multiple entries.

This can certainly be done, but you can never know whether you have a complete frame already or not yet and have to rely on timeouts or a frame arriving with a timestamp that is different from the previous ones. That's why this sort of solution is a pain to use. The explicit marking with start/end of frame messages is much easier.
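The drawback can be seen in a minimal sketch of timestamp-only grouping. `StampedReport` and `TimestampGrouper` are hypothetical names, and the code assumes every report in a frame carries exactly the same timestamp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical report carrying only a timestamp and a sensor index.
struct StampedReport {
    long long timestampUs;
    int sensor;
};

// Groups reports by identical timestamp. A group is only known to be
// complete once a report with a different timestamp arrives.
class TimestampGrouper {
public:
    // Returns true and fills done when the previous group has ended.
    bool onReport(const StampedReport& report, std::vector<StampedReport>& done) {
        assert(report.sensor >= 0);
        bool finished = false;
        if (!current_.empty() && report.timestampUs != current_.front().timestampUs) {
            done.swap(current_);
            current_.clear();
            finished = true;
        }
        current_.push_back(report);
        return finished;
    }

    // Number of reports still waiting for their group to be closed.
    std::size_t pending() const { return current_.size(); }

private:
    std::vector<StampedReport> current_;
};
```

The latest frame always stays pending until the next one begins (or until a timeout, which this sketch does not implement), so it reaches the application one frame period late.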

russell-taylor commented 8 years ago

Regarding "user-data" packets, that is what is sent in the underlying VRPN protocol. Trackers and other devices layer protocols on top of this basic blob, but it is possible for a device to send a blob (with a specified message type and length) and for a client to receive them independently of the standard VRPN traffic.

Regarding timestamps vs. start/end, I was thinking that you'd use the timestamp as the epoch marker rather than adding another epoch marker, and still use the start/end.

janoc commented 8 years ago

> Regarding "user-data" packets, that is what is sent in the underlying VRPN protocol. Trackers and other devices layer protocols on top of this basic blob, but it is possible for a device to send a blob (with a specified message type and length) and for a client to receive them independently of the standard VRPN traffic.

Right, that's true. It just wouldn't be very "integrated" with the rest of the protocol for a data frame supporting device. Anyhow, this was only an idea that might be useful at some point, but it isn't something we cannot live without.

> Regarding timestamps vs. start/end, I was thinking that you'd use the timestamp as the epoch marker rather than adding another epoch marker, and still use the start/end.

Aha ok, then I have misunderstood. I thought you wanted the client to assemble the messages only based on the time stamps. Makes better sense like this.