entities don't handle being set to np.inf

raphaelcervantes commented 3 years ago

One of my data-taking scripts ran into an edge case where it tried to set an entity to np.inf. The script just hung, and I didn't see an error in the service logs either. I think dripline should catch this case and deliberately crash or raise an error.

To go into more detail, here is the relevant portion of my control script.

                popt_reflection, pcov_reflection = data_lorentzian_fit(s11_pow, freq, 'reflection')
                perr_reflection = np.sqrt(np.diag(pcov_reflection))

                print('Reflection lorentzian fitted parameters')
                print(popt_reflection)
                self.cmd_interface.set('f_reflection', popt_reflection[0])
                self.cmd_interface.set('sig_f_reflection', perr_reflection[0])
                self.cmd_interface.set('Q_reflection', popt_reflection[1])
                self.cmd_interface.set('sig_Q_reflection', perr_reflection[1])
                self.cmd_interface.set('dy_reflection', popt_reflection[2])
                self.cmd_interface.set('sig_dy_reflection', perr_reflection[2])
                self.cmd_interface.set('C_reflection', popt_reflection[3])
                self.cmd_interface.set('sig_C_reflection', perr_reflection[3])

My script couldn't get a good fit to the reflection VNA trace:

Setting na_measurement_status to start_measurement
Logging list of endpoints
Switching to transmission path
Switching to reflection path
Switching to transmission path
VNA reflection measurement
Setting na_measurement_status to start_measurement
Logging list of endpoints
Switching to transmission path
Transmission lorentzian fitted parameters
[1.61968618e+10 1.59075036e+04 2.62957230e-01 3.45835612e-03]
Switching to reflection path
/usr/local/lib/python3.7/site-packages/scipy/optimize/minpack.py:829: OptimizeWarning: Covariance of the parameters could not be estimated
  category=OptimizeWarning)
Reflection lorentzian fitted parameters
[1.61903696e+10 4.84379319e+00 2.84176343e+02 2.84231409e+02]

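I think that when curve_fit emits an OptimizeWarning, the entries of pcov are np.inf, so the script tried to set sig_f_reflection to np.inf and just hung without making any progress. The service logs below show f_reflection being inserted, but sig_f_reflection never arrives.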
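A client-side guard can produce the loud failure asked for above until dripline does so itself. The sketch below assumes only that the interface object has a set(name, value) method, as cmd_interface does in the script; the helper name set_finite is hypothetical and not part of dripline.

```python
import math


def set_finite(interface, endpoint, value):
    """Hypothetical client-side guard: refuse to send NaN or +/-inf to an endpoint.

    `interface` is assumed to expose a set(name, value) method, like the
    cmd_interface object used in the control script.
    """
    # float() also converts numpy scalars such as numpy.float64 to a native Python float.
    native = float(value)
    if not math.isfinite(native):
        raise ValueError(f"refusing to set {endpoint!r} to non-finite value {native!r}")
    return interface.set(endpoint, native)
```

In the control script this would look like set_finite(self.cmd_interface, 'sig_f_reflection', perr_reflection[0]), which raises ValueError instead of sending inf.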

➜  ~ kubectl logs -f double-precision-logger-dripline-python-deployment-6b7fbf8d4b7h --tail 20 
{'timestamp': '2021-03-22T16:32:18.386845Z', 'sensor_name': 'Q_transmission', 'value_cal': 15907.503649437182, 'value_raw': 15907.503649437182}
2021-03-22T16:32:18[INFO    ] dripline.implementations.postgres_sensor_logger(49) -> finished processing data
2021-03-22T16:32:18[INFO    ] dripline.implementations.postgres_sensor_logger(46) -> insert data are:
{'timestamp': '2021-03-22T16:32:18.399052Z', 'sensor_name': 'sig_Q_transmission', 'value_cal': 570.3355496234211, 'value_raw': 570.3355496234211}
2021-03-22T16:32:18[INFO    ] dripline.implementations.postgres_sensor_logger(49) -> finished processing data
2021-03-22T16:32:18[INFO    ] dripline.implementations.postgres_sensor_logger(46) -> insert data are:
{'timestamp': '2021-03-22T16:32:18.411096Z', 'sensor_name': 'dy_transmission', 'value_cal': 0.26295723019139855, 'value_raw': 0.26295723019139855}
2021-03-22T16:32:18[INFO    ] dripline.implementations.postgres_sensor_logger(49) -> finished processing data
2021-03-22T16:32:18[INFO    ] dripline.implementations.postgres_sensor_logger(46) -> insert data are:
{'timestamp': '2021-03-22T16:32:18.423272Z', 'sensor_name': 'sig_dy_transmission', 'value_cal': 0.010107652156098478, 'value_raw': 0.010107652156098478}
2021-03-22T16:32:18[INFO    ] dripline.implementations.postgres_sensor_logger(49) -> finished processing data
2021-03-22T16:32:18[INFO    ] dripline.implementations.postgres_sensor_logger(46) -> insert data are:
{'timestamp': '2021-03-22T16:32:18.436062Z', 'sensor_name': 'C_transmission', 'value_cal': 0.003458356118361542, 'value_raw': 0.003458356118361542}
2021-03-22T16:32:18[INFO    ] dripline.implementations.postgres_sensor_logger(49) -> finished processing data
2021-03-22T16:32:18[INFO    ] dripline.implementations.postgres_sensor_logger(46) -> insert data are:
{'timestamp': '2021-03-22T16:32:18.449049Z', 'sensor_name': 'sig_C_transmission', 'value_cal': 0.00015618094767875254, 'value_raw': 0.00015618094767875254}
2021-03-22T16:32:18[INFO    ] dripline.implementations.postgres_sensor_logger(49) -> finished processing data
2021-03-22T16:32:26[INFO    ] dripline.implementations.postgres_sensor_logger(46) -> insert data are:
{'timestamp': '2021-03-22T16:32:26.161902Z', 'sensor_name': 'f_reflection', 'value_cal': 16190369623.81609, 'value_raw': 16190369623.81609}
2021-03-22T16:32:26[INFO    ] dripline.implementations.postgres_sensor_logger(49) -> finished processing data

laroque commented 3 years ago

It would be useful to understand where this is failing. Dripline should be trying to serialize this value to JSON, which probably won't do what you want, but I think it should at least work on the client side. I'd bet the service doesn't know what to do with whatever the value gets rendered into, though.
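For reference, this is what Python's standard json module does with infinity. It illustrates the kind of rendering meant above; it does not show what dripline's own encoder does.

```python
import json
import math

# By default, Python's json module writes infinity as the bare token Infinity.
# That token is not part of the JSON standard (RFC 8259), so a strict parser
# on the receiving side may reject it.
encoded = json.dumps({"sig_f_reflection": math.inf})
print(encoded)  # {"sig_f_reflection": Infinity}

# Python's own decoder accepts the token and turns it back into float('inf').
decoded = json.loads(encoded)
print(decoded["sig_f_reflection"] == math.inf)  # True

# With allow_nan=False, the encoder raises ValueError instead of emitting Infinity.
try:
    json.dumps(math.inf, allow_nan=False)
except ValueError as err:
    print("rejected:", err)
```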

raphaelcervantes commented 3 years ago

@laroque Is there anything you want me to do on my end to diagnose this?

For now, I have my control scripts raise an exception whenever curve_fit fails.
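This workaround can be sketched with the standard warnings module. The OptimizeWarning in the log above is a warning, not an exception, so by itself it does not stop the script; escalating it to an error stops the script before any set call. The helper name is hypothetical; scipy.optimize.OptimizeWarning is the real warning class scipy emits.

```python
import warnings


def call_with_warnings_as_errors(func, *args, category=Warning, **kwargs):
    """Call func(*args, **kwargs), raising any warning of `category` as an exception.

    The filter change is temporary: catch_warnings() restores the previous
    warning filters when the block exits.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("error", category)
        return func(*args, **kwargs)
```

For example, call_with_warnings_as_errors(data_lorentzian_fit, s11_pow, freq, 'reflection', category=OptimizeWarning), with OptimizeWarning imported from scipy.optimize, would raise at the fit instead of returning an infinite covariance, assuming data_lorentzian_fit calls curve_fit internally.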

laroque commented 3 years ago

I've been mulling this over, but I haven't dug into the code (much less actually tried to reproduce it). It isn't clear to me whether this is a problem on the client side with sending the message, or on the server side with responding to it, and the solution will depend heavily on that. There are several different paths we could follow up on:

Question

Can you help clarify where it is failing? I think the steps are something like:

1. Your Python code makes a call to core dripline to send a message.
2. Calls go down to the C++ binding, which converts the data into a message object.
3. The actual message is sent over AMQP.
4. The service receives the message and decodes it back into native objects.
5. Your custom Entity does something with the message object it received.

This leads to...

Hypothesis

I expect the problem is in step 2 above: the C++ binding doesn't know how to deal with a numpy array and/or a numpy infinity value when converting to a scarab Param object (which is used to construct a dripline message that is eventually serialized to JSON). A useful check of this idea would be to use something like the monitor subcommand, or just to watch the entity logs, to see whether those calls to set actually produce a dripline message on the AMQP bus, and whether those messages look sensible. If no sensible message appears, the issue is with the (numpy.*) -> (dripline Message) conversion.

Another possibility (if the above is all working) is that the Entity receives the message but the types don't make sense, and something in the code fails. For example, you start with numpy.array([1, 2, 3, numpy.inf]) (a numpy array of numeric values that may include infinity) on the client, but what you end up with on the server is [1, 2, 3, "Infinity"] (a list mixing strings and numbers).
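Part of the step 2 question can be checked locally, before anything touches the AMQP bus, by scanning a payload for values that are not plain standard-library types or are non-finite. The sketch below is hypothetical: the accepted types are an assumption about what the C++ binding handles, not a copy of its rules, and the name find_unsupported is made up.

```python
import math

# Assumption: these are the scalar types the binding knows how to convert.
# Exact type checks are deliberate: numpy.float64 subclasses float, but it is
# still not a plain standard-library float.
PLAIN_SCALARS = (bool, int, float, str, type(None))


def find_unsupported(payload, path="value"):
    """Return (path, problem) pairs for values that may not convert cleanly."""
    problems = []
    kind = type(payload)
    if kind in PLAIN_SCALARS:
        if kind is float and not math.isfinite(payload):
            problems.append((path, f"non-finite float {payload!r}"))
    elif kind in (list, tuple):  # assumption: both are accepted as sequences
        for i, item in enumerate(payload):
            problems.extend(find_unsupported(item, f"{path}[{i}]"))
    elif kind is dict:
        for key, item in payload.items():
            if type(key) is not str:
                problems.append((path, f"non-string key {key!r}"))
            problems.extend(find_unsupported(item, f"{path}[{key!r}]"))
    else:
        problems.append((path, f"unsupported type {kind.__module__}.{kind.__qualname__}"))
    return problems
```

An empty result only means the payload passes this approximation; it does not prove the binding will accept it.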

Sidestep/workaround

We should probably find a place to document this more clearly, but dripline as written doesn't support arbitrary data types from outside the Python standard library (for example, numpy types). There are a couple of issues around this:

The input data are converted from Python types to actual messages by first converting them to a scarab::param type. This happens in the C++ binding code, so every supported type has to be handled explicitly. There are some games that can be played with C++ templates and similar abstractions, but they only go so far. We may want to add explicit support for numpy, but I don't think it is currently a dependency of core dripline.
It isn't clear how you would write this in a way that preserves types over a round trip. That is, if I have numpy.array([1.1, 2.2, 3.3]) (input data) -> "[1.1, 2.2, 3.3]" (JSON in message) -> ??? (output data), how do I know whether the output should be a Python list of Python floats or a numpy array of C long doubles?
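The type loss in that round trip shows up even without numpy, using Python's standard json module: JSON has a single array type, so the encoded text carries no record of which container type the sender used. This illustrates JSON itself, not dripline's scarab-based encoding.

```python
import json

original = (1.1, 2.2, 3.3)  # a tuple going in
round_tripped = json.loads(json.dumps(original))

print(round_tripped)                  # [1.1, 2.2, 3.3]
print(type(round_tripped).__name__)   # list: the tuple type did not survive
```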

... that was a distraction; what I meant to say was: if you convert your data to native Python types before calling set, dripline should be able to handle that data the same way it always does. Then, if your Entity needs to consume numpy types, possibly with special values like infinity, it is your responsibility as the implementer of the Entity to convert the native Python types in the message payload to whatever data type your implementation needs.
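Converting before calling set can be done with a small recursive helper. A minimal sketch; the name to_native is hypothetical, and the helper is duck-typed on the .tolist() method that numpy arrays and numpy scalars both provide, so it does not need to import numpy.

```python
def to_native(value):
    """Recursively convert numpy-style values into plain Python types.

    numpy arrays and numpy scalars both provide .tolist(), which returns
    native Python lists, ints and floats.
    """
    if isinstance(value, dict):
        return {key: to_native(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_native(item) for item in value]
    if hasattr(value, "tolist"):
        return value.tolist()
    return value  # already a plain Python value (or something left for dripline to reject)
```

Usage with the control script would be, for example, self.cmd_interface.set('sig_f_reflection', to_native(perr_reflection[0])). This fixes the type, not the value: an infinite uncertainty still arrives as float('inf').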

nsoblath commented 3 years ago

We could also look at how something like JSON or YAML encoding handles objects like that in Python. I presume there is some fairly transparent way to translate in either direction between a numpy array and the limited set of data types that JSON and YAML allow.
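For JSON at least, Python's standard json module has a hook for this: json.dumps calls the encoder's default() method for any object it cannot serialize on its own. A minimal sketch, assuming conversion via .tolist() is acceptable; the class name is hypothetical, and it only covers encoding. Decoding still yields plain lists.

```python
import json


class NumpyFriendlyEncoder(json.JSONEncoder):
    """Hypothetical encoder: serialize array-like objects via their .tolist() method."""

    def default(self, obj):
        # json only calls default() for objects it cannot already encode,
        # e.g. a numpy.ndarray (numpy.float64 subclasses float and is encoded directly).
        if hasattr(obj, "tolist"):
            return obj.tolist()
        return super().default(obj)  # raises TypeError for anything else
```

Usage: json.dumps({"popt": popt_reflection}, cls=NumpyFriendlyEncoder).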

raphaelcervantes commented 3 years ago

I'm trying to push hard on Orpheus commissioning right now. I think I'll do some DAQ development next week and can look at this then.

I'd be OK with being required to convert to native Python types, as long as that requirement is explicit. But I think dripline should raise an error when I don't, rather than just hanging and forcing me to kill the Docker container manually.

laroque commented 3 years ago

Your expectation here (getting an exception and an explicit requirement in the docs) is reasonable and is what I would have expected to happen. I would have expected a failed type conversion to produce an error, and I would have expected a failed attempt to send a message to time out and produce a (possibly not so helpful) error.

I think this is something we (the dripline side) should fix to basically the state you asked for; the problem is just finding someone with the time to dig in, isolate, and fix it. Converting to Python types may or may not end up being the only option long term, but it is probably the fastest way to solve your problem and let you get back to focusing on Orpheus, since I don't know that we'll be able to resolve it before then.
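Until this is fixed on the dripline side, a client script can at least bound how long it waits, approximating the time-out behaviour expected above. A minimal sketch using the standard concurrent.futures module; the helper name and timeout value are hypothetical. It only helps if the blocked call releases the GIL while waiting, which is an unverified assumption about dripline's C++ binding.

```python
import concurrent.futures


def call_with_timeout(func, *args, timeout=10.0, **kwargs):
    """Run func(*args, **kwargs) in a worker thread; give up after `timeout` seconds.

    Raises concurrent.futures.TimeoutError if the call has not finished in time.
    The worker thread is not killed: a call that truly hangs keeps running.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=timeout)
    finally:
        # Do not wait for a possibly hung worker; just stop accepting new work.
        executor.shutdown(wait=False)
```

Usage would be something like call_with_timeout(self.cmd_interface.set, 'sig_f_reflection', value, timeout=30). Python joins executor worker threads at interpreter exit, so a call that never returns can still keep the process alive; the point of the sketch is to surface an error in the script rather than stall silently.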

raphaelcervantes commented 3 years ago

Here is another instance of dripline not being able to handle numpy arrays.

I tried to use the scipy.interpolate.interp1d function in a dripline extension like so:

    interpolated_function = interp1d(resistance_cal, temperature_cal, kind = 'cubic')
    interpolated_temperature = interpolated_function(resistance)

This returns a single-valued numpy array, assuming resistance is a single value.

I see this error in my k8s logs when the calibration function gets called.

2021-06-15T15:30:23[DEBUG   ] dripline.core.calibrate(43) -> formatted cal is:
x83871_cal(+2.02340640E+03)
2021-06-15 15:29:25 [ERROR] rary/endpoint.cc(191): Caught exception from Python: RuntimeError: Unknown python type cannot be converted to param

At:
  /usr/local/lib/python3.8/site-packages/dripline/core/endpoint.py(41): do_get_request

To work around this, I cast the interpolated result to a float.

https://github.com/axiondarkmatterexperiment/dripline-orpheus/blob/0a337da26df6f580e328599b6b273d985e44126b/dripline/extensions/agilent34970A/muxer_calibrations.py#L24
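The cast can be made explicit and checked for single-valued results like the interp1d output above. A sketch assuming numpy; the helper name to_scalar is hypothetical. It uses .item() rather than float() because, since NumPy 1.25, float() on a one-element array that is not 0-d emits a DeprecationWarning (float() on the 0-d array interp1d returns for a scalar input is fine).

```python
import numpy as np


def to_scalar(value):
    """Convert a single-valued numpy result to a native Python float.

    Accepts numpy scalars, 0-d arrays (what interp1d returns for a scalar
    input) and size-1 arrays; anything with more than one element is an error.
    """
    arr = np.asarray(value, dtype=float)
    if arr.size != 1:
        raise ValueError(f"expected exactly one value, got an array of shape {arr.shape}")
    # .item() returns a plain Python float rather than a numpy type.
    return arr.item()
```

Usage: interpolated_temperature = to_scalar(interpolated_function(resistance)).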

driplineorg / dripline-python