dfki-ric / pytransform3d

3D transformations for Python.
https://dfki-ric.github.io/pytransform3d/

Feature request: serialization of TransformManager #215

Closed ljmanso closed 1 year ago

ljmanso commented 1 year ago

It would be great if TransformManager had a method to serialize TransformManager objects to files and load them back. We are currently using pickle, but it sometimes generates errors when the module versions (e.g., SciPy) differ.

AlexanderFabisch commented 1 year ago

Hi @ljmanso,

thanks for the feedback. I was about to suggest using pickle. Do you have a format suggestion? The problem with text-based representations is that you can lose precision in floats, but exporting to JSON would be the easiest option and would not need any additional dependencies.
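As a quick check of the precision concern, here is a minimal sketch that assumes CPython's standard json module and finite float values. It writes a few floats to a JSON string and reads them back. The list `values` is made up for illustration.

```python
import json
import math

# Illustrative values only; any finite Python floats would do.
values = [1.1234567890123456789, math.pi, 1e-300, 0.1 + 0.2]

# json.dumps writes each float using its repr, which is the shortest
# decimal string that parses back to the same float.
text = json.dumps(values)
restored = json.loads(text)

# Compare element by element for exact equality.
exact = all(a == b for a, b in zip(values, restored))
print(text)
print(exact)
```

Precision is still lost if the numbers are rounded before writing, for example when they are formatted with a fixed number of digits.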

ljmanso commented 1 year ago

Thanks for the quick reply.

An option would be to serialize arrays to strings (e.g., using numpy.array2string) and then store them in the JSON file as strings, so that JSON does not interfere with the precision. I don't think that JSON would be able to serialize NumPy arrays directly anyway.

The following shows how it would work:

import json
import numpy as np

# This structure cannot be serialized directly:
# json.dumps raises a TypeError for ndarray values
some_structure = {'RF1': np.array([1.1234567890123456789, np.pi])}
# dump = json.dumps(some_structure)
# print(dump)

# We need to make it "json-compatible"
json_ok = {}
for k, v in some_structure.items():
    v2 = np.array2string(v, precision=32)
    json_ok[k] = v2
dump = json.dumps(json_ok)
print(dump)

I am not familiar with the internals of pytransform3d, but please let me know if I can help with this.

ljmanso commented 1 year ago

Let me add the de-serialization to the previous example:

import json
import numpy as np

np.set_printoptions(precision=32)

# From https://stackoverflow.com/questions/35750639/how-can-a-string-representation-of-a-numpy-array-be-converted-to-a-numpy-array
def string_to_numpy(text, dtype=None):
    """
    Convert text into 1D or 2D arrays using np.matrix().
    The result is returned as an np.ndarray.
    """
    import re
    text = text.strip()
    # Using a regexp, decide whether the array is flat or not.
    # The following matches either: "[1 2 3]" or "1 2 3"
    is_flat = bool(re.match(r"^(\[[^\[].+[^\]]\]|[^\[].+[^\]])$",
                            text, flags=re.S))
    # Replace newline characters with semicolons.
    text = text.replace("]\n", "];")
    # Prepare the result.
    result = np.asarray(np.matrix(text, dtype=dtype))
    return result.flatten() if is_flat else result

# This structure cannot be serialized directly:
# json.dumps raises a TypeError for ndarray values
some_structure = {'RF1': np.array([1.1234567890123456789, np.pi])}
print(some_structure)
# dump = json.dumps(some_structure)
# print(dump)

# We need to make it "json-compatible"
json_ok = {}
for k, v in some_structure.items():
    v2 = np.array_str(v, precision=32)
    json_ok[k] = v2
serialised = json.dumps(json_ok)
print(serialised)

# json.loads restores the dict, but its values are still strings
# that we need to convert back into ndarrays
read_from_json = json.loads(serialised)
print(read_from_json)
deserialised = {}
for k, v in read_from_json.items():
    v2 = string_to_numpy(v)
    deserialised[k] = v2

print(deserialised)

AlexanderFabisch commented 1 year ago

The only internal part that is a bit difficult to serialize is a scipy.sparse.csr_matrix, which stores connections of the graph. I guess it would be possible to find a solution for this though.

About the interface: I would suggest

The UrdfTransformManager could override these functions.

edit:

from the documentation of csr_matrix:

    csr_matrix((data, indices, indptr), [shape=(M, N)])
        is the standard CSR representation where the column indices for
        row i are stored in ``indices[indptr[i]:indptr[i+1]]`` and their
        corresponding values are stored in ``data[indptr[i]:indptr[i+1]]``.
        If the shape parameter is not supplied, the matrix dimensions
        are inferred from the index arrays.

Attributes
----------
data
    CSR format data array of the matrix
indices
    CSR format index array of the matrix
indptr
    CSR format index pointer array of the matrix

Those are all NumPy arrays, so we can just serialize them.
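A minimal sketch of that idea, assuming SciPy and NumPy are installed: the three CSR arrays and the shape are converted to plain lists, written to JSON, and passed back to the csr_matrix constructor quoted above. The names `dense`, `connections`, and `payload` are made up for illustration and do not reflect pytransform3d's internal layout.

```python
import json

import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical connection matrix of a small graph.
dense = np.array([[0.0, 1.5, 0.0],
                  [0.0, 0.0, 2.5],
                  [3.5, 0.0, 0.0]])
connections = csr_matrix(dense)

# The three CSR arrays plus the shape fully describe the matrix.
payload = {
    "data": connections.data.tolist(),
    "indices": connections.indices.tolist(),
    "indptr": connections.indptr.tolist(),
    "shape": list(connections.shape),
}
text = json.dumps(payload)

# Rebuild the matrix from the (data, indices, indptr) triple.
loaded = json.loads(text)
restored = csr_matrix(
    (np.asarray(loaded["data"], dtype=float),
     np.asarray(loaded["indices"]),
     np.asarray(loaded["indptr"])),
    shape=tuple(loaded["shape"]),
)
print(np.array_equal(restored.toarray(), dense))
```

Storing the shape explicitly avoids relying on the constructor inferring the dimensions from the index arrays.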

edit2: numpy.array2string should be used with floatmode="unique", which prints the minimum number of digits needed to represent each value uniquely.
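A small sketch of what floatmode="unique" gives you, assuming a short 1D float array. The parsing step is a simplified stand-in written for this example, not the string_to_numpy helper from the earlier comment.

```python
import numpy as np

original = np.array([1.1234567890123456789, np.pi, 1.0 / 3.0])

# floatmode="unique" ignores the precision option and prints as many
# digits as each value needs to be distinguished from its neighbours.
text = np.array2string(original, floatmode="unique")

# Simplified parser for a 1D array: drop the brackets, split on
# whitespace, and convert each token back to a float.
tokens = text.strip("[]").split()
restored = np.array([float(token) for token in tokens])

print(text)
print(np.array_equal(original, restored))
```

Note that array2string summarizes arrays with more than 1000 elements by default (the threshold parameter), so larger arrays would need a higher threshold before they can be parsed back.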

AlexanderFabisch commented 1 year ago

@ljmanso: here is the current version of the serialization / deserialization of TransformManagers: #216

You can use the function to_dict to generate a dict that you can save to disk with json.dump. TransformManager.from_dict will initialize an object from such a dict. I will probably not implement it for UrdfTransformManagers at the moment as this is a lot more complicated. Let me know what you think.

ljmanso commented 1 year ago

Hello Alexander,

Thanks for that. I have made a couple of changes related to the assertion in from_dict and the shape of the arrays. I will send the PR as soon as I finish and remember how PRs are done :-D

ljmanso commented 1 year ago

I made the PR. Just a few comments:

AlexanderFabisch commented 1 year ago

Hi, I completely changed the conversion to dictionaries now. I don't use a string representation of arrays anymore. They will be converted to lists, and numbers will be stored as their string representation, which is not compact when stored as JSON but good enough. If you want a more compact serialization, you can use msgpack or a similar binary format.

Here is an example with JSON:

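import json

from pytransform3d.transform_manager import TransformManager

# tm is an existing TransformManager; filename is the path of the JSON file
tm_dict = tm.to_dict()
with open(filename, "w") as f:
    json.dump(tm_dict, f)

# Load the dict again and rebuild the TransformManager from it
with open(filename, "r") as f:
    tm_dict2 = json.load(f)
tm2 = TransformManager.from_dict(tm_dict2)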
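To illustrate the list-based approach in isolation, here is a minimal sketch that needs only NumPy. The dictionary layout and the key "A2B" are assumptions made for this example and are not the structure that to_dict actually produces.

```python
import json

import numpy as np

# Hypothetical 4x4 homogeneous transform from frame A to frame B.
A2B = np.array([[0.0, -1.0, 0.0, 0.1234567890123456789],
                [1.0, 0.0, 0.0, np.pi],
                [0.0, 0.0, 1.0, 1.0 / 3.0],
                [0.0, 0.0, 0.0, 1.0]])

# ndarray.tolist() yields nested lists of Python floats,
# which json can serialize without any custom encoder.
text = json.dumps({"A2B": A2B.tolist()})

# Reading back gives nested lists again; np.array restores the array.
restored = np.array(json.loads(text)["A2B"])
print(np.array_equal(A2B, restored))
```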

AlexanderFabisch commented 1 year ago

Should be solved with #216

ljmanso commented 1 year ago

That works great. Thanks!

AlexanderFabisch commented 1 year ago

Great, thanks!