ORNL / DataFed

A Federated Scientific Data Management System
https://ornl.github.io/DataFed/
Other
18 stars 13 forks source link

Convert response in CommandLib to python friendly object(s) #545

Open ssomnath opened 3 years ago

ssomnath commented 3 years ago

I think that the response form the CommandLib is very ugly and is not user-friendly.

I should not have to do this:

>>> response = datafed.dataCreate(....)
>>> record_id = response[0].data[0].id
>>> metadata_dict = json.loads(response[0].data[0].metadata)

How this should work is:

>>> response = datafed.dataCreate(....)
>>> record_id = response.id
>>> metadata_dict = response.metadata

I think we should create classes for the responses that are more pythonic. Yes, these classes will be very thin wrappers over protobuf. However, that is perfectly fine and worth the effort in my opinion if DataFed is expected to be seen as a user-friendly application.

Note that I still called the objects response instead of data_record because I feel that calling the class DataRecord might give users the impression that operations on local objects would automatically reflect changes both locally and remotely on DataFed. We have had this conversation before.

dvstans commented 3 years ago

Hey Suhas, The reason for the array operators isn't directly due to the use of Prototbuf. It's because many DataFed calls deal with multiple records at a time - and thus must return arrays of results, and, for consistency, all results were made arrays even if only one result is returned. There should only be one level of nesting (at resposne level) but, in the case of the data create call, this is due to messy message/reply design - not Protobuf or Python. It will be cleaned-up in the future. However, the point is that the replies are already Python objects and once I clean-up the API, there would be no need to further wrap the replies.

ssomnath commented 3 years ago

You're right that the objects being returned are indeed python objects even now. I also understand that in many cases, the returned response could have multiple items being returned. At the same time, it does not make sense for me to index [0] when I know I created just one record. Either way, the objects being returned right now are ugly and one has to do a fair bit of trial and error ("Do I index as a number? Do I just call id as an attribute of the object? What is this object anyway? ....) to get the information they are looking for. At best, I can understand the need for one level of iteration and only in cases when iteration is required - for example a listing of a catalog, etc. There are other things too: Dates should be returned as python date objects as opposed to integers or strings. etc. I remember putting a fair bit of thought into this when I put together my helper functions here last year

dvstans commented 3 years ago

Yes, some fields, like dates, would benefit from additional conversion; however, sometimes returning arrays and sometimes individual objects from the same function call (based on parameters) would be very bad design. My point is that this is a core API issue, not a Python issue - stemming from me being lazy and not defining separate messages for single vs multiple actions.

ssomnath commented 3 years ago

You are right about returning an array with a single item if the function can potentially also return multiple items. create dataset though, doesn't need any iteration whatsoever. I think my concern goes beyond thin classes that are very similar to the objects being currently returned. It also applies to errors. It might make pythonic sense to parse common / typical errors from protobuf and raise python errors that are descriptive. You are of course free to raise a generic Exception object if the returned error message does not fall into any of the expected bins.

See this function or this if condition. A user having to do something like this is stupid.