cph-cachet / carp-webservices-spring

REST-based implementation of the CARP Core Domain Model using Spring Boot in Kotlin
https://carp.dk
MIT License
3 stars 2 forks source link

The DataStream data downloaded in the Export has wrong JSON format and missing data #178

Open bardram opened 1 week ago

bardram commented 1 week ago

When exporting data, in the data-streams.json file I wold expect to get correctly formatted JSON according to the CARP Core domain model. But - alas - there are issues.

Here is an example of some JSON exported:

data-streams.json ```json { "id": 4961, "data_stream_id": 631, "snapshot": { "syncPoint": { "synchronizedOn": "1970-01-01T00:00:00Z", "relativeClockSpeed": 1.0, "sensorTimestampAtSyncPoint": 0 }, "triggerIds": [ 4 ], "measurements": [ { "data": { "__type": "dk.cachet.carp.freememory", "freeVirtualMemory": 2999164928, "freePhysicalMemory": 164835328 }, "sensorStartTime": 1729864247872632 }, { "data": { "__type": "dk.cachet.carp.freememory", "freeVirtualMemory": 2952347648, "freePhysicalMemory": 117329920 }, "sensorStartTime": 1729864307857247 }, { "data": { "__type": "dk.cachet.carp.freememory", "freeVirtualMemory": 2957910016, "freePhysicalMemory": 124141568 }, "sensorStartTime": 1729864367870355 }, { "data": { "__type": "dk.cachet.carp.freememory", "freeVirtualMemory": 2950742016, "freePhysicalMemory": 113905664 }, "sensorStartTime": 1729864427889803 } ] }, "first_sequence_id": 29, "last_sequence_id": 32, "created_by": "20e2093c-e510-4e79-9385-478a19dc4723", "created_at": "2024-10-25T13:54:43.080504Z", "updated_by": "20e2093c-e510-4e79-9385-478a19dc4723", "updated_at": "2024-10-25T13:54:43.081204Z" }, { "id": 4605, "data_stream_id": 655, "snapshot": { "syncPoint": { "synchronizedOn": "1970-01-01T00:00:00Z", "relativeClockSpeed": 1.0, "sensorTimestampAtSyncPoint": 0 }, "triggerIds": [ 9 ], "measurements": [ { "data": { "__type": "dk.cachet.carp.triggeredtask", "control": "Start", "taskName": "Task #17", "triggerId": 9, "destinationDeviceRoleName": "Polar HR Sensor" }, "sensorStartTime": 1729776088592930 } ] }, "first_sequence_id": 1, "last_sequence_id": 1, "created_by": "20e2093c-e510-4e79-9385-478a19dc4723", "created_at": "2024-10-24T13:31:24.977530Z", "updated_by": "20e2093c-e510-4e79-9385-478a19dc4723", "updated_at": "2024-10-24T13:31:24.979868Z" }, ... ```

Compared to the CARP Core Domain data model there are several issues with this JSON:

bardram commented 1 week ago

@yuanchen233 - I think the main problem here is that data is exported from the database directly rather than from the DataStreamService. We have talked about this before - the approach of bypassing the domain logic in CAPR Core, and I really do not like this. It goes against everything that DDD recommends.

To quote Working_with_models:

In domain-driven design, an object's creation is often separated from the object itself.

A repository, for instance, is an object with methods for retrieving domain objects from a data store (e.g. a database). Similarly, a factory is an object with methods for directly creating domain objects.

I am sure @Whathecode will agree ;-)

Whathecode commented 1 week ago

Yup. The endpoints on that service are severely lacking, however, but this should be a prompt to add/work on those.

yuanchen233 commented 1 week ago
  • The JSON uses a mixture of camelCase and snake_case serialization. Default JSON in CARP Core is always camelCase. For example, first_sequence_id should be firstSequenceId (see DataStreamSequence).

This is caused by JsonNaming is set to SnakeCaseStrategy, and all column generated from Auditable is using the snake naming schema. It is unsure to me if those extra field is still needed.

  • The data_stream_id (which should be dataStreamId) is a JSON object (and not a number). See DataStreamId for details.
  • The JSON contains data that is not part of the CARP Core domain model. For example id, last_sequence_id, created_by, etc.
  • The snapshot is not part of the DataStreamSequence domain model. However, the SyncPoint is but is embedded in this weird "snapshot" thing.

This is cuased by webservices' wired design of dataStream database table/entity, we have this 3 table and the notion of snapshot inherited from old Java project, and also plan to change it at #137

I think the main problem here is that data is exported from the database directly rather than from the DataStreamService. We have talked about this before - the approach of bypassing the domain logic in CAPR Core, and I really do not like this. It goes against everything that DDD recommends.

Agreed. Currently, the export endpoint ( the export button on the portal ) performs a database dump. It does not utilize the API of each ApplicationService to retrieve data, and this should be changed. This change was proposed but postponed due to the need to maintain a consistent format for the exported data of active studies.

Yup. The endpoints on that service are severely lacking, however, but this should be a prompt to add/work on those.

Most of the issues are caused by the current design of database, given that they works, retriving data through core endpoints like DataStreamServicesRequest.GetDataStream should work properly without exposing those extra information, despite the fact that the database do need some re-design.


In short, I think there are a few things we could/should do, some of them might be off topic here:

If we decide to re-design the database later, some word of suggestion from @Whathecode would be very helpful and much appreciated.

yuanchen233 commented 4 days ago

Current webservices' dataStream database implementation: DataStream_UML_Final