Open bardram opened 1 week ago
@yuanchen233 - I think the main problem here is that data is exported from the database directly rather than from the DataStreamService
. We have talked about this before - the approach of bypassing the domain logic in CAPR Core, and I really do not like this. It goes against everything that DDD recommends.
To quote Working_with_models:
In domain-driven design, an object's creation is often separated from the object itself.
A repository, for instance, is an object with methods for retrieving domain objects from a data store (e.g. a database). Similarly, a factory is an object with methods for directly creating domain objects.
I am sure @Whathecode will agree ;-)
Yup. The endpoints on that service are severely lacking, however, but this should be a prompt to add/work on those.
- The JSON uses a mixture of
camelCase
andsnake_case
serialization. Default JSON in CARP Core is alwayscamelCase
. For example,first_sequence_id
should befirstSequenceId
(seeDataStreamSequence
).
This is caused by JsonNaming
is set to SnakeCaseStrategy
, and all column generated from Auditable
is using the snake naming schema. It is unsure to me if those extra field is still needed.
- The
data_stream_id
(which should bedataStreamId
) is a JSON object (and not a number). See DataStreamId for details.- The JSON contains data that is not part of the CARP Core domain model. For example
id
,last_sequence_id
,created_by
, etc.- The
snapshot
is not part of theDataStreamSequence
domain model. However, theSyncPoint
is but is embedded in this weird "snapshot" thing.
This is cuased by webservices' wired design of dataStream database table/entity, we have this 3 table and the notion of snapshot
inherited from old Java project, and also plan to change it at #137
I think the main problem here is that data is exported from the database directly rather than from the DataStreamService. We have talked about this before - the approach of bypassing the domain logic in CAPR Core, and I really do not like this. It goes against everything that DDD recommends.
Agreed. Currently, the export endpoint ( the export button on the portal ) performs a database dump. It does not utilize the API of each ApplicationService
to retrieve data, and this should be changed. This change was proposed but postponed due to the need to maintain a consistent format for the exported data of active studies.
Yup. The endpoints on that service are severely lacking, however, but this should be a prompt to add/work on those.
Most of the issues are caused by the current design of database, given that they works, retriving data through core endpoints like DataStreamServicesRequest.GetDataStream
should work properly without exposing those extra information, despite the fact that the database do need some re-design.
In short, I think there are a few things we could/should do, some of them might be off topic here:
export
Export
feature in webservice should also use core endpoint to retrieve data, instead of a database dumpsnapshot
Auditable
and decided if it's still needed for the current implementation of webservicessnapshot
s which is created using core Json serializer. This is not very efficient and does not utilize any advantage a database provide, and the process time is way too long when the number of instances go up. (e.g. ReruitmentSnapshot keeps a list of all participantGroup, when we have 20000 participantGroups and a new participantGroup is created, it takes huge amount of time to build the Recruitment object, updata it and convert it back to Json and save it)If we decide to re-design the database later, some word of suggestion from @Whathecode would be very helpful and much appreciated.
Current webservices' dataStream database implementation:
When exporting data, in the
data-streams.json
file I wold expect to get correctly formatted JSON according to the CARP Core domain model. But - alas - there are issues.Here is an example of some JSON exported:
data-streams.json
```json { "id": 4961, "data_stream_id": 631, "snapshot": { "syncPoint": { "synchronizedOn": "1970-01-01T00:00:00Z", "relativeClockSpeed": 1.0, "sensorTimestampAtSyncPoint": 0 }, "triggerIds": [ 4 ], "measurements": [ { "data": { "__type": "dk.cachet.carp.freememory", "freeVirtualMemory": 2999164928, "freePhysicalMemory": 164835328 }, "sensorStartTime": 1729864247872632 }, { "data": { "__type": "dk.cachet.carp.freememory", "freeVirtualMemory": 2952347648, "freePhysicalMemory": 117329920 }, "sensorStartTime": 1729864307857247 }, { "data": { "__type": "dk.cachet.carp.freememory", "freeVirtualMemory": 2957910016, "freePhysicalMemory": 124141568 }, "sensorStartTime": 1729864367870355 }, { "data": { "__type": "dk.cachet.carp.freememory", "freeVirtualMemory": 2950742016, "freePhysicalMemory": 113905664 }, "sensorStartTime": 1729864427889803 } ] }, "first_sequence_id": 29, "last_sequence_id": 32, "created_by": "20e2093c-e510-4e79-9385-478a19dc4723", "created_at": "2024-10-25T13:54:43.080504Z", "updated_by": "20e2093c-e510-4e79-9385-478a19dc4723", "updated_at": "2024-10-25T13:54:43.081204Z" }, { "id": 4605, "data_stream_id": 655, "snapshot": { "syncPoint": { "synchronizedOn": "1970-01-01T00:00:00Z", "relativeClockSpeed": 1.0, "sensorTimestampAtSyncPoint": 0 }, "triggerIds": [ 9 ], "measurements": [ { "data": { "__type": "dk.cachet.carp.triggeredtask", "control": "Start", "taskName": "Task #17", "triggerId": 9, "destinationDeviceRoleName": "Polar HR Sensor" }, "sensorStartTime": 1729776088592930 } ] }, "first_sequence_id": 1, "last_sequence_id": 1, "created_by": "20e2093c-e510-4e79-9385-478a19dc4723", "created_at": "2024-10-24T13:31:24.977530Z", "updated_by": "20e2093c-e510-4e79-9385-478a19dc4723", "updated_at": "2024-10-24T13:31:24.979868Z" }, ... ```Compared to the CARP Core Domain data model there are several issues with this JSON:
camelCase
andsnake_case
serialization. Default JSON in CARP Core is alwayscamelCase
. For example,first_sequence_id
should befirstSequenceId
(seeDataStreamSequence
).data_stream_id
(which should bedataStreamId
) is a JSON object (and not a number). See DataStreamId for details.id
,last_sequence_id
,created_by
, etc.snapshot
is not part of theDataStreamSequence
domain model. However, theSyncPoint
is but is embedded in this weird "snapshot" thing.