BIDMCDigitalPsychiatry / LAMP-platform

The LAMP Platform (issues and documentation).
https://docs.lamp.digital/

Not able to disable sensor event cache or force it to flush to database periodically #489

Closed jeydude closed 2 years ago

jeydude commented 2 years ago

Hi,

I recently updated the LAMP API server on my test server from

    image: bidmcdigitalpsychiatry/lamp-server:2021

to

    image: ghcr.io/bidmcdigitalpsychiatry/lamp-server:2022

After updating to the 2022 image, no sensor data reaches the LAMP API server. Activity data still comes through without issues.

I closed and reopened the iPhone app a few times, but it still fails to send sensor data to the LAMP API server.

Here is the current list of sensors and their settings:

{
  "data": [
    {
      "id": "a0dhh5wty34th4n17cew",
      "timestamp": 1616532925174,
      "spec": "lamp.analytics",
      "name": "analytics",
      "settings": {}
    },
    {
      "id": "mqppvpn4qwqmx1bx8wdm",
      "timestamp": 1616532945282,
      "spec": "lamp.heart_rate",
      "name": "HR",
      "settings": {}
    },
    {
      "id": "aqfb40nrnr5n035gqfqq",
      "timestamp": 1616532964156,
      "spec": "lamp.screen_state",
      "name": "Screen Status",
      "settings": {}
    },
    {
      "id": "mc4f1yd4sf3yvt4w4e2a",
      "timestamp": 1616532970516,
      "spec": "lamp.sleep",
      "name": "Sleep",
      "settings": {}
    },
    {
      "id": "t928ka5nc4fv9g60r0wp",
      "timestamp": 1616532976301,
      "spec": "lamp.steps",
      "name": "steps",
      "settings": {}
    },
    {
      "id": "wvcarrj57jedaje8sa2x",
      "timestamp": 1616532981861,
      "spec": "lamp.telephony",
      "name": "calls",
      "settings": {}
    },
    {
      "id": "3ewpt11q4y656pgwxj8j",
      "timestamp": 1623077614569,
      "spec": "lamp.accelerometer",
      "name": "Accelerometer",
      "settings": {
        "frequency": 0.0166,
        "cellular_upload": true
      }
    },
    {
      "id": "5pb6j7e88amnvwrzph72",
      "timestamp": 1623077622389,
      "spec": "lamp.gps",
      "name": "GPS",
      "settings": {
        "frequency": 0.0166,
        "cellular_upload": true
      }
    }
  ]
}
avaidyam commented 2 years ago

@jeydude Can you share your whole docker-compose.yml file? It's likely that NATS and Redis are not properly configured.

jeydude commented 2 years ago

Here is the LAMP stack YAML file:

version: '3.7'
services:
  server:
    image: ghcr.io/bidmcdigitalpsychiatry/lamp-server:2022
    healthcheck:
      test: wget --no-verbose --tries=1 --spider http://localhost:3000 || exit 1
    environment:
      HTTPS: 'off'
      SCHEDULER: 'on'
      ROOT_KEY: 'xxxxxxx'
      CDB: 'http://admin:xxxxxx@database:5984/'
      APP_GATEWAY: 'app-gateway.lamp.digital'
      PUSH_API_KEY: 'xxxxxxx'
      DASHBOARD_URL: 'dashboard.lamp.digital'
      REDIS_HOST: 'redis://cache:6379/0'
      NATS_SERVER: 'message_queue:4222'
    networks:
      - default
      - public
    logging:
      options:
        max-size: "10m"
        max-file: "3"
    deploy:
      mode: replicated
      update_config:
        order: start-first
        failure_action: rollback
      labels:
         traefik.enable: 'true'
         traefik.http.routers.lamp_server.entryPoints: 'websecure'
         traefik.http.routers.lamp_server.rule: 'Host(`xxxxxx`)'
         traefik.http.routers.lamp_server.tls.certresolver: 'default'
         traefik.http.services.lamp_server.loadbalancer.server.port: 3000
         traefik.docker.network: 'public'
      placement:
        constraints:
          - node.role == manager 
  database:
    image: mlpcouchdb
    healthcheck:
      test: curl --fail --silent http://localhost:5984/_up || exit 1
    environment:
      COUCHDB_USER: 'admin'
      COUCHDB_PASSWORD: 'xxxxxx'
    volumes:
      - /apps/mindLAMP/data/couchdb:/opt/couchdb/data
    networks:
      - public
    deploy:
      mode: replicated
      update_config:
        order: stop-first
        failure_action: rollback
      placement:
        constraints:
          - node.role == manager
    ports:
    - "4369:4369/tcp"
    - "5984:5984/tcp"
    - "9100:9100/tcp"
    labels:
      traefik.enable: 'true'
      traefik.http.routers.lamp_database.entryPoints: 'websecure'
      traefik.http.routers.lamp_database.rule: 'Host(`xxxxxxxx`)'
      traefik.http.routers.lamp_database.tls.certresolver: 'default'
      traefik.http.services.lamp_database.loadbalancer.server.port: 5984
  cache:
    image: redis:6.0.8-alpine
    healthcheck:
      test: redis-cli ping
    deploy:
      mode: replicated
      update_config:
        order: stop-first
        failure_action: rollback
      placement:
        constraints:
          - node.role == manager
  message_queue:
    image: nats:2.1.9-alpine3.12
    healthcheck:
      test: wget --no-verbose --tries=1 --spider http://localhost:8222/varz || exit 1
    deploy:
      mode: replicated
      update_config:
        order: start-first
        failure_action: rollback
      placement:
        constraints:
          - node.role == manager
networks:
  public:
    external: true
avaidyam commented 2 years ago

@jeydude Could you also share system logs from the server?

jeydude commented 2 years ago

Here is the initial log file after I changed to 2022. I will send another log file in 10 minutes: _lamp_server.1.t2p9d6ih14azixkkgk29e6qdj_logs.txt

avaidyam commented 2 years ago

@jeydude Are you suggesting the data is not in the database after 10 minutes? If that's the case - the reason this behavior occurs is the server actually caches sensor data before writing to the database in bulk. You may need to wait longer or collect more sensor data first.

jeydude commented 2 years ago

@avaidyam, no, I am not suggesting the data is not in the database. As you know, the iPhone app sends sensor data to the LAMP API server every 10 minutes, so I have to wait 10 minutes to see whether there is any POST log entry. That's all.

I had the iPhone app up and running in the foreground during this test: _lamp_server.1.t2p9d6ih14azixkkgk29e6qdj_logs (2).txt

I could not see any GPS/Accelerometer data in the API server or CouchDB. This happens only with the 2022 image.

avaidyam commented 2 years ago

Can you check in Redis to see if the data is being cached? It won't show up in the database until the cache is full and is flushed from Redis to the database.

jeydude commented 2 years ago

How do I check that? Sorry, I am not very familiar with Redis.

avaidyam commented 2 years ago

Try this. You should be able to see increasing memory usage until the data is flushed, then the memory usage should drop.

jeydude commented 2 years ago

I could not run redis-cli; I may need to work with the Unix admins.

bash-4.4$ redis-cli info memory
bash: redis-cli: command not found
bash-4.4$ which redis-cli
which: no redis-cli in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin)

jeydude commented 2 years ago

Uploading the log file from lamp_cache:

_lamp_cache.1.j8to4gcqhyywq8n9k3sj00ieb_logs.txt

jeydude commented 2 years ago

Memory and cache usage seem to stay at the same level.

image

jeydude commented 2 years ago

No sensor data so far.

The last data received was with the 2021 image reference:

image

I will check with the Unix admin about the Redis command for checking memory. If there is anything you need, let me know. For now I am going to switch back to the 2021 image.

jeydude commented 2 years ago

After switching back to the 2021 image, sensor data is arriving in CouchDB without issues.

Initial log file from the 2021 image: _lamp_server.1.7oifpvvyx88itwst0lrhaomqx_logs (1).txt

image

Log file after 10 minutes on the 2021 image: _lamp_server.1.7oifpvvyx88itwst0lrhaomqx_logs.txt

image

Has anyone reported issues with sensor data collection using the 2022 image?

avaidyam commented 2 years ago

@jeydude No, and there shouldn't be any difference as the 2022 server release doesn't actually change code for sensor data collection at all.

jeydude commented 2 years ago

@avaidyam, just to confirm: is there any change to the OpenAPI for the 2022 server release? How would you like to troubleshoot this issue? Please let me know. I am happy to share credentials with anyone who wants to test from an iPhone.

avaidyam commented 2 years ago

There should not be any OpenAPI change. We'll get back to you on next steps for resolving this issue shortly!

jeydude commented 2 years ago

Thanks a lot for your support!

jeydude commented 2 years ago

Redis cache information

bash-4.4$ docker exec -it lamp_cache.1.j8to4gcqhyywq8n9k3sj00ieb redis-cli
127.0.0.1:6379> info
# Server
redis_version:6.0.8
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:3f114dbfa16f9498
redis_mode:standalone
os:Linux 4.18.0-305.19.1.el8_4.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:9.3.0
process_id:1
run_id:5ff6110433e6b7d348f6f9e8096c35aa2ec14083
tcp_port:6379
uptime_in_seconds:82679
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:16364737
executable:/data/redis-server
config_file:
io_threads_active:0

# Clients
connected_clients:14
client_recent_max_input_buffer:2
client_recent_max_output_buffer:0
blocked_clients:4
tracking_clients:0
clients_in_timeout_table:4

# Memory
used_memory:1259856
used_memory_human:1.20M
used_memory_rss:6422528
used_memory_rss_human:6.12M
used_memory_peak:2614488
used_memory_peak_human:2.49M
used_memory_peak_perc:48.19%
used_memory_overhead:1069708
used_memory_startup:802984
used_memory_dataset:190148
used_memory_dataset_perc:41.62%
allocator_allocated:1301648
allocator_active:1773568
allocator_resident:4481024
total_system_memory:8118042624
total_system_memory_human:7.56G
used_memory_lua:84992
used_memory_lua_human:83.00K
used_memory_scripts:27024
used_memory_scripts_human:26.39K
number_of_cached_scripts:10
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.36
allocator_frag_bytes:471920
allocator_rss_ratio:2.53
allocator_rss_bytes:2707456
rss_overhead_ratio:1.43
rss_overhead_bytes:1941504
mem_fragmentation_ratio:5.28
mem_fragmentation_bytes:5205184
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:237804
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0

# Persistence
loading:0
rdb_changes_since_last_save:7
rdb_bgsave_in_progress:0
rdb_last_save_time:1643754630
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:479232
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0

# Stats
total_connections_received:2863
total_commands_processed:630553
instantaneous_ops_per_sec:1
total_net_input_bytes:67628816
total_net_output_bytes:6496491
instantaneous_input_kbps:0.12
instantaneous_output_kbps:0.01
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:12123
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:1649
evicted_keys:0
keyspace_hits:51639
keyspace_misses:238552
pubsub_channels:4
pubsub_patterns:0
latest_fork_usec:255
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_reads_processed:265228
total_writes_processed:282254
io_threaded_reads_processed:0
io_threaded_writes_processed:0

# Replication
role:master
connected_slaves:0
master_replid:7016b865c0068ddee45818d3656d4b66174d64b3
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:56.181833
used_cpu_user:52.005462
used_cpu_sys_children:0.391865
used_cpu_user_children:0.161734

# Modules

# Cluster
cluster_enabled:0

# Keyspace
db0:keys=32,expires=3,avg_ttl=13148
127.0.0.1:6379>
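
To compare snapshots like the one above over time (memory should grow while events are cached and drop after a flush), the INFO text can be reduced to a couple of numbers. This is only a minimal sketch: `parseRedisInfo` and `summarize` are hypothetical helper names, not part of Redis or LAMP, and the code assumes the plain `key:value` layout shown in the paste.

```typescript
// Hypothetical helpers for reading pasted `redis-cli info` output.
// Lines starting with "#" are section headers; other lines are key:value pairs.
function parseRedisInfo(text: string): Map<string, string> {
  const fields = new Map<string, string>();
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const idx = line.indexOf(":");
    if (idx <= 0) continue;
    const key = line.slice(0, idx);
    // Skip prompt lines such as "127.0.0.1:6379> info" (key is not a plain identifier).
    if (!/^[A-Za-z0-9_]+$/.test(key)) continue;
    fields.set(key, line.slice(idx + 1));
  }
  return fields;
}

// Pull out the two values most relevant to "is anything being cached?".
function summarize(text: string): { usedMemory: number; db0Keys: number } {
  const fields = parseRedisInfo(text);
  const usedMemory = Number(fields.get("used_memory") ?? NaN);
  const match = /keys=(\d+)/.exec(fields.get("db0") ?? "");
  return { usedMemory, db0Keys: match ? Number(match[1]) : 0 };
}
```

Running `summarize` on two pastes taken a few minutes apart and comparing `usedMemory` and `db0Keys` gives a rough indication of whether the cache is filling.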
avaidyam commented 2 years ago

@jeydude Do you know specifically which version of the LAMP-server you're using? Normally the 2021 label is an alias for something like 2021.3.14 or 2021.10.11 or so.

jeydude commented 2 years ago

Looks like it was created 9 months ago.

image

avaidyam commented 2 years ago

That seems to be an incredibly out of date version of the server. Let me investigate what could have changed. Just to confirm - are you also using the LAMP-worker component?

jeydude commented 2 years ago

No, we are not using the LAMP-worker component. I have the following containers deployed on our box: LAMP server, LAMP database (CouchDB), LAMP cache (Redis), and LAMP message queue (NATS).

avaidyam commented 2 years ago

Understood -- for now, please keep using that version of the server. I'll work with our team to identify the issue and update documentation. I'll keep you posted here!

jeydude commented 2 years ago

Thanks a lot!

jeydude commented 2 years ago

Hi Aditya, any update?

I tried to use

ghcr.io/bidmcdigitalpsychiatry/lamp-server:2021.9.13

In the log file, I noticed that Store_Size is increasing, but the data has not been pushed to CouchDB yet. Do I need to wait for Store_Size to reach Max_Store_Size before the data appears in CouchDB? Is there a way to reduce Max_Store_Size?

2022-02-09T17:30:41.904897002Z POST /participant/U2932822766/sensor_event 200 - 14.185 ms

2022-02-09T17:30:41.904909817Z Store_Size 402

2022-02-09T17:30:41.904912991Z Max_Store_Size 50000

2022-02-09T17:30:41.904915691Z Inserting data to redis store

2022-02-09T17:30:42.029921895Z POST /participant/U2932822766/sensor_event 200 - 6.606 ms

2022-02-09T17:30:42.030403309Z Store_Size 412

2022-02-09T17:30:42.030415116Z Max_Store_Size 50000

2022-02-09T17:30:42.030418483Z Inserting data to redis store
avaidyam commented 2 years ago

@jeydude Are you able to see similar logs regarding Store_Size and Max_Store_Size in the :2022 release?

If not, this may be the issue - the store size (number of events to cache before flushing to database) should be configurable and it is currently not. I imagine store_size=0 disables it, and store_size=100 would set a small enough size that you would see real-time updates on your data tracker.

Do you think that covers your needs?

jeydude commented 2 years ago

@avaidyam

I tried the 2022 release and the latest release; neither shows Store_Size or Max_Store_Size in the logs. Looking at the source code, it looks like those log entries were removed around the 2022 release.

I am assuming I need to add store_size=100 to the environment section. Once I have applied it, I will let you know whether I am getting sensor_event data. Thanks for your response.

    environment:
      HTTPS: 'off'
      SCHEDULER: 'on'
      ROOT_KEY: 'xxxxxxx'
      CDB: 'http://admin:xxxxxx@database:5984/'
      APP_GATEWAY: 'app-gateway.lamp.digital'
      PUSH_API_KEY: 'xxxxxxx'
      DASHBOARD_URL: 'dashboard.lamp.digital'
      REDIS_HOST: 'redis://cache:6379/0'
      NATS_SERVER: 'message_queue:4222'
      store_size: 100
avaidyam commented 2 years ago

@jeydude Actually, Max_Store_Size is hardcoded here right now, and it's not possible to configure it. This is something we will add as a configuration variable, which should then resolve your issue.

@Linoy339 Can you look into this?

  1. Let's use CACHE_SIZE as the env variable name.
    1. CACHE_SIZE=0 should short-circuit (disable) the caching so all data is pushed straight to the database.
    2. CACHE_SIZE=500 should push the cache to database once 500 events are saved.
  2. Let's also add a CACHE_INTERVAL env variable to control a time-based (in seconds) flush mechanism.
    1. This should flush the cache to the database regardless of whether the Max_Store_Size is reached or not.
    2. CACHE_INTERVAL=0 would disable this, meaning flushing is always dependent on the CACHE_SIZE.
    3. CACHE_INTERVAL=5 would periodically check if Store_Size > 0 every 5 seconds, and if true, flush the cache to database.
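
The proposed semantics can be sketched as a small in-memory buffer. This is a hedged illustration only, not the LAMP-server implementation (which caches in Redis): the class name `BufferedWriter`, its constructor parameters, and the `flushFn` callback are all hypothetical, and the timed flush mirrors the CACHE_INTERVAL idea exactly as listed above.

```typescript
// Hypothetical sketch of the CACHE_SIZE / CACHE_INTERVAL behaviour described above.
type FlushFn<T> = (batch: T[]) => void;

class BufferedWriter<T> {
  private buffer: T[] = [];
  private timer: ReturnType<typeof setInterval> | undefined;

  constructor(
    private readonly cacheSize: number,        // 0 disables caching (write-through)
    private readonly cacheIntervalSec: number, // 0 disables the time-based flush
    private readonly flushFn: FlushFn<T>       // stands in for the bulk database write
  ) {
    if (cacheSize > 0 && cacheIntervalSec > 0) {
      // Periodically flush whatever is buffered, even if cacheSize is not reached.
      this.timer = setInterval(() => this.flush(), cacheIntervalSec * 1000);
    }
  }

  add(events: T[]): void {
    if (this.cacheSize <= 0) {
      // Short-circuit: push straight to the database.
      if (events.length > 0) this.flushFn(events);
      return;
    }
    this.buffer.push(...events);
    if (this.buffer.length >= this.cacheSize) this.flush();
  }

  flush(): void {
    if (this.buffer.length === 0) return; // nothing cached, nothing to write
    const batch = this.buffer;
    this.buffer = [];
    this.flushFn(batch);
  }

  get size(): number {
    return this.buffer.length;
  }

  // Stop the timer and write out anything still buffered.
  stop(): void {
    if (this.timer !== undefined) clearInterval(this.timer);
    this.timer = undefined;
    this.flush();
  }
}
```

With `new BufferedWriter(500, 0, write)`, events accumulate until 500 are buffered; with `new BufferedWriter(0, 0, write)`, every `add` call writes immediately.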
Linoy339 commented 2 years ago

@avaidyam Using both CACHE_SIZE and CACHE_INTERVAL could complicate things, since the two settings may conflict. Setting CACHE_SIZE alone to a three-digit number (such as 300) would resolve the issue above. So can we limit the scope to CACHE_SIZE and leave out CACHE_INTERVAL?

avaidyam commented 2 years ago

Sure, let's table the CACHE_INTERVAL plan for now.

Linoy339 commented 2 years ago

Thanks @avaidyam

Linoy339 commented 2 years ago

@avaidyam We have implemented this. Please check: https://github.com/BIDMCDigitalPsychiatry/LAMP-server/blob/master/src/utils/queue/BulkDataWriteQueue.ts#L6
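
The linked file is not reproduced here. As a rough illustration of how a server might read such a variable, the sketch below parses CACHE_SIZE with a fallback. The function name `readCacheSize` is hypothetical, and using 50000 (the Max_Store_Size value from the earlier logs) as the default is an assumption, not something confirmed in this thread.

```typescript
// Assumption: fall back to the Max_Store_Size value seen in the 2021.9.13 logs.
const DEFAULT_CACHE_SIZE = 50000;

// Hypothetical helper: read CACHE_SIZE from an environment-like object.
// Missing, empty, negative, or non-integer values fall back to the default.
function readCacheSize(env: Record<string, string | undefined>): number {
  const raw = env["CACHE_SIZE"];
  if (raw === undefined || raw.trim() === "") return DEFAULT_CACHE_SIZE;
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : DEFAULT_CACHE_SIZE;
}
```

In a Node.js process this would typically be called as `readCacheSize(process.env)`; a value of 0 is kept as-is so that it can mean "caching disabled".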

jeydude commented 2 years ago

Thanks @avaidyam and @Linoy339. I updated my image to ghcr.io/bidmcdigitalpsychiatry/lamp-server:latest and added CACHE_SIZE: 200 to the environment variable list.

I am getting sensor_events now. Thanks A LOT for your quick response and code changes!

image

avaidyam commented 2 years ago

@Linoy339 Thanks! It looks like this worked for @jeydude so I'll close the issue now. Once our release process is completed (today), please do swap back to the latest :2022 release, @jeydude.