RL4H / G2P2C

Reinforcement Learning based Artificial Pancreas Systems for Controlling Blood Glucose in Type 1 Diabetes.
https://capsml.com/
MIT License

Some questions #4

Closed · anbraten closed this 8 months ago

anbraten commented 9 months ago

Hey, first of all thanks for all the work you have done here.

I am on the long road of trying to test how RL models trained with simglucose could work on a phone, similar to how something like AndroidAPS currently does it.
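For context, the kind of training setup I mean is roughly this (based on the environment registration shown in the simglucose README; the patient name is just an example):

```python
import gym
from gym.envs.registration import register

# Register one of simglucose's in-silico T1D patients as a gym environment
# (this follows the simglucose README; any of its patient names works here).
register(
    id='simglucose-adolescent2-v0',
    entry_point='simglucose.envs:T1DSimEnv',
    kwargs={'patient_name': 'adolescent#002'},
)

env = gym.make('simglucose-adolescent2-v0')
obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()  # random insulin dose, just a smoke test
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
```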

The plan for a POC is to use:

There are quite a few challenges to solve just to get a POC working, actually 😅

From reading the paper and this source code I got a pretty good sense of how you are doing most things, but I still have quite a few questions 🙈 (hope you don't mind).

First of all, I would be interested in what kind of machines you used for training and roughly how long it took. Have you maybe already tried adapting your environment to use something like Ray (a sketch of what I mean is below)? And have you already looked into running a model on a mobile device?
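For the Ray question, this is roughly what I had in mind: a sketch using RLlib's PPO. The `env_creator`, patient name, and worker count are my placeholders, and the config API differs between Ray versions, so treat this as illustrative only:

```python
import ray
from ray.tune.registry import register_env
from ray.rllib.algorithms.ppo import PPOConfig

def env_creator(env_config):
    # Construct the env inside each rollout worker process.
    from simglucose.envs import T1DSimEnv
    return T1DSimEnv(patient_name=env_config.get("patient_name", "adolescent#002"))

register_env("simglucose-rllib", env_creator)

ray.init()
config = (
    PPOConfig()
    .environment(env="simglucose-rllib", env_config={"patient_name": "adolescent#002"})
    .rollouts(num_rollout_workers=4)  # parallel data collection
)
algo = config.build()
for _ in range(10):
    result = algo.train()
    print(result["episode_reward_mean"])
```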

chirathyh commented 9 months ago

Hey, thank you very much for your questions. We are very happy to help. Your work is really interesting.

We are still actively working on this project. A few years ago, when we started, we structured the codebase this way so that we have full flexibility over running experiments (not just designing RL algorithms, but also the problem formulation, including action-space and reward-function designs). Using open-source frameworks, as you mentioned, would have benefits at this stage of the project, but we haven't explored them yet. However, we are building a GPU-based simulator, https://github.com/chirathyh/GluCoEnv, again with a focus on improving our experiments.

There are still many engineering and research questions to figure out in a real-world setting (e.g., the effects of latency and sensor dropouts). With that in mind, we have kept our models small (~5K parameters) to allow for potential on-device training.
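For a sense of that parameter budget, a hypothetical PyTorch actor in the ~5K-parameter range looks like this (illustrative dimensions only, not our exact architecture):

```python
import torch
import torch.nn as nn

class TinyPolicy(nn.Module):
    """Hypothetical actor network in the ~5K-parameter range."""
    def __init__(self, obs_dim=12, hidden=64, act_dim=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)

policy = TinyPolicy()
print(sum(p.numel() for p in policy.parameters()))  # ~5K with these dims
```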

We have trained our models on a couple of HPC resources. As a rough figure, an algorithm like PPO takes 8-10 hours on an RTX 3090 to train for 800,000 steps (a single run covers 4 in-silico subjects x 3 seeds). If you are only running a single person at a time, you can optimise this code further to run parallel workers (see the sketch below).
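As a sketch of the parallel-worker idea, gym's vectorised environments let you step several copies of a simulated patient at once (the patient name and worker count here are placeholders):

```python
import gym
from gym.vector import AsyncVectorEnv

def make_env(patient_name):
    def _thunk():
        from simglucose.envs import T1DSimEnv
        return T1DSimEnv(patient_name=patient_name)
    return _thunk

# Four copies of the same patient stepped in parallel worker processes.
envs = AsyncVectorEnv([make_env('adolescent#002') for _ in range(4)])
obs = envs.reset()
for _ in range(100):
    actions = envs.action_space.sample()  # batched actions, one per worker
    obs, rewards, dones, infos = envs.step(actions)  # finished envs auto-reset
```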

We are exploring cloud-based approaches to implement this in the real world (which have their own limitations/risks) and haven't explored running on a mobile device. Your findings and experience would be very valuable. Running inference on the mobile and training offline in the cloud might be a nice approach.
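For the mobile-inference side, exporting a small policy for on-device runtimes could look like this (the network below is a stand-in for a trained ~5K-parameter model, not our actual one):

```python
import torch
import torch.nn as nn

# Stand-in for a trained small policy (hypothetical dimensions).
policy = nn.Sequential(
    nn.Linear(12, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 1),
)
policy.eval()

example_obs = torch.zeros(1, 12)

# TorchScript export, loadable by PyTorch Mobile on Android/iOS.
scripted = torch.jit.trace(policy, example_obs)
scripted.save("policy.pt")

# ONNX export, for runtimes such as ONNX Runtime Mobile.
torch.onnx.export(policy, example_obs, "policy.onnx",
                  input_names=["obs"], output_names=["action"])
```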

If you are interested, I'm happy to share more details and our experience over an online meeting/further correspondence.
All the best with your work. Have a nice day.