Open drniiken opened 1 year ago
We have a very rough PoC Python script that:
Prometheus is used as a datasource. Potentially, this can serve as a basis for a new osparc-service that collects and exposes resource usage.
To make it robust, we need:
The PoC script can be found here: https://github.com/ITISFoundation/osparc-simcore/pull/4168
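As a rough illustration of what such a collector might compute (this is not the PoC script itself), the sketch below sums approximate container run time per user from a Prometheus range-query response. The `status` / `data.result[].metric` / `values` layout is the matrix format returned by the Prometheus HTTP API. The label name `container_label_user_id`, the 60-second step, and the example response are assumptions made up for illustration.

```python
from collections import defaultdict

# Assumption: the range query was issued with a fixed step (in seconds).
STEP_SECONDS = 60


def container_seconds_per_user(response: dict, step: int = STEP_SECONDS) -> dict[str, int]:
    """Sum approximate running seconds per user from a Prometheus matrix result.

    Every sample present in a series is counted as `step` seconds of run time.
    """
    if response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")

    usage: dict[str, int] = defaultdict(int)
    for series in response["data"]["result"]:
        # Hypothetical label name; it depends on how container labels are exported.
        user_id = series["metric"].get("container_label_user_id", "unknown")
        usage[user_id] += len(series["values"]) * step
    return dict(usage)


# Hand-written example response (not real data).
example_response = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {
                "metric": {"container_label_user_id": "727"},
                "values": [[1687448940, "1"], [1687449000, "1"], [1687449060, "1"]],
            },
            {
                "metric": {"container_label_user_id": "42"},
                "values": [[1687448940, "1"]],
            },
        ],
    },
}

usage = container_seconds_per_user(example_response)
print(usage)
```

Counting samples is only as precise as the query step, so a robust service would probably want explicit start/stop events in addition to scraped metrics.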
Question from my side for the sprint planning PastelDeNata (I will be absent on PM2a):
Who is billed in case of unscheduled or scheduled maintenance? Imagine a long-running solver job that is killed due to an incident or scheduled downtime. How do we detect this inside our (to be devised) cost system?
Is it truly necessary to track additional metrics apart from simulation container seconds? It is at least conceivable to me to have a business model where we charge users a "flat-rate" price based only on simulation hours, which includes some additional charge for egress and S3 costs on our side as well. It would make the whole pipeline much smoother. I don't see value in devising a complicated system to track, for example, egress costs per user at this point (personal opinion). If a user needs to download a file for their scientific project, they are going to do it anyway. A straightforward billing model might be easier for end users to comprehend. I doubt scientists even want to think about whether they cause egress or not. ---> I propose a flat-rate charge based only on simulation seconds.
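To make the two questions above concrete, here is a minimal sketch of one possible policy, not a decision: bill a flat rate per simulation second with a fixed surcharge for egress and S3, and do not bill time that overlaps a known maintenance window. The rate, the surcharge factor, and the function names are hypothetical.

```python
from datetime import datetime

# Hypothetical numbers, for illustration only.
CREDITS_PER_SECOND = 0.01
OVERHEAD_FACTOR = 1.10  # flat surcharge meant to cover egress and S3 costs


def billable_seconds(start, stop, downtimes):
    """Run time in seconds, minus overlap with downtime windows.

    Assumes the downtime windows do not overlap each other.
    """
    total = (stop - start).total_seconds()
    for down_start, down_stop in downtimes:
        overlap_start = max(start, down_start)
        overlap_stop = min(stop, down_stop)
        if overlap_stop > overlap_start:
            total -= (overlap_stop - overlap_start).total_seconds()
    return max(total, 0.0)


def flat_rate_cost(seconds):
    """Flat-rate cost in credits for a number of simulation seconds."""
    return round(seconds * CREDITS_PER_SECOND * OVERHEAD_FACTOR, 2)


# A two-hour job that overlaps a 30-minute maintenance window.
job_start = datetime(2023, 6, 22, 15, 0)
job_stop = datetime(2023, 6, 22, 17, 0)
maintenance = [(datetime(2023, 6, 22, 16, 0), datetime(2023, 6, 22, 16, 30))]

seconds = billable_seconds(job_start, job_stop, maintenance)
cost = flat_rate_cost(seconds)
print(seconds, cost)
```

This only works if maintenance and incident windows are recorded somewhere the cost system can read them, which is exactly the detection problem raised in the first question.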
Notes & Discussions:
_ports_outputs_pull.316d0aa2-3570-48c6-a4c4-087076a44121 HTTP/1.1" 200
log_level=INFO | log_timestamp=2023-06-22 15:49:01,197 | log_source=uvicorn.access:send(438) | log_uid=None | log_msg=172.13.0.4:56712 - "GET /task/simcore_service_dynamic_sidecar.modules.long_running_tasks.task_restore_state.c6fba4b4-d146-404b-b5b0-c6eba5ba411d HTTP/1.1" 200
log_level=INFO | log_timestamp=2023-06-22 15:49:01,212 | log_source=simcore_sdk.node_data.data_manager:_pull_file(136) | log_uid=None | log_msg=pulling data from 4517c942-1114-11ee-954c-02420a00044e/a796a9e0-2652-5eb4-8955-9219d77be36c/workspace.zip to /tmp/tmpxsws8st3/workspace.zip...
log_level=INFO | log_timestamp=2023-06-22 15:49:01,337 | log_source=simcore_service_dynamic_sidecar.modules.nodeports:download_target_ports(306) | log_uid=None | log_msg=Downloaded 0.0B in 0.1322210943326354 seconds
downloading /master-simcore/4517c942-1114-11ee-954c-02420a00044e/a796a9e0-2652-5eb4-8955-9219d77be36c/workspace.zip --> workspace.zip : 0%| | 0.00/12.6k [00:00<?, ?byte/s]
downloading /master-simcore/4517c942-1114-11ee-954c-02420a00044e/a796a9e0-2652-5eb4-8955-9219d77be36c/workspace.zip --> workspace.zip : 100%|██████████| 12.6k/12.6k [00:00<00:00, 2.88Mbyte/s]
log_level=INFO | log_timestamp=2023-06-22 15:49:01,374 | log_source=simcore_sdk.node_data.data_manager:_pull_file(149) | log_uid=None | log_msg=completed pull of /tmp/tmpxsws8st3/workspace.zip.
decompressing /tmp/tmpxsws8st3/workspace.zip -> /dy-volumes/home/jovyan/work/workspace [3 files/12.635KiB] : 0%| | 0.00/12.6k [00:00<?, ?file/s]
decompressing /tmp/tmpxsws8st3/workspace.zip:jl_notebook.ipynb -> /dy-volumes/home/jovyan/work/workspace/jl_notebook.ipynb : 0%| | 0.00/9.18k [00:00<?, ?byte/s]
decompressing /tmp/tmpxsws8st3/workspace.zip:jl_notebook.ipynb -> /dy-volumes/home/jovyan/work/workspace/jl_notebook.ipynb : 100%|██████████| 9.18k/9.18k [00:00<00:00, 40.6Mbyte/s]
decompressing /tmp/tmpxsws8st3/workspace.zip:.hidden_do_not_remove -> /dy-volumes/home/jovyan/work/workspace/.hidden_do_not_remove : 0%| | 0.00/227 [00:00<?, ?byte/s]
decompressing /tmp/tmpxsws8st3/workspace.zip:.hidden_do_not_remove -> /dy-volumes/home/jovyan/work/workspace/.hidden_do_not_remove : 100%|██████████| 227/227 [00:00<00:00, 1.66Mbyte/s]
decompressing /tmp/tmpxsws8st3/workspace.zip:README.ipynb -> /dy-volumes/home/jovyan/work/workspace/README.ipynb : 0%| | 0.00/2.89k [00:00<?, ?byte/s]
decompressing /tmp/tmpxsws8st3/workspace.zip:README.ipynb -> /dy-volumes/home/jovyan/work/workspace/README.ipynb : 100%|██████████| 2.89k/2.89k [00:00<00:00, 32.2Mbyte/s]
decompressing /tmp/tmpxsws8st3/workspace.zip -> /dy-volumes/home/jovyan/work/workspace [3 files/12.635KiB] : 100%|██████████| 12.6k/12.6k [00:00<00:00, 1.39Mfile/s]
log_level=INFO | log_timestamp=2023-06-22 15:49:02,202 | log_source=uvicorn.access:send(438) | log_uid=None | log_msg=172.13.0.4:56722 - "GET /task/simcore_service_dynamic_sidecar.modules.long_running_tasks.task_ports_outputs_pull.316d0aa2-3570-48c6-a4c4-087076a44121 HTTP/1.1" 200
log_level=INFO | log_timestamp=2023-06-22 15:49:02,203 | log_source=uvicorn.access:send(438) | log_uid=None | log_msg=172.13.0.4:56728 - "GET /task/simcore_service_dynamic_sidecar.modules.long_running_tasks.task_restore_state.c6fba4b4-d146-404b-b5b0-c6eba5ba411d HTTP/1.1" 200
log_level=INFO | log_timestamp=2023-06-22 15:49:02,210 | log_source=uvicorn.access:send(438) | log_uid=None | log_msg=172.13.0.4:56730 - "GET /task/simcore_service_dynamic_sidecar.modules.long_running_tasks.task_ports_outputs_pull.316d0aa2-3570-48c6-a4c4-087076a44121/result HTTP/1.1" 200
log_level=INFO | log_timestamp=2023-06-22 15:49:02,211 | log_source=uvicorn.access:send(438) | log_uid=None | log_msg=172.13.0.4:56742 - "GET /task/simcore_service_dynamic_sidecar.modules.long_running_tasks.task_restore_state.c6fba4b4-d146-404b-b5b0-c6eba5ba411d/result HTTP/1.1" 200
log_level=INFO | log_timestamp=2023-06-22 15:49:02,227 | log_source=uvicorn.access:send(438) | log_uid=None | log_msg=172.13.0.4:56750 - "POST /v1/containers/ports/outputs/dirs HTTP/1.1" 204
log_level=INFO | log_timestamp=2023-06-22 15:49:02,258 | log_source=uvicorn.access:send(438) | log_uid=None | log_msg=172.13.0.4:56762 - "POST /v1/containers HTTP/1.1" 202
log_level=INFO | log_timestamp=2023-06-22 15:49:02,263 | log_source=uvicorn.access:send(438) | log_uid=None | log_msg=172.13.0.4:56764 - "GET /task/simcore_service_dynamic_sidecar.modules.long_running_tasks.task_create_service_containers.43c42531-85c4-467d-a863-89bb4f5b1fae HTTP/1.1" 200
log_level=INFO | log_timestamp=2023-06-22 
15:49:03,266 | log_source=uvicorn.access:send(438) | log_uid=None | log_msg=172.13.0.4:56778 - "GET /task/simcore_service_dynamic_sidecar.modules.long_running_tasks.task_create_service_containers.43c42531-85c4-467d-a863-89bb4f5b1fae HTTP/1.1" 200 log_level=INFO | log_timestamp=2023-06-22 15:49:03,638 | log_source=simcore_service_dynamic_sidecar.modules.long_running_tasks:task_create_service_containers(139) | log_uid=None | log_msg=Validated compose-spec:\nnetworks:\n back----end:\n internal: false\n dy-sidecar_a796a9e0-2652-5eb4-8955-9219d77be36c:\n driver: overlay\n external:\n name: dy-sidecar_a796a9e0-2652-5eb4-8955-9219d77be36c\n master-simcore_interactive_services_subnet:\n driver: overlay\n external:\n name: master-simcore_interactive_services_subnet\nservices:\n dy-sidecar-a796a9e0-2652-5eb4-8955-9219d77be36c-0-jupyter-smash:\n container_name: dy-sidecar-a796a9e0-2652-5eb4-8955-9219d77be36c-0-jupyter-smash\n cpus: 4.0\n environment:\n - DISPLAY=:0\n - DY_SIDECAR_PATH_INPUTS=/home/jovyan/work/inputs\n - DY_SIDECAR_PATH_OUTPUTS=/home/jovyan/work/outputs\n - DY_SIDECAR_STATE_PATHS=["/home/jovyan/work/workspace"]\n - SIMCORE_NANO_CPUS_LIMIT=4000000000\n - SIMCORE_MEMORY_BYTES_LIMIT=17179869184\n - SIMCORE_NODE_BASEPATH=\n - SYM_SERVERHOSTNAME=sym-server%service_uuid%\n - DY_BOOT_OPTION_BOOT_MODE=0\n - PUID=8004\n - PGID=8004\n image: registry.osparc-master.speag.com/simcore/services/dynamic/jupyter-smash:3.0.7\n init: true\n labels:\n - product_name=osparc\n - simcore_user_agent=puppeteer\n - study_id=4517c942-1114-11ee-954c-02420a00044e\n - user_id=727\n - uuid=a796a9e0-2652-5eb4-8955-9219d77be36c\n mem_limit: '17179869184'\n mem_reservation: '536870912'\n networks:\n back----end: null\n dy-sidecar_a796a9e0-2652-5eb4-8955-9219d77be36c: null\n volumes:\n - /tmp/.X11-unix:/tmp/.X11-unix\n - /docker/volumes/dyv_25238755-5be8-4d92-b501-9c0750630d59_a796a9e0-2652-5eb4-8955-9219d77be36c_stupni_krow_nayvojemoh/_data:/home/jovyan/work/inputs\n - 
/docker/volumes/dyv_25238755-5be8-4d92-b501-9c0750630d59_a796a9e0-2652-5eb4-8955-9219d77be36c_stuptuo_krow_nayvojemoh/_data:/home/jovyan/work/outputs\n - /docker/volumes/dyv_25238755-5be8-4d92-b501-9c0750630d59_a796a9e0-2652-5eb4-8955-9219d77be36c_ecapskrow_krow_nayvojemoh/_data:/home/jovyan/work/workspace\nversion: '2.3'\n log_level=INFO | log_timestamp=2023-06-22 15:49:09,223 | log_source=simcore_service_dynamic_sidecar.modules.attribute_monitor._logging_event_handler:event_handler(34) | log_uid=None | log_msg=Attribute change to: '/dy-volumes/home/jovyan/work/workspace/.hidden_do_not_remove': permissions=-rw-r--r-- uid=1000 gid=8004 size=227.0B\nFile stat: os.stat_result(st_mode=33188, st_ino=306149258, st_dev=2049, st_nlink=1, st_uid=1000, st_gid=8004, st_size=227, st_atime=1687448940, st_mtime=1687448941, st_ctime=1687448949) log_level=INFO | log_timestamp=2023-06-22 15:49:09,224 | log_source=simcore_service_dynamic_sidecar.modules.attribute_monitor._logging_event_handler:event_handler(34) | log_uid=None | log_msg=Attribute change to: '/dy-volumes/home/jovyan/work/workspace/jl_notebook.ipynb': permissions=-rw-r--r-- uid=1000 gid=8004 size=9.2KiB\nFile stat: os.stat_result(st_mode=33188, st_ino=306149259, st_dev=2049, st_nlink=1, st_uid=1000, st_gid=8004, st_size=9404, st_atime=1687448941, st_mtime=1687448941, st_ctime=1687448949) log_level=INFO | log_timestamp=2023-06-22 15:49:09,224 | log_source=simcore_service_dynamic_sidecar.modules.attribute_monitor._logging_event_handler:event_handler(34) | log_uid=None | log_msg=Attribute change to: '/dy-volumes/home/jovyan/work/workspace/README.ipynb': permissions=-rw-r--r-- uid=1000 gid=8004 size=2.9KiB\nFile stat: os.stat_result(st_mode=33188, st_ino=306149273, st_dev=2049, st_nlink=1, st_uid=1000, st_gid=8004, st_size=2957, st_atime=1687448941, st_mtime=1687448941, st_ctime=1687448949) log_level=INFO | log_timestamp=2023-06-22 15:49:09,224 | 
log_source=simcore_service_dynamic_sidecar.modules.attribute_monitor._logging_event_handler:event_handler(34) | log_uid=None | log_msg=Attribute change to: '/dy-volumes/home/jovyan/work/workspace': permissions=drwxrwxr-- uid=1000 gid=998 size=80.0B\nFile stat: os.stat_result(st_mode=16892, st_ino=306220969, st_dev=2049, st_nlink=2, st_uid=1000, st_gid=998, st_size=80, st_atime=1687448949, st_mtime=1687448941, st_ctime=1687448949) log_level=INFO | log_timestamp=2023-06-22 15:49:09,225 | log_source=simcore_service_dynamic_sidecar.modules.attribute_monitor._logging_event_handler:event_handler(34) | log_uid=None | log_msg=Attribute change to: '/dy-volumes/home/jovyan/work/inputs/.hidden_do_not_remove': permissions=-rw-r--r-- uid=1000 gid=8004 size=228.0B\nFile stat: os.stat_result(st_mode=33188, st_ino=306149257, st_dev=2049, st_nlink=1, st_uid=1000, st_gid=8004, st_size=228, st_atime=1687448940, st_mtime=1687448940, st_ctime=1687448949) log_level=INFO | log_timestamp=2023-06-22 15:49:09,225 | log_source=simcore_service_dynamic_sidecar.modules.attribute_monitor._logging_event_handler:event_handler(34) | log_uid=None | log_msg=Attribute change to: '/dy-volumes/home/jovyan/work/inputs': permissions=drwxrwxr-- uid=1000 gid=998 size=35.0B\nFile stat: os.stat_result(st_mode=16892, st_ino=306220968, st_dev=2049, st_nlink=2, st_uid=1000, st_gid=998, st_size=35, st_atime=1687448949, st_mtime=1687448940, st_ctime=1687448949) log_level=INFO | log_timestamp=2023-06-22 15:49:09,226 | log_source=simcore_service_dynamic_sidecar.modules.attribute_monitor._logging_event_handler:event_handler(34) | log_uid=None | log_msg=Attribute change to: '/dy-volumes/home/jovyan/work/outputs/.hidden_do_not_remove': permissions=-rw-r--r-- uid=1000 gid=8004 size=228.0B\nFile stat: os.stat_result(st_mode=33188, st_ino=2550292891, st_dev=2049, st_nlink=1, st_uid=1000, st_gid=8004, st_size=228, st_atime=1687448940, st_mtime=1687448940, st_ctime=1687448949) log_level=INFO | log_timestamp=2023-06-22 
15:49:09,226 | log_source=simcore_service_dynamic_sidecar.modules.attribute_monitor._logging_event_handler:event_handler(34) | log_uid=None | log_msg=Attribute change to: '/dy-volumes/home/jovyan/work/outputs/key_values.json': permissions=-rw-r--r-- uid=1000 gid=8004 size=192.0B\nFile stat: os.stat_result(st_mode=33188, st_ino=2550292892, st_dev=2049, st_nlink=1, st_uid=1000, st_gid=8004, st_size=192, st_atime=1687448941, st_mtime=1687448941, st_ctime=1687448949) log_level=INFO | log_timestamp=2023-06-22 15:49:09,226 | log_source=simcore_service_dynamic_sidecar.modules.attribute_monitor._logging_event_handler:event_handler(34) | log_uid=None | log_msg=Attribute change to: '/dy-volumes/home/jovyan/work/outputs/output_1': permissions=drwxr-xr-x uid=1000 gid=8004 size=6.0B\nFile stat: os.stat_result(st_mode=16877, st_ino=3565018646, st_dev=2049, st_nlink=2, st_uid=1000, st_gid=8004, st_size=6, st_atime=1687448942, st_mtime=1687448942, st_ctime=1687448949) log_level=INFO | log_timestamp=2023-06-22 15:49:09,226 | log_source=simcore_service_dynamic_sidecar.modules.attribute_monitor._logging_event_handler:event_handler(34) | log_uid=None | log_msg=Attribute change to: '/dy-volumes/home/jovyan/work/outputs/output_2': permissions=drwxr-xr-x uid=1000 gid=8004 size=6.0B\nFile stat: os.stat_result(st_mode=16877, st_ino=283256553, st_dev=2049, st_nlink=2, st_uid=1000, st_gid=8004, st_size=6, st_atime=1687448942, st_mtime=1687448942, st_ctime=1687448949) log_level=INFO | log_timestamp=2023-06-22 15:49:09,226 | log_source=simcore_service_dynamic_sidecar.modules.attribute_monitor._logging_event_handler:event_handler(34) | log_uid=None | log_msg=Attribute change to: '/dy-volumes/home/jovyan/work/outputs/output_3': permissions=drwxr-xr-x uid=1000 gid=8004 size=6.0B\nFile stat: os.stat_result(st_mode=16877, st_ino=1429624200, st_dev=2049, st_nlink=2, st_uid=1000, st_gid=8004, st_size=6, st_atime=1687448942, st_mtime=1687448942, st_ctime=1687448949) log_level=INFO | 
log_timestamp=2023-06-22 15:49:09,226 | log_source=simcore_service_dynamic_sidecar.modules.attribute_monitor._logging_event_handler:event_handler(34) | log_uid=None | log_msg=Attribute change to: '/dy-volumes/home/jovyan/work/outputs/output_4': permissions=drwxr-xr-x uid=1000 gid=8004 size=6.0B\nFile stat: os.stat_result(st_mode=16877, st_ino=2550292893, st_dev=2049, st_nlink=2, st_uid=1000, st_gid=8004, st_size=6, st_atime=1687448942, st_mtime=1687448942, st_ctime=1687448949) log_level=INFO | log_timestamp=2023-06-22 15:49:09,226 | log_source=simcore_service_dynamic_sidecar.modules.attribute_monitor._logging_event_handler:event_handler(34) | log_uid=None | log_msg=Attribute change to: '/dy-volumes/home/jovyan/work/outputs': permissions=drwxrwxr-- uid=1000 gid=998 size=122.0B\nFile stat: os.stat_result(st_mode=16892, st_ino=2550448079, st_dev=2049, st_nlink=6, st_uid=1000, st_gid=998, st_size=122, st_atime=1687448949, st_mtime=1687448942, st_ctime=1687448949) log_level=INFO | log_timestamp=2023-06-22 15:49:09,228 | log_source=simcore_service_dynamic_sidecar.modules.attribute_monitor._logging_event_handler:event_handler(34) | log_uid=None | log_msg=Attribute change to: '/dy-volumes/home/jovyan/work/workspace/README.ipynb': permissions=-rw-rw-rw- uid=1000 gid=100 size=2.9KiB\nFile stat: os.stat_result(st_mode=33206, st_ino=306149253, st_dev=2049, st_nlink=1, st_uid=1000, st_gid=100, st_size=2957, st_atime=1661260111, st_mtime=1661260111, st_ctime=1687448949) log_level=INFO | log_timestamp=2023-06-22 15:49:09,228 | log_source=simcore_service_dynamic_sidecar.modules.attribute_monitor._logging_event_handler:event_handler(34) | log_uid=None | log_msg=Attribute change to: '/dy-volumes/home/jovyan/work/workspace/README.ipynb': permissions=-rw-rw-rw- uid=1000 gid=100 size=2.9KiB\nFile stat: os.stat_result(st_mode=33206, st_ino=306149253, st_dev=2049, st_nlink=1, st_uid=1000, st_gid=100, st_size=2957, st_atime=1661260111, st_mtime=1661260111, st_ctime=1687448949) 
log_level=WARNING | log_timestamp=2023-06-22 15:55:00,326 | log_source=servicelib.long_running_tasks._task:_stale_tasks_monitor_worker(119) | log_uid=None | log_msg=Removing stale task 'simcore_service_dynamic_sidecar.modules.long_running_tasks.task_create_service_containers.43c42531-85c4-467d-a863-89bb4f5b1fae' with status '{"task_progress": {"message": "finished", "percent": 1.0}, "done": true, "started": "2023-06-22T15:49:02.249855"}'
Next iteration:
Do this for comp. service iSolve and sim4life and jupyter-smash
IMPLEMENTATION NOTES:
PDF: _5e3d5799c2f3370c861d6889-Implementation Notes - Resource Usage Tracker-200723-065246.pdf
original (needed access): https://osparc.atlassian.net/wiki/spaces/~5e3d5799c2f3370c861d6889/pages/327681/Implementation+Notes+-+Resource+Usage+Tracker
Done:
In progress:
To do:
Design:
Pricing plan DB design: pricing-plan-db-design.pdf
Brainstorming with @pcrespov
What happens when we hit zero credits?
sim4life (dynamic service)
run isolve with the API
(docker pause & docker unpause)
Accountant removes user from the wallet (unauthorized)
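The pause/unpause idea from the brainstorming could look roughly like the sketch below: decide per wallet whether containers should be paused or resumed, and build the matching `docker pause` / `docker unpause` call. The function names, the container name, and the "pause at zero credits" rule are assumptions for illustration; the commands are only built here, not executed.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    PAUSE = "pause"
    UNPAUSE = "unpause"


def next_action(credits: float, paused: bool) -> Action:
    """Decide what to do with a user's containers given the wallet balance."""
    if credits <= 0 and not paused:
        return Action.PAUSE
    if credits > 0 and paused:
        return Action.UNPAUSE
    return Action.NONE


def docker_command(action: Action, container: str) -> list[str]:
    """Build (but do not run) the docker CLI call for an action."""
    if action is Action.NONE:
        return []
    return ["docker", action.value, container]


# Hypothetical container name.
container = "my-dynamic-service"
print(docker_command(next_action(credits=0.0, paused=False), container))
print(docker_command(next_action(credits=25.0, paused=True), container))
```

Pausing keeps the container state around, so the user can continue after topping up, but it says nothing yet about computational jobs started through the API, which would need their own rule.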
Batch approach:
Done:
In progress:
To do:
Done:
In progress/To do:
Description
To ensure transparent and accurate billing, it is essential to collect and keep track of all resource usage per user: compute time (simulation hours), S3 storage, and egress costs for file downloads. This information must be synchronized with what the customer has paid for, and the data must be stored permanently to accommodate the pay-per-use model.
To make this data accessible to all relevant parties, it will be integrated into multiple systems, including the webpage, product, and API, as well as potentially the finance department for billing.