andy108369 closed this issue 7 months ago
The issue could be somewhere along these lines:
OTOH, Shimpa's europlots provider seems to be reporting correct values; the difference is that he is using 2 replicas (vs. my 1), though:
$ curl -sk https://provider.europlots.com:8443/status | jq
...
"storage": [
{
"class": "beta3",
"size": 9275517457920
}
]
9275517457920 / 1024^3 = 8638.49879980087280273437 GiB
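For reference, the same conversion can be reproduced with plain shell arithmetic:
$ echo '9275517457920 / 1024^3' | bc -l
8638.49879980087280273437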
# kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph df
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
ssd 21 TiB 20 TiB 1.0 TiB 1.0 TiB 4.79
TOTAL 21 TiB 20 TiB 1.0 TiB 1.0 TiB 4.79
--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
.mgr 1 1 6.5 MiB 3 20 MiB 0 6.3 TiB
akash-nodes 2 32 19 B 1 8 KiB 0 9.4 TiB
akash-deployments 3 512 507 GiB 131.03k 1014 GiB 4.99 9.4 TiB
# kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph status
cluster:
id: 15d8da54-650e-4ec8-9914-5e0f35dda64c
health: HEALTH_OK
services:
mon: 3 daemons, quorum f,g,h (age 7w)
mgr: b(active, since 7w), standbys: a
osd: 12 osds: 12 up (since 7w), 12 in (since 6M)
data:
pools: 3 pools, 545 pgs
objects: 131.07k objects, 509 GiB
usage: 1.0 TiB used, 20 TiB / 21 TiB avail
pgs: 545 active+clean
io:
client: 2.0 MiB/s rd, 57 MiB/s wr, 71 op/s rd, 155 op/s wr
# kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 3 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 66 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
pool 2 'akash-nodes' replicated size 2 min_size 2 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 1669 lfor 0/0/38 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 3 'akash-deployments' replicated size 2 min_size 2 crush_rule 2 object_hash rjenkins pg_num 512 pgp_num 512 autoscale_mode on last_change 1672 lfor 0/0/78 flags hashpspool,selfmanaged_snaps,bulk stripe_width 0 application rbd
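As an aside, the replication factor of a single pool can also be checked directly instead of parsing the detail listing above (same rook-ceph-tools pod; shown for the akash-deployments pool):
# kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph osd pool get akash-deployments size
size: 2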
# kubectl -n akash-services exec -i $(kubectl -n akash-services get pods -l app=akash-provider --output jsonpath='{.items[0].metadata.name}') -- curl -s http://akash-inventory-operator:8080/inventory | jq
Defaulted container "provider" out of: provider, init (init)
{
  "kind": "Inventory",
  "apiVersion": "akash.network/v1",
  "metadata": {
    "creationTimestamp": "2023-05-02T15:45:08Z"
  },
  "spec": {
    "storage": [
      {
        "class": "beta3",
        "allocatable": 10363471921152,
        "allocated": 1088648785547
      },
      {
        "class": "akash-nodes",
        "allocatable": 10363471921152,
        "allocated": 8192
      }
    ]
  },
  "status": {
    "state": "PULLED"
  }
}
allocatable: 10363471921152 / 1024^3 = 9651.73535156250000000000 GiB
allocated: 1088648785547 / 1024^3 = 1013.88318980764597654342 GiB
allocatable - allocated: (10363471921152 - 1088648785547) / 1024^3 = 8637.85216175485402345657 GiB
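The same allocatable - allocated figure can be reproduced in one go with jq against the same inventory endpoint (a small sketch; it should print roughly the 8637.85 GiB computed above):
# kubectl -n akash-services exec -i $(kubectl -n akash-services get pods -l app=akash-provider --output jsonpath='{.items[0].metadata.name}') -- curl -s http://akash-inventory-operator:8080/inventory | jq -r '.spec.storage[] | select(.class == "beta3") | (.allocatable - .allocated) / pow(1024;3)'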
# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-256acd72-1995-4328-af82-8fbfc8f7b61f 1Gi RWO Delete Bound rbg4sri93iu12s7rash30srfqnfqcm3c80rp3ikolmm0q/wordpress-data-wordpress-0 beta3 9d
pvc-2b8c64da-6deb-4a68-b324-0fbd22a7f4ca 300Gi RWO Delete Bound m9iq5aip5act9e7aurqjkaud2j1gll44alu5b9borchjm/node-data-node-0 beta3 12d
pvc-37d9d8a7-1206-45a2-94eb-93c6b3746883 100Gi RWO Delete Bound e2m2rvciefas6n83rervlvusk81ec5vm99ted4fltq9u6/node-data-node-0 beta3 19h
pvc-43308a08-9f3c-4e5f-a582-f4d495422a0a 100Gi RWO Delete Bound 5akscjn0mdkgui4fj912jar5n5vgrc563l686ocgdpom0/app-default-app-0 beta3 27d
pvc-55313446-992c-4200-bec4-4806f3492fa5 120Gi RWO Delete Bound 1pk60puj07lc83h9e3a0m1b6dfjtrrsm2ph6j1p5lbrh2/mainnet-node-data-mainnet-node-0 beta3 3d5h
pvc-611e6981-1793-4577-8fb1-7efd2ab15285 600Gi RWO Delete Bound 0t8q0b6jibfnt8qs6daudm460n00qpvpf2ljimgt8usde/node-data-node-0 beta3 4h33m
pvc-6129547b-4740-4c5b-85bc-d7b5ac9bb166 20Gi RWO Delete Bound uq37fvmkuuil0lbdhfu6d50j89iqd3v2cnad6hvfva9ao/mongo-data-mongo-0 beta3 5d21h
pvc-7ab781c5-4374-48ae-947b-a8564e5d6a05 1Gi RWO Delete Bound 29kqtvme23jb6jrb26g0b539pif15etga1asrhuea8no0/db-data-db-0 beta3 8h
pvc-83ec9de1-c350-4922-be96-e586a3096abe 10Gi RWO Delete Bound bffs15mhldb8gs59pvu7sr3d18gkpmhjvh362ur05kpd0/web-data-web-0 beta3 64d
pvc-86aa1918-ae32-48aa-a90f-e451b1ab08b6 1Gi RWO Delete Bound 7e0de9j1brqajl2p3a522diolhidfinah94pussbmpong/fdm-data-fdm-0 beta3 9d
pvc-9c89d8d8-b8bd-4c69-b4f0-8f886b6929fe 10Gi RWO Delete Bound 29kqtvme23jb6jrb26g0b539pif15etga1asrhuea8no0/wordpress-data-wordpress-0 beta3 8h
pvc-b7f4386c-df21-428b-961d-60b3d34931f7 20Gi RWO Delete Bound lens-metrics/data-prometheus-0 beta3 199d
pvc-dbccd9aa-6cc9-4850-9fee-fc9b23438901 2Gi RWO Delete Bound 6mn4dtj34m33g5i81bejnd8rf86gelkn71109au83ov32/wordpress-data-wordpress-0 beta3 11d
pvc-e12a41c7-4a65-49b6-af2a-e905b507c244 2Gi RWO Delete Bound iub8ocsh301bavlqnkanoflkcn81ap4s442pc0lfotbng/wordpress-data-wordpress-0 beta3 13d
pvc-ed9215ab-f7f1-4890-b566-45ce8d00ceae 2Gi RWO Delete Bound 6hdnh7ouu0u914hiaclggskvta4mdc88c5cfbdse8tlum/wordpress-data-wordpress-0 beta3 32d
Looks like the issue just got resolved after the 600Gi PV deployment got closed:
# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-250eddb7-f085-49eb-a00f-d96a6f26c49e 2Gi RWO Delete Bound ugpgr816c9o4gfdp54tk73rl2fvs544debnv98r7efvca/wordpress-data-wordpress-0 beta3 2d13h
pvc-3908005e-6a4d-4b80-8b25-8e52f430e5bc 8Gi RWO Delete Bound n6fvo8d7eqld0023bip83chmungrfbcti9o09d9r5t970/ssh-data-ssh-0 beta3 31d
$ curl -sk https://provider.akash.pro:8443/status | jq
...
"available": {
"nodes": [
{
"cpu": 345,
"memory": 51667336192,
"storage_ephemeral": 413597237204
}
],
"storage": [
{
"class": "beta3",
"size": 775974011123
}
]
}
775974011123/1024^3 = 722.68211387377232313156 GiB
I've checked the providers again, everything looks good:
$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-00728bd6-2232-476d-beeb-ee62c765d821 500Gi RWO Delete Bound 3itt31libka00e2cvdb0qir9sjmg46c9jqb6ih3bngj5e/validator-data-validator-0 beta3 82d
pvc-2fce2b24-6a57-424f-8f3a-dc64ae46399d 150Gi RWO Delete Bound dpq4c625tet4g13m6rtcerd71ggqg0tovnmt5ticlh1ng/akash-1-data-akash-1-0 beta3 6d17h
pvc-3e1b8f89-edff-4c58-b17d-d0ccc61e4ad9 1Gi RWO Delete Bound ptf5b386b0q6tbol80ts3pcls3rvcmbnlpi9bf69i4dm4/db-wordpress-db-db-0 beta3 3h3m
pvc-60ee0077-839e-4436-a8af-2df5ff38b25e 2Gi RWO Delete Bound d7m0kect8f25dnjv3bvuau64vmgjcmtgcut971dgqkg2m/wordpress-wordpress-data-wordpress-0 beta3 3d
pvc-924eda07-734f-4ee4-a997-fa1c43120619 2Gi RWO Delete Bound ptf5b386b0q6tbol80ts3pcls3rvcmbnlpi9bf69i4dm4/wordpress-wordpress-data-wordpress-0 beta3 3h3m
pvc-941d051f-982d-42a2-a759-81226ffbee8d 2Gi RWO Delete Bound 9ns59l9qaf38d89rmtbjornuc9fjmdo3bh9f13dagovso/db-wordpress-db-db-0 beta3 5d3h
pvc-9d9ad46d-e301-472e-a7d6-558cae8599b3 32Gi RWO Delete Bound 5pp1dda7sft7knov1q3tnq7ftst7kk9ufh4b4q560pktg/rpc2-root-vol-rpc2-0 beta3 7d22h
pvc-a1bee6fd-9dc4-4be4-a1ed-fb1fbd5b0576 1Gi RWO Delete Bound 1a8jj7k478bekhrvv0h7kfi8fgvam6o6hmvlbmkkjov2q/wordpress-data-wordpress-0 beta3 20d
pvc-bb0c2437-e96b-45c4-90fe-dc4445937825 1Gi RWO Delete Bound d7m0kect8f25dnjv3bvuau64vmgjcmtgcut971dgqkg2m/db-wordpress-db-db-0 beta3 3d
pvc-c5ce9c27-1b77-44ec-b66b-0ed0f9294085 10Gi RWO Delete Bound 9ns59l9qaf38d89rmtbjornuc9fjmdo3bh9f13dagovso/wordpress-wordpress-data-wordpress-0 beta3 5d3h
10+1+1+32+2+2+2+1+150+500 = 701 GiB
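The same total can be pulled straight from the API instead of adding the capacity column by hand (a sketch that assumes every PV capacity is expressed with a Gi suffix, as in the listing above):
$ kubectl get pv -o json | jq '[.items[].spec.capacity.storage | rtrimstr("Gi") | tonumber] | add'
701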
$ curl -sk https://provider.hurricane.akash.pub:8443/status | jq -r '
[
["type", "cpu", "ram", "ephemeral", "persistent"],
(
["used"] +
(
.cluster.inventory.active |
[
( [.[].cpu|tonumber] | add / 1000 ),
( [.[].memory|tonumber] | add / pow(1024;3) ),
( [.[].storage_ephemeral|tonumber] | add / pow(1024;3) ),
( [.[].storage?.beta1 // .[].storage?.beta2 // .[].storage?.beta3 // 0 | tonumber] | add / pow(1024;3) )
]
)
),
(
["available"] +
(
[
([.cluster.inventory.available.nodes[]?.cpu // empty] | add / 1000),
([.cluster.inventory.available.nodes[]?.memory // empty] | add / pow(1024;3)),
([.cluster.inventory.available.nodes[]?.storage_ephemeral // empty] | add / pow(1024;3)),
([.cluster.inventory.available.storage[]? | select(.class | test("beta[1-3]"))] | if length == 0 then 0 else ([.[].size] | add / pow(1024;3)) end)
]
)
),
(
.cluster.inventory.available.nodes[] |
(
["node", .cpu / 1000, .memory / pow(1024;3), .storage_ephemeral / pow(1024;3), "N/A"]
)
)
] | .[] | @tsv' | column -t
type cpu ram ephemeral persistent
used 87.75 111.75323390960693 1040.061779499054 6
available 12.775 91.7803602218628 475.90599865466356 1822.8505535935983
node 12.775 91.7803602218628 475.90599865466356 N/A
$ kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph status
cluster:
id: 89a221a1-84bf-4cf7-ad8d-da70a32bfa37
health: HEALTH_OK
services:
mon: 1 daemons, quorum a (age 7w)
mgr: a(active, since 7w)
osd: 4 osds: 4 up (since 7w), 4 in (since 11w)
data:
pools: 3 pools, 289 pgs
objects: 145.70k objects, 568 GiB
usage: 1.1 TiB used, 6.2 TiB / 7.3 TiB avail
pgs: 289 active+clean
io:
client: 274 KiB/s rd, 20 MiB/s wr, 33 op/s rd, 169 op/s wr
$ kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph df
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 7.3 TiB 6.2 TiB 1.1 TiB 1.1 TiB 15.26
TOTAL 7.3 TiB 6.2 TiB 1.1 TiB 1.1 TiB 15.26
--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
.mgr 1 1 449 KiB 2 1.3 MiB 0 1.9 TiB
akash-nodes 2 32 19 B 1 8 KiB 0 2.9 TiB
akash-deployments 3 256 566 GiB 145.70k 1.1 TiB 16.08 2.9 TiB
Not sure what we really need the akash-nodes pool for? Akash deployments normally use only the akash-deployments pool.
The reporting seems to have self-resolved. I'll reopen this issue if I see it recur.
I have moved the akash-nodes Ceph pool question to a separate issue => https://github.com/akash-network/support/issues/110
Re-opening as this issue needs additional testing.
I've noticed the used amount isn't getting subtracted from the available amount of persistent storage once a deployment shifts from pending to used (tested with provider 0.4.6, 0.4.7, 0.4.8-rc0). It appears that the available value only decreases once one actually starts writing data there.
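One way to observe this is to poll the provider status while a deployment with a persistent volume is created and then written to (a simple polling sketch using the same endpoint as above):
$ while true; do date; curl -sk https://provider.hurricane.akash.pub:8443/status | jq -r '.cluster.inventory.available.storage[]? | "\(.class): \(.size / pow(1024;3)) GiB"'; sleep 60; done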
It appears that
provider
is not correctly reporting the available persistent storage:akash 0.20.0 provider-services 0.2.1
I am running all-in-one provider deployment, ceph is configured to run 1 replicas only.
I have 763 GiB nvme disk, 493 GiB of storage is used and 270 GiB are available.
It looks like that the
provider
calculates the amount of free (available) persistent storage asallocatable-allocated
instead of just usingallocatable
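Both values can be printed side by side from the inventory-operator report for comparison (an illustrative jq sketch; endpoint and field names as in the report shown earlier in this issue):
# kubectl -n akash-services exec -i $(kubectl -n akash-services get pods -l app=akash-provider --output jsonpath='{.items[0].metadata.name}') -- curl -s http://akash-inventory-operator:8080/inventory | jq -r '.spec.storage[] | "\(.class): allocatable \(.allocatable / pow(1024;3)) GiB, allocatable - allocated \((.allocatable - .allocated) / pow(1024;3)) GiB"'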
Details: akash-inventory-operator report