akash-network / support

Akash Support and Issue Tracking

[provider] wrong ceph storage reporting #97

Closed · andy108369 closed 7 months ago

andy108369 commented 1 year ago

It appears that the provider is not correctly reporting the available persistent storage:

The size should not be negative:

$ curl -sk https://provider.akash.pro:8443/status | jq
...
        "storage": [
          {
            "class": "beta3",
            "size": -275152609037
          }
        ]

akash 0.20.0, provider-services 0.2.1

I am running an all-in-one provider deployment; Ceph is configured to run only 1 replica.

I have a 763 GiB NVMe disk; 493 GiB of storage is used and 270 GiB is available.

It looks like the provider calculates the amount of free (available) persistent storage as allocatable - allocated instead of just using allocatable.
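If that is the case, the negative number should be reproducible straight from the inventory operator's figures. A minimal jq sketch over the same inventory endpoint queried in the details below:

$ kubectl -n akash-services exec -i $(kubectl -n akash-services get pods -l app=akash-provider --output jsonpath='{.items[0].metadata.name}') -- \
    curl -s http://akash-inventory-operator:8080/inventory | \
    jq -r '.spec.storage[] | "\(.class): allocatable - allocated = \(.allocatable - .allocated)"'

With the figures captured below, this prints beta3: allocatable - allocated = -275153157901, essentially the negative size the provider publishes (the exact value drifts between queries as leases write data).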

Details

$ curl -sk https://provider.akash.pro:8443/status  | jq
{
  "cluster": {
    "leases": 19,
    "inventory": {
      "active": [
        {
          "cpu": 1000,
          "memory": 536870912,
          "storage_ephemeral": 1073741824
        },
        {
          "cpu": 2000,
          "memory": 4294967296,
          "storage_ephemeral": 17179869184
        },
        {
          "cpu": 1000,
          "memory": 1073741824,
          "storage_ephemeral": 1073741824
        },
        {
          "cpu": 50,
          "memory": 67108864,
          "storage_ephemeral": 6291456
        },
        {
          "cpu": 4000,
          "memory": 4294967296,
          "storage_ephemeral": 4294967296
        },
        {
          "cpu": 100,
          "memory": 100663296,
          "storage_ephemeral": 6291456
        },
        {
          "cpu": 100,
          "memory": 536870912,
          "storage_ephemeral": 536870912
        },
        {
          "cpu": 500,
          "memory": 512000000,
          "storage_ephemeral": 512000000
        },
        {
          "cpu": 500,
          "memory": 536870912,
          "storage_ephemeral": 536870912
        },
        {
          "cpu": 50,
          "memory": 67108864,
          "storage_ephemeral": 6291456
        },
        {
          "cpu": 1000,
          "memory": 1073741824,
          "storage_ephemeral": 5368709120
        },
        {
          "cpu": 500,
          "memory": 1000000000,
          "storage_ephemeral": 25000000000
        },
        {
          "cpu": 200,
          "memory": 268435456,
          "storage_ephemeral": 134217728
        },
        {
          "cpu": 1000,
          "memory": 1073741824,
          "storage_ephemeral": 1073741824
        },
        {
          "cpu": 2000,
          "memory": 4294967296,
          "storage_ephemeral": 17179869184
        },
        {
          "cpu": 1000,
          "memory": 536870912,
          "storage_ephemeral": 536870912
        },
        {
          "cpu": 1000,
          "memory": 1000000000,
          "storage_ephemeral": 17179869184
        },
        {
          "cpu": 4000,
          "memory": 34359738368,
          "storage_ephemeral": 644769382400
        },
        {
          "cpu": 300,
          "memory": 805306368,
          "storage_ephemeral": 5100273664
        }
      ],
      "available": {
        "nodes": [
          {
            "cpu": 911,
            "memory": 3348954112,
            "storage_ephemeral": 373344501716
          }
        ],
        "storage": [
          {
            "class": "beta3",
            "size": -275305733901
          }
        ]
      }
    }
  },
  "bidengine": {
    "orders": 0
  },
  "manifest": {
    "deployments": 0
  },
  "cluster_public_hostname": "provider.akash.pro",
  "address": "akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0"
}
# kubectl -n akash-services exec -i $(kubectl -n akash-services get pods -l app=akash-provider --output jsonpath='{.items[0].metadata.name}') -- curl -s http://akash-inventory-operator:8080/inventory | jq
Defaulted container "provider" out of: provider, init (init)
{
  "kind": "Inventory",
  "apiVersion": "akash.network/v2beta1",
  "metadata": {
    "creationTimestamp": "2023-05-02T15:07:28Z"
  },
  "spec": {
    "storage": [
      {
        "class": "akash-nodes",
        "allocatable": 250669219840,
        "allocated": 5709
      },
      {
        "class": "beta3",
        "allocatable": 250669219840,
        "allocated": 525822377741
      }
    ]
  },
  "status": {
    "state": "PULLED"
  }
}
allocatable 250669219840/1024^3 = 233.45390319824218750000
allocated 525822377741/1024^3 = 489.71025062818080186843

allocatable - allocated:
250669219840 - 525822377741 = -275153157901

and in GiB:
-275153157901/1024^3 = -256.25634742993861436843
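The same arithmetic can be reproduced with bc:

$ echo 'scale=20; (250669219840 - 525822377741) / 1024^3' | bc
-256.25634742993861436843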
# kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph status
  cluster:
    id:     d0e99a74-3127-4b99-91cc-500b701805ad
    health: HEALTH_WARN
            mon a is low on available space
            3 pool(s) have no replicas configured

  services:
    mon: 1 daemons, quorum a (age 5M)
    mgr: a(active, since 5M)
    osd: 1 osds: 1 up (since 5M), 1 in (since 8M)

  data:
    pools:   3 pools, 65 pgs
    objects: 135.18k objects, 510 GiB
    usage:   492 GiB used, 271 GiB / 763 GiB avail
    pgs:     65 active+clean

  io:
    client:   199 KiB/s rd, 38 MiB/s wr, 26 op/s rd, 106 op/s wr
# kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph df
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
nvme   763 GiB  270 GiB  493 GiB   493 GiB      64.58
TOTAL  763 GiB  270 GiB  493 GiB   493 GiB      64.58

--- POOLS ---
POOL               ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr                1    1  1.8 MiB        2  1.8 MiB      0    232 GiB
akash-deployments   2   32  491 GiB  135.45k  491 GiB  67.90    232 GiB
akash-nodes         3   32  1.6 KiB        5  5.6 KiB      0    232 GiB
# kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph osd pool ls detail
pool 1 '.mgr' replicated size 1 min_size 1 crush_rule 5 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 1373 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
pool 2 'akash-deployments' replicated size 1 min_size 1 crush_rule 4 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 1238 lfor 0/1003/1001 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 3 'akash-nodes' replicated size 1 min_size 1 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 1240 lfor 0/0/27 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
# kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph osd tree 
ID  CLASS  WEIGHT   TYPE NAME       STATUS  REWEIGHT  PRI-AFF
-1         0.18629  root default                             
-3         0.18629      host node1                           
 0   nvme  0.18629          osd.0       up   1.00000  1.00000

# lsblk /dev/nvme1n1
NAME                                                                                                  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
nvme1n1                                                                                               259:6    0 953.9G  0 disk 
└─ceph--1ffbfa65--ae47--4e78--98fe--d92228d6e075-osd--block--91221482--282e--4ab2--a038--e5fc8243a5c4 253:0    0 763.1G  0 lvm  

# kubectl -n rook-ceph get pods
NAME                                              READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-4zlbk                            2/2     Running     0          163d
csi-cephfsplugin-provisioner-866bb9b49b-2dbc5     5/5     Running     0          163d
csi-rbdplugin-cvdd2                               2/2     Running     0          163d
csi-rbdplugin-provisioner-f6b7f64f7-2hqb8         5/5     Running     0          163d
rook-ceph-crashcollector-node1-5b5d94cc9d-7v27d   1/1     Running     0          163d
rook-ceph-mgr-a-7dccfb9df6-htt2j                  2/2     Running     0          163d
rook-ceph-mon-a-86b6855597-twlzd                  2/2     Running     0          163d
rook-ceph-operator-569bd48f94-2jkr7               1/1     Running     0          163d
rook-ceph-osd-0-6664d485ff-qhp89                  2/2     Running     0          163d
rook-ceph-osd-prepare-node1-bwwqr                 0/1     Completed   0          163d
rook-ceph-tools-78b5f9d9cf-h2gmm                  1/1     Running     0          163d
# kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                                      STORAGECLASS   REASON   AGE
pvc-1cb4fbc0-7109-418b-99e9-dd133ea3ba91   600Gi      RWO            Delete           Bound    tmsrflbhn6putlk2ch6dmoea0kug5801338sqkmhrckk4/node-data-node-0             beta3                   20h
pvc-250eddb7-f085-49eb-a00f-d96a6f26c49e   2Gi        RWO            Delete           Bound    ugpgr816c9o4gfdp54tk73rl2fvs544debnv98r7efvca/wordpress-data-wordpress-0   beta3                   2d12h
pvc-3908005e-6a4d-4b80-8b25-8e52f430e5bc   8Gi        RWO            Delete           Bound    n6fvo8d7eqld0023bip83chmungrfbcti9o09d9r5t970/ssh-data-ssh-0               beta3                   31d
andy108369 commented 1 year ago

The issue could be somewhere along these lines:

andy108369 commented 1 year ago

europlots provider looks good

OTOH, Shimpa's europlots provider seems to be reporting correct values; the difference is that he is using 2 replicas (vs. my 1), though:

$ curl -sk https://provider.europlots.com:8443/status | jq
...
        "storage": [
          {
            "class": "beta3",
            "size": 9275517457920
          }
        ]
9275517457920/1024^3 = 8638.49879980087280273437
# kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph df
--- RAW STORAGE ---
CLASS    SIZE   AVAIL     USED  RAW USED  %RAW USED
ssd    21 TiB  20 TiB  1.0 TiB   1.0 TiB       4.79
TOTAL  21 TiB  20 TiB  1.0 TiB   1.0 TiB       4.79

--- POOLS ---
POOL               ID  PGS   STORED  OBJECTS      USED  %USED  MAX AVAIL
.mgr                1    1  6.5 MiB        3    20 MiB      0    6.3 TiB
akash-nodes         2   32     19 B        1     8 KiB      0    9.4 TiB
akash-deployments   3  512  507 GiB  131.03k  1014 GiB   4.99    9.4 TiB
# kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph status
  cluster:
    id:     15d8da54-650e-4ec8-9914-5e0f35dda64c
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum f,g,h (age 7w)
    mgr: b(active, since 7w), standbys: a
    osd: 12 osds: 12 up (since 7w), 12 in (since 6M)

  data:
    pools:   3 pools, 545 pgs
    objects: 131.07k objects, 509 GiB
    usage:   1.0 TiB used, 20 TiB / 21 TiB avail
    pgs:     545 active+clean

  io:
    client:   2.0 MiB/s rd, 57 MiB/s wr, 71 op/s rd, 155 op/s wr
# kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 3 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 66 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
pool 2 'akash-nodes' replicated size 2 min_size 2 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 1669 lfor 0/0/38 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 3 'akash-deployments' replicated size 2 min_size 2 crush_rule 2 object_hash rjenkins pg_num 512 pgp_num 512 autoscale_mode on last_change 1672 lfor 0/0/78 flags hashpspool,selfmanaged_snaps,bulk stripe_width 0 application rbd
# kubectl -n akash-services exec -i $(kubectl -n akash-services get pods -l app=akash-provider --output jsonpath='{.items[0].metadata.name}') -- curl -s http://akash-inventory-operator:8080/inventory | jq
Defaulted container "provider" out of: provider, init (init)
{
  "kind": "Inventory",
  "apiVersion": "akash.network/v1",
  "metadata": {
    "creationTimestamp": "2023-05-02T15:45:08Z"
  },
  "spec": {
    "storage": [
      {
        "class": "beta3",
        "allocatable": 10363471921152,
        "allocated": 1088648785547
      },
      {
        "class": "akash-nodes",
        "allocatable": 10363471921152,
        "allocated": 8192
      }
    ]
  },
  "status": {
    "state": "PULLED"
  }
}
allocatable 10363471921152/1024^3 = 9651.73535156250000000000

allocated 1088648785547/1024^3 = 1013.88318980764597654342

allocatable - allocated:
(10363471921152 - 1088648785547)/1024^3 = 8637.85216175485402345657
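So the healthy provider's reported size also appears to equal allocatable - allocated; it simply stays positive because allocated has not exceeded allocatable (the ~0.65 GiB gap to the reported 8638.50 GiB is expected, since the two snapshots were taken moments apart). A quick sketch that diffs the two values directly, reusing the commands above:

$ size=$(curl -sk https://provider.europlots.com:8443/status | \
    jq '.cluster.inventory.available.storage[] | select(.class=="beta3") | .size')
$ calc=$(kubectl -n akash-services exec -i $(kubectl -n akash-services get pods -l app=akash-provider --output jsonpath='{.items[0].metadata.name}') -- \
    curl -s http://akash-inventory-operator:8080/inventory | \
    jq '.spec.storage[] | select(.class=="beta3") | .allocatable - .allocated')
$ echo "reported=$size computed=$calc delta=$((size - calc))"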
# kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                                            STORAGECLASS   REASON   AGE
pvc-256acd72-1995-4328-af82-8fbfc8f7b61f   1Gi        RWO            Delete           Bound    rbg4sri93iu12s7rash30srfqnfqcm3c80rp3ikolmm0q/wordpress-data-wordpress-0         beta3                   9d
pvc-2b8c64da-6deb-4a68-b324-0fbd22a7f4ca   300Gi      RWO            Delete           Bound    m9iq5aip5act9e7aurqjkaud2j1gll44alu5b9borchjm/node-data-node-0                   beta3                   12d
pvc-37d9d8a7-1206-45a2-94eb-93c6b3746883   100Gi      RWO            Delete           Bound    e2m2rvciefas6n83rervlvusk81ec5vm99ted4fltq9u6/node-data-node-0                   beta3                   19h
pvc-43308a08-9f3c-4e5f-a582-f4d495422a0a   100Gi      RWO            Delete           Bound    5akscjn0mdkgui4fj912jar5n5vgrc563l686ocgdpom0/app-default-app-0                  beta3                   27d
pvc-55313446-992c-4200-bec4-4806f3492fa5   120Gi      RWO            Delete           Bound    1pk60puj07lc83h9e3a0m1b6dfjtrrsm2ph6j1p5lbrh2/mainnet-node-data-mainnet-node-0   beta3                   3d5h
pvc-611e6981-1793-4577-8fb1-7efd2ab15285   600Gi      RWO            Delete           Bound    0t8q0b6jibfnt8qs6daudm460n00qpvpf2ljimgt8usde/node-data-node-0                   beta3                   4h33m
pvc-6129547b-4740-4c5b-85bc-d7b5ac9bb166   20Gi       RWO            Delete           Bound    uq37fvmkuuil0lbdhfu6d50j89iqd3v2cnad6hvfva9ao/mongo-data-mongo-0                 beta3                   5d21h
pvc-7ab781c5-4374-48ae-947b-a8564e5d6a05   1Gi        RWO            Delete           Bound    29kqtvme23jb6jrb26g0b539pif15etga1asrhuea8no0/db-data-db-0                       beta3                   8h
pvc-83ec9de1-c350-4922-be96-e586a3096abe   10Gi       RWO            Delete           Bound    bffs15mhldb8gs59pvu7sr3d18gkpmhjvh362ur05kpd0/web-data-web-0                     beta3                   64d
pvc-86aa1918-ae32-48aa-a90f-e451b1ab08b6   1Gi        RWO            Delete           Bound    7e0de9j1brqajl2p3a522diolhidfinah94pussbmpong/fdm-data-fdm-0                     beta3                   9d
pvc-9c89d8d8-b8bd-4c69-b4f0-8f886b6929fe   10Gi       RWO            Delete           Bound    29kqtvme23jb6jrb26g0b539pif15etga1asrhuea8no0/wordpress-data-wordpress-0         beta3                   8h
pvc-b7f4386c-df21-428b-961d-60b3d34931f7   20Gi       RWO            Delete           Bound    lens-metrics/data-prometheus-0                                                   beta3                   199d
pvc-dbccd9aa-6cc9-4850-9fee-fc9b23438901   2Gi        RWO            Delete           Bound    6mn4dtj34m33g5i81bejnd8rf86gelkn71109au83ov32/wordpress-data-wordpress-0         beta3                   11d
pvc-e12a41c7-4a65-49b6-af2a-e905b507c244   2Gi        RWO            Delete           Bound    iub8ocsh301bavlqnkanoflkcn81ap4s442pc0lfotbng/wordpress-data-wordpress-0         beta3                   13d
pvc-ed9215ab-f7f1-4890-b566-45ce8d00ceae   2Gi        RWO            Delete           Bound    6hdnh7ouu0u914hiaclggskvta4mdc88c5cfbdse8tlum/wordpress-data-wordpress-0         beta3                   32d
andy108369 commented 1 year ago

Looks like the issue just got resolved after the deployment with the 600Gi PV was closed:

# kubectl get pv 
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                                      STORAGECLASS   REASON   AGE
pvc-250eddb7-f085-49eb-a00f-d96a6f26c49e   2Gi        RWO            Delete           Bound    ugpgr816c9o4gfdp54tk73rl2fvs544debnv98r7efvca/wordpress-data-wordpress-0   beta3                   2d13h
pvc-3908005e-6a4d-4b80-8b25-8e52f430e5bc   8Gi        RWO            Delete           Bound    n6fvo8d7eqld0023bip83chmungrfbcti9o09d9r5t970/ssh-data-ssh-0               beta3                   31d
$ curl -sk https://provider.akash.pro:8443/status | jq
...
      "available": {
        "nodes": [
          {
            "cpu": 345,
            "memory": 51667336192,
            "storage_ephemeral": 413597237204
          }
        ],
        "storage": [
          {
            "class": "beta3",
            "size": 775974011123
          }
        ]
      }
775974011123/1024^3 = 722.68211387377232313156 GiB
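(For convenience, the GiB conversion can be done inline with jq's pow builtin:)

$ curl -sk https://provider.akash.pro:8443/status | \
    jq '.cluster.inventory.available.storage[] | select(.class=="beta3") | .size / pow(1024;3)'
722.6821138737723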
andy108369 commented 11 months ago

I've checked the providers again; everything looks good:

$ kubectl get pv 
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                                                STORAGECLASS   REASON   AGE
pvc-00728bd6-2232-476d-beeb-ee62c765d821   500Gi      RWO            Delete           Bound    3itt31libka00e2cvdb0qir9sjmg46c9jqb6ih3bngj5e/validator-data-validator-0             beta3                   82d
pvc-2fce2b24-6a57-424f-8f3a-dc64ae46399d   150Gi      RWO            Delete           Bound    dpq4c625tet4g13m6rtcerd71ggqg0tovnmt5ticlh1ng/akash-1-data-akash-1-0                 beta3                   6d17h
pvc-3e1b8f89-edff-4c58-b17d-d0ccc61e4ad9   1Gi        RWO            Delete           Bound    ptf5b386b0q6tbol80ts3pcls3rvcmbnlpi9bf69i4dm4/db-wordpress-db-db-0                   beta3                   3h3m
pvc-60ee0077-839e-4436-a8af-2df5ff38b25e   2Gi        RWO            Delete           Bound    d7m0kect8f25dnjv3bvuau64vmgjcmtgcut971dgqkg2m/wordpress-wordpress-data-wordpress-0   beta3                   3d
pvc-924eda07-734f-4ee4-a997-fa1c43120619   2Gi        RWO            Delete           Bound    ptf5b386b0q6tbol80ts3pcls3rvcmbnlpi9bf69i4dm4/wordpress-wordpress-data-wordpress-0   beta3                   3h3m
pvc-941d051f-982d-42a2-a759-81226ffbee8d   2Gi        RWO            Delete           Bound    9ns59l9qaf38d89rmtbjornuc9fjmdo3bh9f13dagovso/db-wordpress-db-db-0                   beta3                   5d3h
pvc-9d9ad46d-e301-472e-a7d6-558cae8599b3   32Gi       RWO            Delete           Bound    5pp1dda7sft7knov1q3tnq7ftst7kk9ufh4b4q560pktg/rpc2-root-vol-rpc2-0                   beta3                   7d22h
pvc-a1bee6fd-9dc4-4be4-a1ed-fb1fbd5b0576   1Gi        RWO            Delete           Bound    1a8jj7k478bekhrvv0h7kfi8fgvam6o6hmvlbmkkjov2q/wordpress-data-wordpress-0             beta3                   20d
pvc-bb0c2437-e96b-45c4-90fe-dc4445937825   1Gi        RWO            Delete           Bound    d7m0kect8f25dnjv3bvuau64vmgjcmtgcut971dgqkg2m/db-wordpress-db-db-0                   beta3                   3d
pvc-c5ce9c27-1b77-44ec-b66b-0ed0f9294085   10Gi       RWO            Delete           Bound    9ns59l9qaf38d89rmtbjornuc9fjmdo3bh9f13dagovso/wordpress-wordpress-data-wordpress-0   beta3                   5d3h

10+1+1+32+2+2+2+1+150+500 = 701 GiB
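That sum can also be produced directly from kubectl (a sketch; it assumes every capacity is a plain Gi-suffixed value, as above):

$ kubectl get pv -o json | \
    jq '[.items[] | select(.spec.storageClassName == "beta3") | .spec.capacity.storage | rtrimstr("Gi") | tonumber] | add'
701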

$ curl -sk https://provider.hurricane.akash.pub:8443/status | jq -r '
[
  ["type", "cpu", "ram", "ephemeral", "persistent"],
  (
    ["used"] +
    (
      .cluster.inventory.active |
      [
        ( [.[].cpu|tonumber] | add / 1000 ),
        ( [.[].memory|tonumber] | add / pow(1024;3) ),
        ( [.[].storage_ephemeral|tonumber] | add / pow(1024;3) ),
        ( [.[].storage?.beta1 // .[].storage?.beta2 // .[].storage?.beta3 // 0 | tonumber] | add / pow(1024;3) )
      ]
    )
  ),
  (
    ["available"] +
    (
      [
        ([.cluster.inventory.available.nodes[]?.cpu // empty] | add / 1000),
        ([.cluster.inventory.available.nodes[]?.memory // empty] | add / pow(1024;3)),
        ([.cluster.inventory.available.nodes[]?.storage_ephemeral // empty] | add / pow(1024;3)),
        ([.cluster.inventory.available.storage[]? | select(.class | test("beta[1-3]"))] | if length == 0 then 0 else ([.[].size] | add / pow(1024;3)) end)
      ]
    )
  ),
  (
    .cluster.inventory.available.nodes[] | 
    (
      ["node", .cpu / 1000, .memory / pow(1024;3), .storage_ephemeral / pow(1024;3), "N/A"]
    )
  )
] | .[] | @tsv' | column -t

type       cpu     ram                 ephemeral           persistent
used       87.75   111.75323390960693  1040.061779499054   6
available  12.775  91.7803602218628    475.90599865466356  1822.8505535935983
node       12.775  91.7803602218628    475.90599865466356  N/A
$ kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph status
  cluster:
    id:     89a221a1-84bf-4cf7-ad8d-da70a32bfa37
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum a (age 7w)
    mgr: a(active, since 7w)
    osd: 4 osds: 4 up (since 7w), 4 in (since 11w)

  data:
    pools:   3 pools, 289 pgs
    objects: 145.70k objects, 568 GiB
    usage:   1.1 TiB used, 6.2 TiB / 7.3 TiB avail
    pgs:     289 active+clean

  io:
    client:   274 KiB/s rd, 20 MiB/s wr, 33 op/s rd, 169 op/s wr
$ kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph df
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    7.3 TiB  6.2 TiB  1.1 TiB   1.1 TiB      15.26
TOTAL  7.3 TiB  6.2 TiB  1.1 TiB   1.1 TiB      15.26

--- POOLS ---
POOL               ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr                1    1  449 KiB        2  1.3 MiB      0    1.9 TiB
akash-nodes         2   32     19 B        1    8 KiB      0    2.9 TiB
akash-deployments   3  256  566 GiB  145.70k  1.1 TiB  16.08    2.9 TiB

Not sure what we really need the akash-nodes pool for?

Akash deployments normally use only the akash-deployments pool.

andy108369 commented 11 months ago

The reporting seems to have self-resolved. I'll reopen this issue if I see it recur.

I have moved the akash-nodes Ceph pool question to a separate issue => https://github.com/akash-network/support/issues/110

andy108369 commented 7 months ago

Re-opening, as this issue needs additional testing.

I've noticed that the used amount isn't being subtracted from the available amount of persistent storage once a deployment shifts from pending to used (tested with provider 0.4.6, 0.4.7 and 0.4.8-rc0).

It appears that the available amount only starts decreasing once one actually writes data to the volume.
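One way to observe this is to poll the status endpoint while a deployment with a persistent volume goes active and then starts writing data (a sketch; substitute your own provider URI):

$ while true; do
    curl -sk https://provider.akash.pro:8443/status | \
      jq -r '.cluster.inventory.available.storage[] | select(.class=="beta3") | .size'
    sleep 10
  done

With the behaviour described above, the number only starts dropping once data is actually written to the volume, not when the lease goes active.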

andy108369 commented 7 months ago

Followed by https://github.com/akash-network/support/issues/146