apache / cloudstack

Apache CloudStack is an open source Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0

Usage charging deleted volumes #6717

Closed matheusfontes closed 2 years ago

matheusfontes commented 2 years ago
ISSUE TYPE
COMPONENT NAME
usage
CLOUDSTACK VERSION
4.16 and 4.17
CONFIGURATION

N/A

OS / ENVIRONMENT

centos 7

SUMMARY

The usage server never marks a volume as deleted, and when a volume is resized it is charged twice (or more).

STEPS TO REPRODUCE
Scenario 1:
Create a volume and attach it to a VM.
Remove the volume.
It will be charged forever.

Scenario 2:
Create a volume and attach it to a VM.
Resize the volume.
It will be charged once for every time you resized it.
EXPECTED RESULTS
Charging for a volume stops once it is removed.
ACTUAL RESULTS
The volume is still charged after it is removed, or it is charged for more than 24 hours within a single day.

# > list usagerecords domainid=8e1c4859-9cf4-4414-a218-160b05b9f157 accountid=3e64effb-12a6-4642-95f9-c217c804a3c4 type=6 startdate=2020-01-19 enddate=2020-01-19
    {
      "account": "21150",
      "accountid": "3e64effb-12a6-4642-95f9-c217c804a3c4",
      "description": "Volume usage for DATA-600 (40f8412b-65f5-4c2d-a6e3-00a4e0456279) with disk offering Disco Magnético (34573d38-2dce-4d8b-a0b4-80ad61b1cbc8) and size (1000.00 GB) 1073741824000",
      "domain": "skycloud.prv",
      "domainid": "8e1c4859-9cf4-4414-a218-160b05b9f157",
      "enddate": "2020-01-19'T'23:59:59-03:00",
      "offeringid": "34573d38-2dce-4d8b-a0b4-80ad61b1cbc8",
      "rawusage": "48",
      "size": 1073741824000,
      "startdate": "2020-01-19'T'00:00:00-03:00",
      "tags": [],
      "usage": "48 Hrs",
      "usageid": "40f8412b-65f5-4c2d-a6e3-00a4e0456279",
      "usagetype": 6,
      "zoneid": "74c526f8-52a2-4e2e-8d08-76eae35913a4"
    },
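A check along these lines could flag such records automatically. This is only a minimal sketch, assuming the usage records are available as parsed JSON dictionaries with the same fields as the output above. The helper names `parse_usage_date`, `window_hours` and `overcharged` are hypothetical and not part of CloudStack or CloudMonkey.

```python
import math
from datetime import datetime


def parse_usage_date(value):
    """Parse a date such as 2020-01-19'T'23:59:59-03:00 (note the quoted T)."""
    return datetime.fromisoformat(value.replace("'T'", "T"))


def window_hours(record):
    """Length of the record's reporting window, rounded up to whole hours."""
    start = parse_usage_date(record["startdate"])
    end = parse_usage_date(record["enddate"])
    return math.ceil((end - start).total_seconds() / 3600)


def overcharged(records):
    """Return the records whose raw usage is longer than their reporting window."""
    return [r for r in records if float(r["rawusage"]) > window_hours(r)]


# Trimmed copy of the record shown above.
record = {
    "usageid": "40f8412b-65f5-4c2d-a6e3-00a4e0456279",
    "startdate": "2020-01-19'T'00:00:00-03:00",
    "enddate": "2020-01-19'T'23:59:59-03:00",
    "rawusage": "48",
}
flagged = overcharged([record])
print([r["usageid"] for r in flagged])
```

A record with 48 hours of raw usage inside a one-day window is flagged, which matches the double charging described for resized volumes.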
DaanHoogland commented 2 years ago

@matheusfontes, in your scenario 1:

Scenario 1:
Create a volume and attach it to a VM.
Remove the volume.
It will be charged forever.

the volume is supposed to be charged continually if it is only removed from the VM. Do you mean it is deleted and expunged and then still charged?

Scenario 2 seems obviously wrong, I'll look into that.

matheusfontes commented 2 years ago

@DaanHoogland expunged volumes are always being charged. I saw some queries in the database that I think are wrong. When the usage job starts and processes a VOLUME.DELETE event, it tries to find the volume in the usage_volume table with this query:

SELECT usage_volume.id, usage_volume.zone_id, usage_volume.account_id, usage_volume.domain_id, usage_volume.volume_id, usage_volume.disk_offering_id, usage_volume.template_id, usage_volume.size, usage_volume.created, usage_volume.deleted FROM usage_volume WHERE usage_volume.account_id = 9 AND usage_volume.id = 110 AND usage_volume.deleted IS NULL

I think the usage_volume.id search criterion is wrong; the query should filter on the usage_volume.volume_id field instead.
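The suspected mismatch can be illustrated with a small stand-in table. This is a minimal sketch, not the actual CloudStack code: it uses an in-memory SQLite table in place of cloud_usage.usage_volume, keeps only the columns needed for the lookup, and uses an invented row (usage row id 7 tracking volume 110). It assumes, as the comment above suggests, that the VOLUME.DELETE event carries the volume's id rather than the id of the usage row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE usage_volume (
           id INTEGER PRIMARY KEY,   -- row id of the usage helper table
           account_id INTEGER,
           volume_id INTEGER,        -- id of the volume itself
           deleted TEXT
       )"""
)
# Invented example row: usage row 7 tracks volume 110 of account 9.
conn.execute("INSERT INTO usage_volume VALUES (7, 9, 110, NULL)")

event_volume_id = 110  # assumed to come from the VOLUME.DELETE event

# Lookup shaped like the query quoted above: filters on the row id.
by_row_id = conn.execute(
    "SELECT id FROM usage_volume "
    "WHERE account_id = ? AND id = ? AND deleted IS NULL",
    (9, event_volume_id),
).fetchall()

# Lookup shaped like the proposed correction: filters on volume_id.
by_volume_id = conn.execute(
    "SELECT id FROM usage_volume "
    "WHERE account_id = ? AND volume_id = ? AND deleted IS NULL",
    (9, event_volume_id),
).fetchall()

print("filter on id:       ", by_row_id)
print("filter on volume_id:", by_volume_id)
```

Under these assumptions the first lookup finds nothing, so the row would never get a deleted timestamp, while the second lookup finds the row that should be closed.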

matheusfontes commented 2 years ago

Apparently these 2 lines in UsageManagerImpl.java solve the volume usage problem, but I think there is also a problem with firewall rule and LB usage. PS: tested on version 4.17.0.1.

UsageManagerImpl.txt

DaanHoogland commented 2 years ago

Can you submit a PR with those lines @matheusfontes ?

DaanHoogland commented 2 years ago

never mind @matheusfontes , found a few secs to do it: #6737

matheusfontes commented 2 years ago

@DaanHoogland now I am sure that the problem is bigger than just resized/deleted volumes. Usage is also charging for network transfer of deleted virtual routers, deleted firewall rules and load balancers. Do I need to open another issue? Is everyone experiencing these charging problems? The problem started in 4.16, so I think everyone who uses usage to bill clients is billing them incorrectly, and this needs an urgent fix, doesn't it?

DaanHoogland commented 2 years ago

@matheusfontes I think that if these are fixed by separate changes, they are separate issues. If we put the fix in one patch, we can just rename the issue.

DaanHoogland commented 2 years ago

@matheusfontes did you test #6737? And do you know where to look for additional issues?

DaanHoogland commented 2 years ago

@matheusfontes I'm not a heavy user of the usage server or billing in general. I will need your help in closing this issue, please. Can you test and approve #6737 and advise on any other issues there may be?

StepBee commented 9 months ago

@DaanHoogland I just stumbled across the same issue, and it still persists in 4.18.1.0. Judging by the PR, the fix should have been merged in 4.18.0, correct?

Volumes that have been destroyed/expunged are still reported with active usage records forever, so we might need to take a second look at it.

DaanHoogland commented 9 months ago

yes it should, @StepBee . can you add more info? is this production or reproducible in a test/lab environment?

StepBee commented 9 months ago

@DaanHoogland sure, this is the usage output for one day for usagetype 6 (volumes) from a domain that has been using only 1 volume for a long time already:

(cloudstack) 🐱 > list usagerecords domainid=16644085-7eff-46dd-8eb0-ef27b0b1621e type=6 startdate=2024-01-08 enddate=2024-01-08
{
  "count": 4,
  "usagerecord": [
    {
      "account": "XXX_Admins",
      "accountid": "72c17fd1-da79-46cf-a45d-d3ba1b5ad6d7",
      "description": "Volume usage for ROOT-26854 (06ceea59-c8f8-4acd-902e-414d4bd60a5a) and template Ubuntu Server 22.04 LTS Cloud-Init (4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6) and size 53687091200",
      "domain": "XXXXX",
      "domainid": "16644085-7eff-46dd-8eb0-ef27b0b1621e",
      "enddate": "2024-01-08'T'23:59:59+00:00",
      "rawusage": "24",
      "size": 53687091200,
      "startdate": "2024-01-08'T'00:00:00+00:00",
      "tags": [],
      "templateid": "4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6",
      "usage": "24 Hrs",
      "usageid": "06ceea59-c8f8-4acd-902e-414d4bd60a5a",
      "usagetype": 6,
      "zoneid": "f2665eef-1073-4c50-8d3a-076d44036fcc"
    },
    {
      "account": "XXX_Admins",
      "accountid": "72c17fd1-da79-46cf-a45d-d3ba1b5ad6d7",
      "description": "Volume usage for ROOT-26840 (f9f7daa6-3d98-496b-8fb2-698fae3b0c96) and template Ubuntu Server 22.04 LTS Cloud-Init (4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6) and size 53687091200",
      "domain": "XXXXX",
      "domainid": "16644085-7eff-46dd-8eb0-ef27b0b1621e",
      "enddate": "2024-01-08'T'23:59:59+00:00",
      "rawusage": "24",
      "size": 53687091200,
      "startdate": "2024-01-08'T'00:00:00+00:00",
      "tags": [],
      "templateid": "4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6",
      "usage": "24 Hrs",
      "usageid": "f9f7daa6-3d98-496b-8fb2-698fae3b0c96",
      "usagetype": 6,
      "zoneid": "f2665eef-1073-4c50-8d3a-076d44036fcc"
    },
    {
      "account": "XXX_Admins",
      "accountid": "72c17fd1-da79-46cf-a45d-d3ba1b5ad6d7",
      "description": "Volume usage for ROOT-26842 (d8fb2834-f217-44dd-b5df-8a8573432668) and template Ubuntu Server 22.04 LTS Cloud-Init (4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6) and size 53687091200",
      "domain": "XXXXX",
      "domainid": "16644085-7eff-46dd-8eb0-ef27b0b1621e",
      "enddate": "2024-01-08'T'23:59:59+00:00",
      "rawusage": "24",
      "size": 53687091200,
      "startdate": "2024-01-08'T'00:00:00+00:00",
      "tags": [],
      "templateid": "4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6",
      "usage": "24 Hrs",
      "usageid": "d8fb2834-f217-44dd-b5df-8a8573432668",
      "usagetype": 6,
      "zoneid": "f2665eef-1073-4c50-8d3a-076d44036fcc"
    },
    {
      "account": "XXX_Admins",
      "accountid": "72c17fd1-da79-46cf-a45d-d3ba1b5ad6d7",
      "description": "Volume usage for ROOT-26856 (14265d5a-d8fb-4136-af73-8ad0945a10a8) and template Ubuntu Server 22.04 LTS Cloud-Init (4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6) and size 53687091200",
      "domain": "XXXXX",
      "domainid": "16644085-7eff-46dd-8eb0-ef27b0b1621e",
      "enddate": "2024-01-08'T'23:59:59+00:00",
      "rawusage": "24",
      "size": 53687091200,
      "startdate": "2024-01-08'T'00:00:00+00:00",
      "tags": [],
      "templateid": "4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6",
      "usage": "24 Hrs",
      "usageid": "14265d5a-d8fb-4136-af73-8ad0945a10a8",
      "usagetype": 6,
      "zoneid": "f2665eef-1073-4c50-8d3a-076d44036fcc"
    }
  ]
}

The usage output lists usage for the following volumes:
Volume usage for ROOT-26854
Volume usage for ROOT-26840
Volume usage for ROOT-26842
Volume usage for ROOT-26856

Listing the volumes of the same domain returns only the last volume, ROOT-26856:

(cloudstack) 🐱 > list volumes domainid=16644085-7eff-46dd-8eb0-ef27b0b1621e listall=true
{
  "count": 1,
  "volume": [
    {
      "account": "XXX_Admins",
      "created": "2023-01-27T14:34:31+0000",
      "destroyed": false,
      "deviceid": 0,
      "diskioread": 17158,
      "diskiowrite": 4018086,
      "diskkbsread": 1198132,
      "diskkbswrite": 42926744,
      "displayvolume": true,
      "domain": "XXXXX",
      "domainid": "16644085-7eff-46dd-8eb0-ef27b0b1621e",
      "hasannotations": false,
      "hypervisor": "KVM",
      "id": "14265d5a-d8fb-4136-af73-8ad0945a10a8",
      "isextractable": false,
      "name": "ROOT-26856",
      "path": "14265d5a-d8fb-4136-af73-8ad0945a10a8",
      "physicalsize": 53687091200,
      "provisioningtype": "thin",
      "quiescevm": false,
      "serviceofferingdisplaytext": "Custom Instance",
      "serviceofferingid": "64",
      "serviceofferingname": "Custom",
      "size": 53687091200,
      "state": "Ready",
      "storage": "ceph-performance",
      "storageid": "d658f775-1c36-3f7f-afbe-c77991fec3f1",
      "storagetype": "shared",
      "supportsstoragesnapshot": false,
      "tags": [],
      "templatedisplaytext": "Ubuntu Server 22.04 LTS Cloud-Init",
      "templateid": "4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6",
      "templatename": "Ubuntu Server 22.04 LTS Cloud-Init",
      "type": "ROOT",
      "utilization": "100.0%",
      "virtualmachineid": "df71e1fd-e5c3-41b3-992e-20b9c067a4cd",
      "virtualsize": 53687091200,
      "vmdisplayname": "XXXXXX",
      "vmname": "XXXXXX",
      "vmstate": "Running",
      "vmtype": "User",
      "zoneid": "f2665eef-1073-4c50-8d3a-076d44036fcc",
      "zonename": "Berlin-01"
    }
  ]
}

Looking at the database reveals that the other three volumes mentioned in the usage output were expunged a long time ago. Only one volume (the last one) is in the Ready state and should be part of the usage output:

id     name        uuid                                  created         updated         removed         state     size
26888  ROOT-26840  f9f7daa6-3d98-496b-8fb2-698fae3b0c96  27.01.23 13:06  27.01.23 13:09  27.01.23 13:09  Expunged  53687091200
26890  ROOT-26842  d8fb2834-f217-44dd-b5df-8a8573432668  27.01.23 13:12  27.01.23 14:27  27.01.23 14:27  Expunged  53687091200
26902  ROOT-26854  06ceea59-c8f8-4acd-902e-414d4bd60a5a  27.01.23 14:30  27.01.23 14:33  27.01.23 14:33  Expunged  53687091200
26904  ROOT-26856  14265d5a-d8fb-4136-af73-8ad0945a10a8  27.01.23 14:34  19.12.23 12:49  \N              Ready     53687091200

This is from our production environment and i am able to reproduce it in our test environment.
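The manual comparison above can be scripted. The following is a minimal sketch, assuming the two CloudMonkey responses have been saved and parsed as JSON with the structure shown above. The function name `orphaned_usage_ids` is hypothetical, and the sample data is a trimmed copy of the two outputs.

```python
def orphaned_usage_ids(usage_response, volume_response):
    """Return usageids from usage records that match no listed volume id."""
    existing = {v["id"] for v in volume_response.get("volume", [])}
    billed = [r["usageid"] for r in usage_response.get("usagerecord", [])]
    return [uid for uid in billed if uid not in existing]


# Trimmed copies of the "list usagerecords" and "list volumes" responses.
usage_response = {
    "count": 4,
    "usagerecord": [
        {"usageid": "06ceea59-c8f8-4acd-902e-414d4bd60a5a"},  # ROOT-26854
        {"usageid": "f9f7daa6-3d98-496b-8fb2-698fae3b0c96"},  # ROOT-26840
        {"usageid": "d8fb2834-f217-44dd-b5df-8a8573432668"},  # ROOT-26842
        {"usageid": "14265d5a-d8fb-4136-af73-8ad0945a10a8"},  # ROOT-26856
    ],
}
volume_response = {
    "count": 1,
    "volume": [
        {"id": "14265d5a-d8fb-4136-af73-8ad0945a10a8", "name": "ROOT-26856"},
    ],
}

orphans = orphaned_usage_ids(usage_response, volume_response)
for uid in orphans:
    print(uid)
```

A volume that was deleted during the reporting window would legitimately still show usage for that day, so a check like this is only meaningful for dates well after the deletion.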

DaanHoogland commented 9 months ago

This is from our production environment and i am able to reproduce it in our test environment.

Please describe the reproduction steps in a clean (new) environment and I'll try to fix it.

StepBee commented 9 months ago

I actually found a reference to what looks like the same issue on the CloudStack users mailing list from 2018: https://lists.apache.org/thread/vb9v6ys0p0tr0wnzgt0oxdbjjxykbtk2

Replicating the issue with new volumes is not as straightforward as I thought. Simply creating a data disk, attaching it and deleting it does not reproduce the issue.

For some volumes expunged in the past I see never-ending usage data, for others I do not. I'm trying to understand the difference between the two behaviors.

Maybe someone else with a long-running environment and the usage service enabled can pick a less-used domain (for a better overview), generate a usage report for type 6 (volumes) and check whether the report includes expunged disks as well?

list usagerecords domainid=<domain-uuid> type=6 startdate=<startdate> enddate=<same-as-startdate> filter=description,rawusage
DaanHoogland commented 9 months ago

ok @StepBee, keep us updated. cc @rajujith, didn't you deal with a similar thing recently? Do you know how to reproduce it?

StepBee commented 9 months ago

For the moment, I see that for all affected volumes one field in the database is NULL, which is not the case for unaffected volumes. For affected volumes:

cloud_usage.usage_volume.deleted = NULL

For unaffected volumes:

cloud_usage.usage_volume.deleted = <date-of-deletion>

Following your question on the issue from 2018 and following the usage aggregation logic, I see two events in the database table "cloud.usage_event" for both types of volumes, affected and unaffected.

The entries in the "cloud.usage_event" table are as expected - and the same applies to the copy in cloud_usage.usage_event table.

So it looks like something (sometimes) misses the VOLUME.DELETE event during the first aggregation into the helper table "cloud_usage.usage_volume".
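One way to look for affected rows is to join the helper table against the delete events. The sketch below uses in-memory SQLite stand-ins for cloud_usage.usage_volume and cloud_usage.usage_event with invented sample rows. The usage_event column names (type, resource_id, created) and the assumption that resource_id matches usage_volume.volume_id are not confirmed by this thread, so treat the query as a starting point to adapt to the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified stand-ins; the real tables have more columns.
    CREATE TABLE usage_volume (
        id INTEGER PRIMARY KEY, volume_id INTEGER, deleted TEXT);
    CREATE TABLE usage_event (
        id INTEGER PRIMARY KEY, type TEXT, resource_id INTEGER, created TEXT);

    -- Invented sample rows: 26888 is affected, 26890 is not, 26904 still exists.
    INSERT INTO usage_volume VALUES (1, 26888, NULL);
    INSERT INTO usage_volume VALUES (2, 26890, '2023-01-27 14:27:00');
    INSERT INTO usage_volume VALUES (3, 26904, NULL);

    INSERT INTO usage_event VALUES (1, 'VOLUME.CREATE', 26888, '2023-01-27 13:06:00');
    INSERT INTO usage_event VALUES (2, 'VOLUME.DELETE', 26888, '2023-01-27 13:09:00');
    INSERT INTO usage_event VALUES (3, 'VOLUME.CREATE', 26890, '2023-01-27 13:12:00');
    INSERT INTO usage_event VALUES (4, 'VOLUME.DELETE', 26890, '2023-01-27 14:27:00');
    INSERT INTO usage_event VALUES (5, 'VOLUME.CREATE', 26904, '2023-01-27 14:34:00');
    """
)

# Rows still open in the helper table although a delete event was recorded.
stale = conn.execute(
    """
    SELECT uv.volume_id, MIN(ue.created) AS delete_event_time
    FROM usage_volume uv
    JOIN usage_event ue
      ON ue.resource_id = uv.volume_id AND ue.type = 'VOLUME.DELETE'
    WHERE uv.deleted IS NULL
    GROUP BY uv.volume_id
    ORDER BY uv.volume_id
    """
).fetchall()

for volume_id, deleted_at in stale:
    print(volume_id, deleted_at)
```

With the sample rows, only volume 26888 is reported: it has a VOLUME.DELETE event but its usage_volume row is still open. Volume 26904 is left alone because no delete event exists for it.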

I'm trying to think of a quick fix: is there a way to regenerate the usage data once I have updated all the cloud_usage.usage_volume.deleted columns with the correct date? The "generateUsageRecords" API will only generate records for previously failed generations, which is not the case here.