charlie-haley / omada_exporter

Prometheus Exporter for TP-Link Omada Controller SDN.
MIT License

No client or device metrics with latest omada-controller version #29

Closed codersaur closed 2 years ago

codersaur commented 2 years ago

With the latest mbentley/omada-controller:latest the exporter is only returning controller metrics, no client or device metrics. I'm guessing the api must have changed again...?

charlie-haley commented 2 years ago

I honestly wouldn't be surprised, the API is a bit of a mess 😆

What version are you seeing in the web interface, and what version of the exporter are you running?

I'm currently having no issues on 5.0.30, but that's with the OC200 if it makes a difference.

charlie-haley commented 2 years ago

Okay I've just seen this issue on my end... I'll investigate and hopefully get a fix in ASAP

codersaur commented 2 years ago

The version is still reported as "5.0.30" in the web interface, but something has changed....

In the system logs I see the following:

03-11-2022 22:06:11.143 WARN [https-jsse-nio-8043-exec-257] [] c.t.s.o.a.d.i.PermissionCheckHandler(84): Current user does not has permission to access this site. path:/236041a....baaa6b9/api/v2/sites/Home/clients, method:GET
03-11-2022 22:06:41.136 WARN [https-jsse-nio-8043-exec-242] [] c.t.s.o.a.d.i.PermissionCheckHandler(84): Current user does not has permission to access this site. path:/236041a...aa6b9/api/v2/sites/Home/devices, method:GET

If I log in interactively in the web interface with the prometheus user I can see all the devices and clients, so the user does have the permissions in the web GUI.

My guess is that something is not right with the request headers when getting these endpoints.

Edit: this is v0.3.0 of the exporter. However, I just pulled 0.4.0 and it has the same issue.

charlie-haley commented 2 years ago

Okay, the issue on my end appeared to be because of a badly formed host, I've added a fix for that. (It had a trailing / on the end)
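For illustration, the trailing-slash problem can be avoided by normalizing the host before any path is appended. This is a minimal sketch, not the exporter's actual code: `normalize_host`, `build_url`, and the `omada.example` host are hypothetical names chosen for the example.

```python
def normalize_host(host: str) -> str:
    """Strip surrounding whitespace and any trailing slashes from a base URL."""
    return host.strip().rstrip("/")


def build_url(host: str, path: str) -> str:
    """Join a base URL and an API path with exactly one slash between them."""
    return f"{normalize_host(host)}/{path.lstrip('/')}"


# Hypothetical host; with or without the trailing slash the result is the same.
print(build_url("https://omada.example:8043/", "/api/v2/users/current"))
# prints "https://omada.example:8043/api/v2/users/current"
```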

Could you check your user and see if Site Privileges is configured correctly? Maybe try All if it's still not working.

Let me know how you get on. I'll do some further digging if you're still having issues.

codersaur commented 2 years ago

Yep, I've checked the user privileges. I've tried setting the Site Privileges option to my specific site and also "All". I also tried changing the name of my default site to "Home", and yesterday I completely rebuilt my omada-controller instance (just restoring the settings), but none of these changes fixed the issue.

I have logged into the Web GUI with the user ("prometheus" in my case) and it can see all the client and device data as you would expect.

I've just tried your 0.4.1 release, but this hasn't resolved the issue. Let me know if there's any more info you need.

I think I will try rolling back the omada-controller image to confirm it is an issue with the latest image (so long as I can do it without losing stats).

phyber commented 2 years ago

I'm experiencing this issue too. After adding some extra debug lines and setting an appropriate log level, the JSON being returned by the devices endpoint is:

{"errorCode":-1005,"msg":"Operation forbidden."}

The IsLoggedIn() function believes that we're definitely logged in correctly, so it's not an incorrect username/password causing this, and the user I'm testing with is currently an administrator user (for purposes of this testing).

I'm not very familiar with the Omada API, so I don't have too much else to add at the moment.
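A response like the one above is returned as ordinary JSON, so a client that only checks whether it is logged in will not notice the failure. As a rough sketch (the `OmadaApiError` class and `parse_result` helper are hypothetical, not part of the exporter), the `errorCode` field seen in the responses in this thread could be checked before the result is used:

```python
import json


class OmadaApiError(Exception):
    """Raised when the controller reports a non-zero errorCode."""

    def __init__(self, code, msg):
        super().__init__(f"Omada API error {code}: {msg}")
        self.code = code
        self.msg = msg


def parse_result(body: str):
    """Return the "result" field, or raise if errorCode is not 0."""
    payload = json.loads(body)
    code = payload.get("errorCode")
    if code != 0:
        raise OmadaApiError(code, payload.get("msg", ""))
    return payload.get("result")


try:
    parse_result('{"errorCode":-1005,"msg":"Operation forbidden."}')
except OmadaApiError as err:
    print(err)  # prints "Omada API error -1005: Operation forbidden."
```

Surfacing the error code in the exporter's own log would make a permissions problem visible without adding extra debug lines.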

phyber commented 2 years ago

After a little bit of looking at what API calls from my browser were doing, I've spotted what's happened here.

The OMADA_SITE argument must be the "Site ID" rather than its name. In my case, I was attempting to use "Home". I changed this to its ID (a long hexadecimal value) and things sprang to life. There doesn't seem to be a way to find this ID from the Controller UI, so I just pulled it from one of the devices responses in the Firefox Developer Tools.

I guess the exporter needs some extra code to find this ID based on the site name.

codersaur commented 2 years ago

Yep, I can also confirm that if you obtain the site_id and put that into the OMADA_SITE variable then it works (however the metrics produced now use the site_id as the site tag, which is not ideal). Either the documentation needs to be updated, or preferably, the code should be updated so that site_name and site_id are handled appropriately (probably want both output as tags in the metrics).

charlie-haley commented 2 years ago

Sorry for the slow progress on this guys, life's been getting in the way!

Awesome that you've found a fix. The Omada API always seems to change under my feet. I'll raise a PR today to implement a lookup for the Site ID.

Thanks for the investigation on this @phyber @codersaur

charlie-haley commented 2 years ago

The latest release should contain the fix, let me know how you get on as I only use a single site for Omada!

I'll re-open the issue if you're still having issues

codersaur commented 2 years ago

Doesn't appear to work for me with 0.4.2. I have renamed my default site to "Home". If I use OMADA_SITE=Home then it fails to resolve the site and I see the following errors in the Omada Controller log:

03-25-2022 20:15:37.727 WARN [https-jsse-nio-8043-exec-54] [] c.t.s.o.d.RestDispatcher(59): Unsupported path, path:/236041a4e7.....9/api/v2/sites//clients, method:GET
03-25-2022 20:15:37.722 WARN [https-jsse-nio-8043-exec-38] [] c.t.s.o.d.RestDispatcher(59): Unsupported path, path:/236041a4e7.....9/api/v2/sites//devices, method:GET

If I use OMADA_SITE=6228e0062d0b6a44271b456d then it still works.
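The `sites//clients` paths in the log suggest that the site lookup returned an empty ID and the request was sent anyway. A small guard, sketched here with a hypothetical `site_path` helper (the controller ID prefix seen in the log paths is left out for brevity), would fail early instead of sending a request the controller rejects as unsupported:

```python
from urllib.parse import quote


def site_path(site_id: str, resource: str) -> str:
    """Build the per-site API path, refusing an empty site ID."""
    if not site_id:
        raise ValueError(
            "site ID is empty; refusing to build /api/v2/sites//" + resource
        )
    # Percent-encode the ID so it cannot introduce extra path segments.
    return f"/api/v2/sites/{quote(site_id, safe='')}/{resource}"


print(site_path("6228e0062d0b6a44271b456d", "clients"))
# prints "/api/v2/sites/6228e0062d0b6a44271b456d/clients"
```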

FYI, here's the response from {{omada_base_url}}/api/v2/sites?currentPage=1&currentPageSize=10:

{
  "errorCode": 0,
  "msg": "Success.",
  "result": {
    "totalRows": 1,
    "currentPage": 0,
    "currentSize": 10,
    "data": [
      {
        "name": "Home",
        "id": "6228e0062d0b6a44271b456d",
        "region": "United Kingdom",
        "primary": true,
        "alertNum": 0,
        "wan": false,
        "lan": true,
        "lanDeviceConnectedNum": 2,
        "lanDeviceDisconnectedNum": 0,
        "wlan": true,
        "wlanDeviceConnectedNum": 1,
        "wlanDeviceDisconnectedNum": 0,
        "wlanDeviceIsolatedNum": 0,
        "lanUserNum": 20,
        "wlanUserNum": 11,
        "lanGuestNum": 0,
        "wlanGuestNum": 0
      }
    ]
  }
}

codersaur commented 2 years ago

Just looked at your code. I can see that the sites endpoint returns an "Operation forbidden." error as a Viewer and you are using the {{omada_base_url}}/api/v2/sites/<sitename>/setting/firewall/timeout endpoint instead. However, this also gives an "Operation forbidden." error on my system, which explains why it's failing to resolve the site_name to site_id.

Edit: I've done a bit of poking around and I can see that when logged in as my prometheus user with Viewer privileges, we can get the site ID from the {{omada_base_url}}/api/v2/users/current endpoint. The response looks like:

{
  "errorCode": 0,
  "msg": "Success.",
  "result": {
    "id": "xxxxxxxxxxxxxx",
    "type": 0,
    "roleType": 2,
    "name": "prometheus",
    "omadacId": "xxxxxxxxxxxxxxxx9",
    "adopt": false,
    "manage": false,
    "license": false,
    "alert": false,
    "privilege": {
      "sites": [
        {
          "name": "Home",
          "key": "6228e0062d0b6a44271b456d",
          "primary": true
        }
      ],
      "lastVisited": "6228e0062d0b6a44271b456d",
      "all": false
    },
    "disaster": 0,
    "forceModify": false
  }
}

charlie-haley commented 2 years ago

> Just looked at your code. I can see that the sites endpoint returns an "Operation forbidden." error as a Viewer and you are using the "{{omada_base_url}}/api/v2/sites/home/setting/firewall/timeout" endpoint instead. However, this also gives a "Operation forbidden." error on my system, which explains why it's failing to resolve the site_name to site_id.

Hmm, I tested that endpoint with my exporter user and it seemed to work. How is your user configured? Is Site Privileges set to All?

If that still doesn't work I'll try and dig for some alternative endpoints

codersaur commented 2 years ago

I think you'll want to use {{omada_base_url}}/api/v2/users/current endpoint, see my edit above. The response would appear to list all the sites that the logged in user has access to.

E.g. from the response get the result.privilege.sites array and search it for an entry with .name=sitename
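The lookup described above can be sketched as follows. This is only an illustration of the idea, not the exporter's implementation: `find_site_id` is a hypothetical helper, the sample is trimmed from the response posted earlier in this thread, and the exact, case-sensitive name match is an assumption.

```python
def find_site_id(user_response: dict, site_name: str):
    """Return the "key" of the site whose "name" matches, or None."""
    result = user_response.get("result") or {}
    privilege = result.get("privilege") or {}
    for site in privilege.get("sites") or []:
        if site.get("name") == site_name:
            return site.get("key")
    return None


# Trimmed copy of the /api/v2/users/current response shown above.
sample = {
    "errorCode": 0,
    "msg": "Success.",
    "result": {
        "name": "prometheus",
        "privilege": {
            "sites": [
                {"name": "Home", "key": "6228e0062d0b6a44271b456d", "primary": True}
            ],
            "lastVisited": "6228e0062d0b6a44271b456d",
            "all": False,
        },
    },
}

print(find_site_id(sample, "Home"))  # prints "6228e0062d0b6a44271b456d"
```

Returning `None` for an unknown name lets the caller report a clear error rather than request a path with an empty site segment.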

charlie-haley commented 2 years ago

Release v0.4.3 is now fetching from the user endpoint

codersaur commented 2 years ago

Good work, that seems to have fixed it. Thanks.

One minor request would be to expose both site_name and site_id tags in the prometheus metrics.
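To make the request concrete, here is a rough sketch of what a sample carrying both labels could look like in the Prometheus text exposition format. The metric name `omada_example_client_count`, the value, and the helpers are hypothetical and are not taken from the exporter.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name: str, labels: dict, value) -> str:
    """Render one sample line: name{label="value",...} value."""
    rendered = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in labels.items()
    )
    return f"{name}{{{rendered}}} {value}"


print(
    format_sample(
        "omada_example_client_count",  # hypothetical metric name
        {"site_name": "Home", "site_id": "6228e0062d0b6a44271b456d"},
        31,
    )
)
# prints: omada_example_client_count{site_name="Home",site_id="6228e0062d0b6a44271b456d"} 31
```

With both labels present, dashboards can group by the readable name while the ID stays available if a site is later renamed.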

charlie-haley commented 2 years ago

> Good work, that seems to have fixed it. Thanks.
>
> One minor request would be to expose both site_name and site_id tags in the prometheus metrics.

No problem, I'll add an issue to the repo