OHDSI / WebAPI

OHDSI WebAPI contains all OHDSI services that can be called from OHDSI applications
Apache License 2.0

Sporadic 403 errors when fetching Characterization results #2336

Open alondhe opened 5 months ago

alondhe commented 5 months ago

Expected behavior

(Using WebAPI 2.14.0 / Atlas 2.14.1)

Generated Characterization results can be returned when clicking on them.

Actual behavior

Sporadically, we see 403 errors when trying to view the results of a Characterization.

This then becomes a problem for all users. The only workaround seems to be to create a new characterization, generate it, then view it. Somehow this clears the issue for all users.

Steps to reproduce behavior

  1. Create and generate a characterization
  2. Try to view the results, perhaps after a few days?

Tagging @konstjar

chrisknoll commented 5 months ago

403s imply an authorization failure, and I'm not sure how you're managing permissions in your instance. Is it possible that permissions are being deleted somehow (e.g., by a batch process that synchronizes permissions from LDAP)?

alondhe commented 5 months ago

We've changed the AD sync to be less frequent and the session timeout is a bit longer. But I can't see why it would randomly stop authorizing the route when all other routes are authorizing just fine.

konstjar commented 5 months ago

@alondhe I'm wondering: could this be the same case as with cohorts, when the OMOP database is not available? If WebAPI fails to connect to the results schema to retrieve the data, there is a chance that you get a 403.

alondhe commented 5 months ago

It seems to be in the loadCharacterizationExecution() function in Atlas, which calls the /WebAPI/cohort-characterization/generation/{generationId} endpoint (the Java function getGeneration() in CcController.java in WebAPI).

I can replicate this outside of Atlas, using an R script. What's strange is that all other GET operations in CcController work fine.
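For anyone wanting to reproduce this outside Atlas, here is a minimal sketch in Python (not the R script mentioned above). The base URL, generation ID, and token are placeholders, and sending the token as an `Authorization: Bearer` header is an assumption about how a given instance authenticates.

```python
import urllib.error
import urllib.request


def build_generation_request(base_url: str, generation_id: int, token: str) -> urllib.request.Request:
    """Build a GET request for a single characterization generation."""
    url = f"{base_url.rstrip('/')}/cohort-characterization/generation/{generation_id}"
    # Assumption: the instance accepts a bearer token in the Authorization header.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


def fetch_status(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code, including error codes."""
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 403 when the authorization check fails


# Hypothetical usage; replace the URL, ID, and token with real values:
# req = build_generation_request("https://atlas.example.org/WebAPI", 42, "<token>")
# print(req.full_url, fetch_status(req))
```

Running the same call against the other GET routes in CcController would show whether only this route returns 403.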

alondhe commented 5 months ago

I've also tried making myself Moderator, which in theory should provide full permissions on all things. No luck.

alondhe commented 5 months ago

Okay, I think the answer was actually simple :)

These 2 permissions were somehow missing for the "Atlas Users" role:

  1. cohort-characterization:generation:*:get
  2. cohort-characterization:design:get

Restoring these appears to have fixed the issue. Closing this ticket.
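To illustrate why a missing entry would block only this route, here is a rough sketch of colon-delimited wildcard permission matching in the style of Apache Shiro. It assumes the required permission is derived from the request path (relative to /WebAPI) and the HTTP method; that mapping and all function names here are illustrative, not taken from the WebAPI source.

```python
from typing import Iterable


def required_permission(path: str, method: str) -> str:
    """Map a request path and method to a permission string (assumed mapping)."""
    return ":".join(path.strip("/").split("/") + [method.lower()])


def implies(granted: str, required: str) -> bool:
    """Simplified Shiro-style check: does `granted` cover `required`?"""
    granted_parts = granted.split(":")
    required_parts = required.split(":")
    for i, needed in enumerate(required_parts):
        if i >= len(granted_parts):
            return True  # a shorter granted permission covers all remaining parts
        if granted_parts[i] != "*" and granted_parts[i] != needed:
            return False
    # Any extra parts on the granted side must be wildcards.
    return all(part == "*" for part in granted_parts[len(required_parts):])


def is_authorized(role_permissions: Iterable[str], path: str, method: str) -> bool:
    """True if any permission held by the role covers the request."""
    needed = required_permission(path, method)
    return any(implies(granted, needed) for granted in role_permissions)
```

Under these assumptions, a role holding `cohort-characterization:generation:*:get` is authorized for `/cohort-characterization/generation/17`, while a role holding only `cohort-characterization:design:get` is not, even though other routes keep working.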

chrisknoll commented 5 months ago

@alondhe, would you mind taking a second look at this? We've seen permissions disappear in our environment too, and we're not sure whether it is human error or something in the codebase. Do you know if those permissions were previously assigned to that role in your environment?

alondhe commented 5 months ago

> @alondhe , if you wouldn't mind taking a second look at this, as we've seen permissions disappear in our environment too and we're not sure if it is human error or something in the codebase: do you know if those permissions were previously assigned to that role in your environment?

So, I think our webapi schema may have had this issue for a while; its setup predates me joining my organization, and I do recall seeing 403s on characterizations over the past year. Additionally, I have a development instance in which these permissions have persisted for at least 6 months without disappearing, so I'm thinking it's not some bug in the codebase.

I think it would be good to add logging to WebAPI that records a message whenever a request fails because a permission rule is disabled or, as in this case, missing. I didn't see anything in the WebAPI logs about this during my debugging, which made it harder to figure out. To be fair, your comment above was spot on: a 403 means some permission is lacking. I just didn't think granular permissions missing from the Atlas Users role would be the issue.
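As a sketch of the kind of logging suggested here, assuming a simplified exact-match permission check (WebAPI itself is Java and its real check is more involved), the denial path could name the missing permission. The logger name and function are hypothetical.

```python
import logging
from typing import Iterable

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("webapi.security.sketch")


def check_permission(user: str, granted: Iterable[str], required: str) -> bool:
    """Return True if `required` is granted; otherwise log which permission was missing."""
    if required in set(granted):
        return True
    # Naming the missing permission makes the 403 traceable from the logs.
    logger.warning("403 for user=%s: missing permission '%s'", user, required)
    return False
```

A log line of this shape would have pointed directly at the missing `cohort-characterization:generation:*:get` entry.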

alondhe commented 5 months ago

Ha! Okay, I take it back. These permissions DO disappear on AD sync. Ours disappeared from the webapi table "sec_permission" this morning.

select * from webapi.sec_permission
where value in ('cohort-characterization:generation:*:get', 'cohort-characterization:design:get');
-- returned 0 records

alondhe commented 5 months ago

Per @konstjar's suggestion, I invoked an AD sync, but the permissions remained this time. So I'm still confused about what the issue is.

chrisknoll commented 5 months ago

Ok, glad you saw it. We're seeing the same issue with permissions just disappearing. I've thought about setting up an audit table on sec_permission to try to understand when these records are disappearing. Since it seems you've narrowed it down (somewhat), do you think you can dig into it and get something reproducible? Maybe it only happens when the AD sync detects a diff and needs to update the perms?
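The audit-table idea could look roughly like the sketch below. It uses an in-memory SQLite database as a stand-in so the example is self-contained; on Postgres the same approach would need a trigger function, and the real sec_permission table has more columns than the two assumed here. The audit table and trigger names are hypothetical.

```python
import sqlite3


def create_audited_permission_table(conn: sqlite3.Connection) -> None:
    """Create a simplified sec_permission table plus an audit trigger for deletes."""
    conn.executescript(
        """
        CREATE TABLE sec_permission (
            id INTEGER PRIMARY KEY,
            value TEXT NOT NULL
        );
        CREATE TABLE sec_permission_audit (
            permission_id INTEGER,
            value TEXT,
            deleted_at TEXT DEFAULT CURRENT_TIMESTAMP
        );
        -- Copy every deleted row into the audit table with a timestamp.
        CREATE TRIGGER sec_permission_delete_audit
        AFTER DELETE ON sec_permission
        BEGIN
            INSERT INTO sec_permission_audit (permission_id, value)
            VALUES (OLD.id, OLD.value);
        END;
        """
    )


def deleted_permissions(conn: sqlite3.Connection) -> list:
    """Return (permission_id, value, deleted_at) for every audited delete."""
    return conn.execute(
        "SELECT permission_id, value, deleted_at FROM sec_permission_audit ORDER BY rowid"
    ).fetchall()
```

Comparing the `deleted_at` timestamps against the AD sync schedule would show whether the deletes line up with sync runs.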

alondhe commented 5 months ago

Sure, we'll try to. It's been a few hours since I last invoked the AD sync, and the permissions didn't disappear.

I'm wondering if maybe there's some Postgres logging we can enable to see the query that causes it. Aside from that, I think @konstjar is examining the LDAP/AD sync codebase. I poked at it a bit, but I didn't see anything glaring about WebAPI disturbing those tables.