Closed karafecho closed 2 years ago
@karafecho So you want some validation on the inputs users provide? Right now, it doesn't look like we're doing that, and we'll return an empty (0) response because nothing was found (correctly). I think when a user provides a valid input but we have no results, we should return this response instead of an error. I think the right thing to do, albeit more complex, is to validate the inputs and error if the user gives us something invalid.
I agree with: "I think when a user provides a valid input but we have no results, we should return this response instead of an error. I think the right thing to do, albeit more complex, is to validate the inputs and error if the user gives us something invalid."
My concern is that if we return empty (0) responses to invalid input when the user doesn't know that it is invalid input with actual output, then the user will interpret this incorrectly and lose faith in the service.
In essence, I'm trying to improve the API in the absence of a UI. ;-)
@karafecho As currently implemented, the user will get a 200 valid response with return_value showing the reason why there is no results returned. For example, in this case, it shows Empty query result returned. Please try again
in the return value, similarly to what is implemented for other endpoints, e.g., Input features invalid or cohort ≤10 patients. Please try again
, Input cohort_id invalid. Please try again.
, etc. Do you think this is reasonable to users?
So, I'll provide an example to demonstrate my concern. Let's assume that a user wants to examine AlcoholUse by SexDILI. This user assumes that both variables are binary and enumerated as 0 and 1 (a reasonable assumption). So, the user enters AlcoholUse=0 (presumably no) and SexDILI=0 (presumably Male). The results appear to indicate that there are no males in the cohort.
But had the user entered SexDILI=Female, very different results would appear.
@karafecho That is because in the database, Male and Female strings are stored for SexDILI rather than 0 and 1. Our swagger example is shown below:
{
"feature": {
"SexDILI": {
"operator": "=",
"value": "Female"
}
},
"maximum_p_value": 1,
"correction": {
"method": "bonferroni"
}
}
So the user should know SexDILI takes Female or Male as value, right? I agree the response is confusing for this case for sure. Let me see if I can add a validation to catch this case, i.e., checking whether user's input exists in the database column, and if not, I think we should return a validation error rather than a empty response. @maximusunc What do you think?
Yes, we enumerated SexDILI as strings, but a user might reasonably assume otherwise. We are fairly consistent with our enumeration of variables, but we could probably do better moving forward.
In the current case, we could place the burden on the user, in terms of understanding the correct enumeration for a variable, but that seems less than ideal.
I would add the validation to make sure the user input is a string, not necessarily if the input exists in the database. Theoretically, we could have a database of just 'Female', but we shouldn't error if a user inputs 'Male'. If we're in charge of the openapi schema, there's a way of adding enums that would automatically be validated when a user hits that route. I'm not sure what we're currently doing though.
Agreed. I'll look into it to see what is the best way to add the validation. It is best to have openapi schema to do the validation, but in this case, the value is dependent on the variable or column in the database, so I don't think we can leverage openapi schema validation for this one, but let me know if you disagree @maximusunc
@maximusunc @karafecho Upon looking into this further, the value Kara show in the example is actually a string, "0" or "1", so checking whether it is a string type will not work here. I am wondering what kind of response is acceptable in this case? I am thinking an empty response with status code 200 in this case is acceptable? In the theoretical case @maximusunc mentioned above, if the user query value does not exist in the database even though it is actually a valid value, e.g., 'Male' while it happened not to be in the database, this empty response with status code 200 still makes sense, right? The only way I can think of to address this is to return an empty response if the query value is not in the database. Thoughts? @maximusunc @karafecho
I think in the end, we can only do so much for the user, i.e. validation/type checking, but if they give us bad input, i.e. a string that isn't in the database, we should return an empty response.
Okay, your points are well taken, @hyi and @maximusunc. I agree with returning an empty response with status code 200.
We can close this issue for now, but let's consider this issue and #212 moving forward.
@karafecho I am reopening this issue as well since I believe this PR #218 addresses it with my latest code commit. Once @maximusunc approves the PR, I will update asthma and DILI development instances for you to test. You can close this issue after you get a chance to test it and verify it works including empty value handling. Specifically, when you enter "1" in SexDILI for an equal operator, the response will have a return value of Invalid input value 1 for feature SexDILI. Please try again.
This should inform the user what is wrong with their input query.
Tested asthma and DILI dev instances, which appear to be working as planned. Issue can be closed.
This issue is to report that ICEES asthma and ICEES DILI return responses that appear to be valid but aren't truly valid when users enter incorrect enumeration parameters.