GMOD / Chado

the GMOD database schema
http://gmod.org/wiki/Chado
Artistic License 2.0
38 stars 25 forks source link

search_path problem with boxrange function #65

Open jogoodma opened 6 years ago

jogoodma commented 6 years ago

Summary Recent versions (9.3+) of PostgreSQL that have been patched for CVE-2018-1058 will throw the following error when a vacuum / analyze is attempted on the featureloc table.

Error

vacuumdb: vacuuming of database "prod" failed: ERROR:  function create_point(integer, integer) does not exist
LINE 1: SELECT box (create_point(0, $1), create_point($2,500000000))
                    ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
QUERY:  SELECT box (create_point(0, $1), create_point($2,500000000))
CONTEXT:  SQL function "boxrange" during inlining

Details This problem is caused by the binloc_boxrange and binloc_boxrange_src indices on the featureloc table. These indices are generated by the boxrange function defined in the default Chado schema. Due to changes made for CVE-2018-1058, when a function is used in an index the search_path within that function has the public schema removed, thus any other user defined functions fail to execute because they are not found in the search_path. In our case, calls to create_point from within the boxrange function fail.

Possible solutions

  1. Hard code the boxrange function to use the schema name e.g. public.create_point(0, $1).

Hard coding the public schema in a function name is not ideal and would break on any installs that did not run in the public schema.

  1. Use the SET SEARCH_PATH FROM CURRENT in the function definition of boxrange.
    SET search_path = public;
    CREATE OR REPLACE FUNCTION boxrange (bigint, bigint) RETURNS box AS
    'SELECT box (create_point(0, $1), create_point($2,500000000))'
    LANGUAGE 'sql' IMMUTABLE SET SEARCH_PATH FROM CURRENT;

    This is what the PostgreSQL devs currently recommend. We would need to somehow dynamically set the search_path during build/install time. Talk to @scottcain on how best to do that within the GMOD schema build/install process.

Other functions We should check for other functions that may be impacted by this change as well.

Additional resources

  1. https://wiki.postgresql.org/wiki/A_Guide_to_CVE-2018-1058:_Protect_Your_Search_Path
  2. https://www.postgresql.org/message-id/87muz3f0xt.fsf%40news-spur.riddles.org.uk
jogoodma commented 6 years ago

Setup branch issue-65 for implementing a patch.

spficklin commented 6 years ago

Unfortunately, hard coding the schema prefix to 'public' would break all Tripal sites, and hardcoding a search_path would as well.

jogoodma commented 6 years ago

@spficklin, take a look at c195214f and let me know what you think. I've added a query that sets the search_path based on the return value from current_schemas(). I believe that this should get around the hard coding issues. This resolved the issue on the FlyBase schema but I still need to test it further.

jogoodma commented 6 years ago

@scottcain Can you point me at some GBrowse code/queries that utilize these indices? I'd like to do some testing and performance checks.

spficklin commented 6 years ago

Thanks @jogoodma. Yes, I think that would work just fine. I think there are other functions that have this same issue, perhaps not with the analyze/vacuum. Unfortunately, I cannot remember off the top of my head what they are. With Tripal we keep Chado in a separate schema (usually named 'chado'), so we have hit on these problems in the past and we solved them by setting the search path in our own code, so this is much better.

Two specific examples where this is a problem is, first, the indexes on the featureloc table which uses the binloc_boxrange function. I think your fix may address these. The other example is the fill_cvtermpath functions which call other functions. Although, those functions are a bit broken at the moment because they don't properly handle loops that have recently been introduced into the GO ontology.

jogoodma commented 6 years ago

I believe that this issue is limited to situations where an index calls a user function which in turn calls another user function. When a vacuum is initiated the vacuum process removes the default public schema from the search_path of the outermost function, breaking the call to the inner function that lacks an explicit schema.

Functions that are called from user select statements should not be affected as the search_path (default or otherwise) will be preserved throughout the calls. Unless I'm missing something, I don't see fill_cvtermpath being used in an index or a trigger so I don't think it will be affected.

spficklin commented 6 years ago

Yes, you're right, those aren't used in an index. They just cause a very similar problem when Chado is not the public schema so your fix would help with those as well!

scottcain commented 5 years ago

@jogoodma sorry I missed the initial notification that you tagged me in this. Other people have certainly expressed problems with schema use in Chado, but good fixes that don't break backward compatibility are probably going to be difficult. I have a feeling that at some point in the near future, we should be talking about Chado 2.0 that fixes such things and introduces new things (what, you ask. That's for me to know...)

scottcain commented 5 years ago

I see that there is a pull request too--maybe I need to fix notifications from github for this repo :-) I'll tag this issue so we can discuss it in person next month.

jogoodma commented 5 years ago

I believe that this PR #66 is a good fix that is also backwards compatible. Do you have an example that demonstrates a corner case where this breaks?