CartoDB / cartodb-postgresql

PostgreSQL extension for CartoDB
BSD 3-Clause "New" or "Revised" License
111 stars 52 forks source link

Make CDB_Get_Foreign_Updated_At robust to missing CDB_TableMetadata #362

Closed rafatower closed 5 years ago

rafatower commented 5 years ago

This may happen with non-carto DB's, when checking the updated_at times and not finding the corresponding remote.cdb_tablemetadata imported from the foreign non-carto DB.

Instead of failing, return a NOW() timestampt, so that caching logic just assumes there may have been changes.

This makes it work today, and leaves open the possibility of adding the required carto metadata for homogeneous caching in the future.

With this I can use the Maps API against a non-carto foreign table, without needing any other patches.

dgaubert commented 5 years ago

Hey!

I was thinking about how to implement a better Cache-Control strategy in Windshaft-CartoDB and I realized when there are foreign tables involved in the layer's query I will need to know if one of the affected tables is a foreign table or not. In fine with the approach of returning a NOW() instead of failing but also I will need to know if the table is a remote table or a local table when we call to CDB_QueryTables_Updated_At method. If I'm not wrong, the following should do the trick:

$ git diff
diff --git a/scripts-available/CDB_ForeignTable.sql b/scripts-available/CDB_ForeignTable.sql
index 98d3c9b..8311f88 100644
--- a/scripts-available/CDB_ForeignTable.sql
+++ b/scripts-available/CDB_ForeignTable.sql
@@ -171,7 +171,7 @@ AS $$
       LEFT JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid
       WHERE c.oid = query_tables_oid.reloid
     )
-    SELECT fqtn.dbname, fqtn.schema_name, fqtn.table_name,
+    SELECT fqtn.dbname, fqtn.schema_name, fqtn.table_name, fqtn.relkind,
       (CASE WHEN relkind = 'f' THEN @extschema@.CDB_Get_Foreign_Updated_At(reloid)
             ELSE (SELECT md.updated_at FROM @extschema@.CDB_TableMetadata md WHERE md.tabname = reloid)
       END) AS updated_at

Having such info I can decide what max_age directive in Cache-Control header is the best to apply depending on the case.

Thoughts?

dgaubert commented 5 years ago

And the following are the changes in Maps API:

$ git diff
diff --git a/lib/cartodb/api/middlewares/cache-control-header.js b/lib/cartodb/api/middlewares/cache-control-header.js
index d72ee0c..1d13e20 100644
--- a/lib/cartodb/api/middlewares/cache-control-header.js
+++ b/lib/cartodb/api/middlewares/cache-control-header.js
@@ -1,6 +1,7 @@
 'use strict';

 const ONE_YEAR_IN_SECONDS = 60 * 60 * 24 * 365;
+const FIVE_MINUTES_IN_SECONDS = 60 * 5;

 module.exports = function setCacheControlHeader ({ ttl = ONE_YEAR_IN_SECONDS, revalidate = false } = {}) {
     return function setCacheControlHeaderMiddleware (req, res, next) {
@@ -8,14 +9,27 @@ module.exports = function setCacheControlHeader ({ ttl = ONE_YEAR_IN_SECONDS, re
             return next();
         }

-        const directives = [ 'public', `max-age=${ttl}` ];
+        const { mapConfigProvider } = res.locals;

-        if (revalidate) {
-            directives.push('must-revalidate');
-        }
+        mapConfigProvider.getAffectedTables((err, affectedTables) => {
+            if (err) {
+                global.logger.warn('ERROR generating Cache Control Header:', err);
+                return next();
+            }
+
+            if (affectedTables && !affectedTables.tables.every(table => table.relkind !== 'f')) {
+                ttl = FIVE_MINUTES_IN_SECONDS;
+            }
+
+            const directives = [ 'public', `max-age=${ttl}` ];
+
+            if (revalidate) {
+                directives.push('must-revalidate');
+            }

-        res.set('Cache-Control', directives.join(','));
+            res.set('Cache-Control', directives.join(','));

-        next();
+            next();
+        });
     };
 };
Algunenano commented 5 years ago

I'd prefer returning a magic value instead of now() when the table isn't present and use that to signal a lower cache time.

That way it would work in the future if we decide to use cache and we can even enable it in the local tables so it's also used when they aren't cartodbfyied.

dgaubert commented 5 years ago

That way it would work in the future if we decide to use cache and we can even enable it in the local tables so it's also used when they aren't cartodbfyied.

You mean remote tables, don't you?. If so, I agree on not exposing the type of table so the day we can know when a remote table was updated we won't need to change anything in Windshaft-CartoDB.

Algunenano commented 5 years ago

You mean remote tables, don't you?. If so, I agree on not exposing the type of table so the day we can know when a remote table was updated we won't need to change anything in Windshaft-CartoDB.

Yes, but also local tables that aren't cartodbfyied. The current function returns NULL on those cases, so if you have a JOIN of a cartodbfied table and a non-cartodbfied one (no matter if local or remote) you just get the last update time of the first.

If instead of returning NOW() or NULL we returned a huge epoch into the future when a table isn't in the tablemetadata table (local or remote), we could detect that in Windshaft and set a low/5 minutes cache to avoid "my map is not refreshing" complains.

rafatower commented 5 years ago

Yes, but also local tables that aren't cartodbfyied. The current function returns NULL on those cases

then probably when I saw some Last-updated header with the epoch zero was because of a nasty conversion of NULL to a timestamp somewhere in the chain. Mind the interactions with varnish/cdn.

rafatower commented 5 years ago

cc/ @jgoizueta to coordinate release and deploys