Parameterized query downloads empty CSV; same query in Playground is fine

ernstki commented 4 months ago

Hi all,

This issue doesn't appear to manifest in in 4.x, or 4.3.0 specifically. However, the 4.x series isn't quite ready for us yet…

Screenshot of the query parameters section of django-sql-explorer 4.3.0, showing the crowded form field layout

…in terms of how the query parameters are rendered, and that's fine. It's just CSS, and I'm sure that'll be resolved eventually. On top of that, though, we've spent a bit of time customizing the templates in such a way that it didn't carry over with 4.x, so I'm reluctant to upgrade and have to re-do all that.

That said… if this is a wontfix kind of an issue, I respect that.

I checked to see if this had come up in some other issue, or in a PR or discussion, no dice.

Issue description

One out of many of parameterized queries (below) generates an "empty" CSV containing only the headers, and JSON containing only the outer square brackets. However, the generated SQL, copied from the query log into a new Playground works fine; the CSV is populated with data as expected.

Here's the query that triggers the behavior, lightly sanitized.

-- doing a SELECT on a SELECT was the only way I could figure out how
-- to get the DATEDIFF columns to show up
SELECT eid AS id_link,
       short_name,
       project_name,
       project_lead,
       extra_details,
       note,
       updated_at,
       updated_days_ago,
       created_at,
       created_days_ago,
       submission_date,
       submitted_days_ago,
       received_date,
       received_days_ago,
       read1_file_path,
       read1_file_checksum,
       read1_file_header,
       read2_file_path,
       read2_file_checksum
FROM
  (SELECT e.id AS eid,
          shortname(e.id) AS short_name,
          e.extra_details AS extra_details,
          e.note AS note,
          et.type AS exp_type,
          p.name AS project_name,
          lm.name AS project_lead,
          e.updated_at AS updated_at,
          DATEDIFF(now(), e.updated_at) AS updated_days_ago,
          e.created_at AS created_at,
          DATEDIFF(now(), e.created_at) AS created_days_ago,
          e.submission_date AS submission_date,
          DATEDIFF(now(), e.submission_date) AS submitted_days_ago,
          e.received_date AS received_date,
          DATEDIFF(now(), e.received_date) AS received_days_ago,
          e.read1_file_path AS read1_file_path,
          e.read1_file_checksum AS read1_file_checksum,
          e.read1_file_header AS read1_file_header,
          e.read2_file_path AS read2_file_path,
          e.read2_file_checksum AS read2_file_checksum
   FROM server_experiment e,
        server_project p,
        server_experimenttype et,
        server_labmember lm   
   WHERE e.project_id = p.id
     AND e.experiment_type_id = et.id
     AND e.lead_id = lm.id) t
WHERE project_name LIKE "%$$pname|Proj. name$$%"
  AND project_lead LIKE "%$$lead|Proj. lead$$%"
  AND exp_type LIKE "%$$etype|Exp. type$$%"
  AND short_name LIKE "%$$shortname|Shortname$$%"
  AND extra_details LIKE "%$$details|Extra details$$%"
  AND note LIKE "%$$note|Note$$%"
  AND (
    read1_file_path LIKE "%$$filename|Filename (R1 or R2)$$%%"
    OR read2_file_path LIKE "%$$filename|Filename (R1 or R2)$$%%"
  )
  AND submission_date LIKE "$$submission_date|Submission date:%$$"
  AND received_date LIKE "$$received_date|Received date:%$$"
  AND (
    submitted_days_ago < "$$submitted_days_ago|Submitted within (days):10000$$"
    OR submitted_days_ago IS NULL
  )
  AND (
    received_days_ago < "$$received_days_ago|Received within (days):10000$$"
    OR received_days_ago IS NULL
  )
  AND (
    created_days_ago < "$$created_days_ago|Created within (days):10000$$"
    OR created_days_ago IS NULL
  )
  AND (
    updated_days_ago < "$$updated_days_ago|Updated within (days):10000$$"
    OR updated_days_ago IS NULL
  )
ORDER BY id_link ASC;

Workaround

Open the most recent query from the query logs in Playground, export the CSV from there:

Run the query as normal
Click "Logs" (explorer/logs, relative to your Django app)
Click "Open" in the "Playground" column for the most recent entry
Re-run the query and click Download → CSV

Asking for advice for troubleshooting

I'm not asking for SQL advice, just including the query in case it's insightful. Similar queries (excepting the SELECT from a SELECT part) generate CSVs just fine, and nothing like a Python traceback appears in the server logs.

More asking for advice about where to look to troubleshoot this.

We're invested enough in 3.2.x at this point that I may want to devote time to fixing the issue in that old branch—no promises, though—if you're willing to entertain patches to keep it on life support for a while longer.

ernstki commented 4 months ago

This happens somewhat-consistently on a second, nearly identical server. But it is an intermittent problem that sometimes appears to go away for a while when I reorder some columns in the query.

So, basically, no way to predict where the fault is without knowing where to turn up the debug level or watch the data flow. Are there any facilities for that, apart from just step-debugging in a local instance?

Does "empty CSV file except for the header row" sound like a common failure mode?

chrisclark commented 4 months ago

interesting -- let me try to replicate locally.

Out of curiosity, what is the barrier to going to 4.x? It should function identically -- but adds some nice stuff like autocompletion, etc. in the code editor (and, optionally, an AI-assistant thingy). Have you forked it and made a bunch of changes?

chrisclark commented 4 months ago

Sorry, I read the initial message more closely and see why you have not upgraded. If you have changes you think are valuable, I'd love to know what they are and could incorporate them potentially.

ernstki commented 4 months ago

Thanks, and no great urgency on this one. Mostly just wanted to have the workaround spelled out so I could point at it from local documentation.

Out of curiosity, what is the barrier to going to 4.x? It should function identically -- but adds some nice stuff like autocompletion, etc. in the code editor (and, optionally, an AI-assistant thingy). Have you forked it and made a bunch of changes?

Mostly just the squished input fields for parameterized queries (screenshot above), and the fact that we've already extended the templates for 3.x to link back to the parent Django application in ways that everyone's used to at this point.

Also, to be honest, because I can actually work on 3.x, but I'm not quite ready for Vite, Vue, and all the JavaScript tooling necessary to tinker with 4.x / 5.x. Some day!

ernstki commented 4 months ago

Since I've at least identified this as an intermittent fault, maybe I should try to trigger the behavior again with 4.x and the 5.x beta. Just in case it's something that's carried forward into the still-supported releases. I'll report back if I get a chance to do that.

chrisclark commented 4 months ago

Re: 4.x -- that makes sense. Although I spent a ton of time making sure it is super easy to work on (though it pains me too that it is now more intimidating). There is a start.sh script that sets it all up -- and the dev experience is actually quite a bit nicer thanks to live reloading. In fact, if you pull it down and run it, when you view the app -- it will detect that you don't have Vite running and give you instructions right in the browser to do it. You don't even have to read the docs! But yeah - I totally get it.

The JS in the templates ultimately just got way too complex and was hard to deal with.

Ok -- onto the issue:

Looking at the code, I don't see anything obvious that would end up returning just headers. It does seem weird...like you would expect the query to fail entirely; not return headers only.

I would recommend adding some additional logging in the explorer.models.QueryResult.__init__() method. It looks something like:

    def __init__(self, sql, connection):

        self.sql = sql
        self.connection = connection

        cursor, duration = self.execute_query()

        self._description = cursor.description or []
        self._data = [list(r) for r in cursor.fetchall()]
        self.duration = duration

        cursor.close()

        self._headers = self._get_headers()
        self._summary = {}

You could add some code in there to capture cases where self._description is populated (the headers) but then self._data ends up empty. Perhaps the cursor is being somehow reused and I should refactor this to a with block or something?

Lastly, even though you specifically did not ask for SQL advice -- I can't help myself. Refactoring that query to use a common table expression (which is super easy) will make it much easier to read :) And the AI assistant will happily do it for you if you ask!

chrisclark commented 4 months ago

Lastly here are some suggestions (towards the end) on how to add some additional monitoring to that function:

https://chatgpt.com/c/ece3d294-6657-422e-85d0-a69aad209d69

explorerhq / sql-explorer