google-code-export / django-pyodbc

Automatically exported from code.google.com/p/django-pyodbc
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

Queries of parent-child relationships where only parent data is retrieved ignores distinct #77

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

I found this problem by creating a django admin interface that had an inline 
model. I wanted to 
allow searches (as in search_fields) to run through fields in both the parent 
and the child tables 
and was getting multiple same results instead of the distinct results I 
expected.

What steps will reproduce the problem?

A simplified code example is easier to show what is going on. Models:

class Parent(models.Model):
    name = models.CharField(max_length=80)

    def __unicode__(self):
        return self.name

    class Meta:
        db_table = u'[dbo].[Parent]'

class Child(models.Model):
    name = models.CharField(max_length=80)
    parent = models.ForeignKey(Parent)

    def __unicode__(self):
        return "%s offspring of %s" % (self.name, self.parent)

    class Meta:
        db_table = u'[dbo].[child]'

Here is some data in json format for those models (in case you want to 
loaddata).

Now in the shell execute:

>>> models.Parent.objects.all()
Query executed: SELECT [dbo].[Parent].[id], [dbo].[Parent].[name], 
(ROW_NUMBER() OVER (ORDER 
BY [dbo].[Parent].[id] ASC)) AS [rn] FROM [dbo].[Parent]
[<Parent: John>, <Parent: Mary>, <Parent: Percy>]

>>> models.Parent.objects.filter(Q(name__icontains='John') | 
Q(child__name__icontains='John')).values('name')
[{'name': u'John'}, {'name': u'John'}, {'name': u'John'}, {'name': u'John'}, 
{'name': u'Mary'}]

Those are the names of the parents, and John is repeated in there because of 
the multiple 
children he has. That is OK, let's add a distinct!

>>> models.Parent.objects.filter(Q(name__icontains='John') | 
Q(child__name__icontains='John')).values('name').distinct()
[{'name': u'John'}, {'name': u'John'}, {'name': u'John'}, {'name': u'John'}, 
{'name': u'Mary'}]

The distinct had no effect...

The reason is the way row_number() is being used - it makes every row distinct. 
The query 
produced is:

SELECT DISTINCT [dbo].[Parent].[name] AS [name], (ROW_NUMBER() OVER (ORDER BY 
RAND() )) AS 
[rn] FROM [dbo].[Parent] LEFT OUTER JOIN [dbo].[child] ON ([dbo].[Parent].[id] 
= 
[dbo].[child].[parent_id]) WHERE (UPPER([dbo].[Parent].[name]) LIKE UPPER(%s) 
ESCAPE '\' COLLATE 
Latin1_General_CI_AS  OR UPPER([dbo].[child].[name]) LIKE UPPER(%s) ESCAPE '\' 
COLLATE 
Latin1_General_CI_AS )

Suggested solution - use a CTE on the query, something like:

with table_ ([name])
as (
SELECT DISTINCT [dbo].[Parent].[name] AS [name]
  FROM [dbo].[Parent] LEFT OUTER JOIN [dbo].[child] 
    ON ([dbo].[Parent].[id] = [dbo].[child].[parent_id]) 
 WHERE (UPPER([dbo].[Parent].[name]) LIKE UPPER('%john%') ESCAPE '\' COLLATE 
Latin1_General_CI_AS  
    OR UPPER([dbo].[child].[name]) LIKE UPPER('%john%') ESCAPE '\' COLLATE Latin1_General_CI_AS )
)
select * from (
 select table_.*, (ROW_NUMBER() OVER (ORDER BY RAND() )) AS [rn] from table_
) as X 

Pagination can be easily added to that query by doing appending to the query 
above:

where X.rn between 1 and 10

What is the expected output? What do you see instead?

Multiple repeated values despite using distinct.

What version of the product are you using? On what operating system?

Revision 177.

Please provide any additional information below.

This has a seriously weird effect in admin interfaces using inline models and 
search_fields that 
span multiple models.

I started patching sql_server.pyodbc.query.py to use the CTE above, but then 
realized that the 
actual addition of the ROW_NUMBER() is being done at 
sql_server.pyodbc.compiler.py. I'm unsure 
why we need row_number() added to every single query, so I decide to just 
report the bug and 
wait for instructions...

Thanks!

Original issue reported on code.google.com by ce.lo...@gmail.com on 11 Mar 2010 at 5:56

GoogleCodeExporter commented 9 years ago
Sorry, here are the json dumps of the data I used:

Parent -

[{"pk": 1, "model": "miscellaneous.parent", "fields": {"name": "John"}}, {"pk": 
2, "model": "miscellaneous.parent", 
"fields": {"name": "Mary"}}, {"pk": 3, "model": "miscellaneous.parent", 
"fields": {"name": "Percy"}}]

Child -

[{"pk": 1, "model": "miscellaneous.child", "fields": {"name": "John", "parent": 
1}}, {"pk": 2, "model": 
"miscellaneous.child", "fields": {"name": "Sonny", "parent": 1}}, {"pk": 3, 
"model": "miscellaneous.child", "fields": 
{"name": "Mary", "parent": 1}}, {"pk": 4, "model": "miscellaneous.child", 
"fields": {"name": "John Brian", "parent": 
1}}, {"pk": 5, "model": "miscellaneous.child", "fields": {"name": "mary 
elizabeth", "parent": 2}}, {"pk": 7, "model": 
"miscellaneous.child", "fields": {"name": "john", "parent": 2}}, {"pk": 8, 
"model": "miscellaneous.child", "fields": 
{"name": "Percyval", "parent": 3}}, {"pk": 9, "model": "miscellaneous.child", 
"fields": {"name": "Mary", "parent": 3}}, 
{"pk": 6, "model": "miscellaneous.child", "fields": {"name": "julie mary", 
"parent": 2}}]

Original comment by ce.lo...@gmail.com on 11 Mar 2010 at 6:01

GoogleCodeExporter commented 9 years ago
And here is an example of an admin interface where the problem would be 
materializing:

class ChildInline(admin.TabularInline):
    model = Child
    fk_name = 'parent'

class ParentAdmin(admin.ModelAdmin):
    search_fields = ('name', 'child__name',)
    list_per_page = 10
    list_select_related = True
    inlines = [ ChildInline, ]

Searching on the parent admin interface will produce the same unfortunate 
effect because the distinct is 
ignored.

Original comment by ce.lo...@gmail.com on 11 Mar 2010 at 6:08

GoogleCodeExporter commented 9 years ago
I wish I could edit the issue... :-)

Where I say "I'm unsure why we need row_number() added to every single query, 
so I decide to just report the 
bug and wait for instructions..."; I do realize the row_number() is in there to 
allow for pagination. 

The question is could the row_number addition be moved from compile.py to 
query.py? I'm unsure - so I 
stopped touching it.

Original comment by ce.lo...@gmail.com on 11 Mar 2010 at 6:42

GoogleCodeExporter commented 9 years ago
I've seen people suggest that 'query.py' isn't used in 1.2
My current (quick) fix is to move the row_number stuff outside the actual 
query, so 
a 'select *, (row_number...) from (<built query>) as Y' appears around the query

Original comment by julian.d...@gmail.com on 22 Mar 2010 at 11:11