fedora-infra / fedora-packages

A webapp that allows searching packages in Fedora. Written in Python using TurboGears2 and Moksha.
https://apps.fedoraproject.org/packages
GNU Affero General Public License v3.0

Duplicate search results #406

Closed abitrolly closed 5 years ago

abitrolly commented 5 years ago

Just search for enki.

steaksauce- commented 5 years ago

Might be if the package is available in the default repo and EPEL, but I'm not sure.

Zlopez commented 5 years ago

Another example: libssh2 shows 2 results, curl shows 4 results, and so on.

I wasn't able to find a pattern for this.

Zlopez commented 5 years ago

Looking at the sources, the issue seems to come from here: https://github.com/fedora-infra/fedora-packages/blob/master/fedoracommunity/search/index.py#L200

PDC is probably returning the duplicate results.

abitrolly commented 5 years ago

https://pdc.fedoraproject.org/rest_api/v1/global-components/?name=curl returns one result.
https://pdc.fedoraproject.org/rest_api/v1/global-components/?name=curl&name=curl returns one result.
https://pdc.fedoraproject.org/rest_api/v1/global-components/?name=curl&name=enki returns two results.
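The check above can be sketched in Python. This is only an illustration, not code from fedora-packages: the helper names `build_query` and `duplicated_names` are hypothetical, and the response shape (a JSON object with a `results` list whose items carry a `name` key) is an assumption about the PDC API.

```python
from collections import Counter
from urllib.parse import urlencode

PDC_URL = "https://pdc.fedoraproject.org/rest_api/v1/global-components/"


def build_query(names):
    """Return the PDC URL that filters global components by the given names."""
    # A list of (key, value) pairs yields a repeated "name=" parameter.
    return PDC_URL + "?" + urlencode([("name", n) for n in names])


def duplicated_names(payload):
    """Return the component names that occur more than once in a response.

    Assumption: the payload has a "results" list of objects with a "name" key.
    """
    counts = Counter(item["name"] for item in payload.get("results", []))
    return sorted(name for name, n in counts.items() if n > 1)
```

If `duplicated_names` comes back empty for the decoded JSON of each query, the duplicates are more likely introduced on the indexing side than by PDC.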

My bet is that there is some problem with database indexing, e.g. the index key is not unique or contains columns that change for the same product name.
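A toy model of that hypothesis, using a plain dict in place of the real search index. Everything here is hypothetical (the record fields, the `branch` column, the function names); it only shows how a non-unique key lets the same package name appear more than once in the results.

```python
def index_with_generated_ids(records):
    """Store every record under a fresh ID, so repeated names pile up."""
    index = {}
    for doc_id, record in enumerate(records):
        index[doc_id] = record
    return index


def index_with_unique_key(records):
    """Store records under their package name, so a repeat replaces the old entry."""
    index = {}
    for record in records:
        index[record["name"]] = record
    return index


def search(index, term):
    """Return every indexed record whose name equals the search term."""
    return [r for r in index.values() if r["name"] == term]


# Hypothetical input: the same package seen twice with a differing column.
records = [
    {"name": "enki", "branch": "f30"},
    {"name": "enki", "branch": "epel7"},
    {"name": "curl", "branch": "f30"},
]
```

With generated IDs, `search(index_with_generated_ids(records), "enki")` yields two hits. Keyed on the package name alone, it yields one.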

steaksauce- commented 5 years ago

I cannot reproduce the behavior on the 4.2.0 tag. I'm not sure when that version will be released, though.

abitrolly commented 5 years ago

@steaksauce- do you use the same DB as production?

steaksauce- commented 5 years ago

> @steaksauce- do you use the same DB as production?

To be honest, I don't know (probably not, though). I use whatever docker-compose gives me in the devel folder.

cverna commented 5 years ago

I have been working on https://github.com/fedora-infra/fedora-search to replace the Xapian backend of fedora-packages. I think having an independent service with the DB stored in PostgreSQL will make it easier to investigate and fix these kinds of problems.

abitrolly commented 5 years ago

And if the issue is repeated with the same backend, how do you debug it?

Also, how do you share a PostgreSQL DB?

Also, are you sure that PostgreSQL stemmers are good for full-text search?

cverna commented 5 years ago

> And if the issue is repeated with the same backend, how do you debug it?

The indexing uses the Django ORM, which makes the code easier to understand than the current low-level Xapian API. The other main reason for using PostgreSQL is to deploy this application in our OpenShift setup.

> Also, how do you share a PostgreSQL DB?

This is relatively easy: we already dump some of our databases nightly, see https://infrastructure.fedoraproject.org/infra/db-dumps/

> Also, are you sure that PostgreSQL stemmers are good for full-text search?

PostgreSQL full-text search has been around for a while now, and it seems a viable solution for a simple search application like this one.

abitrolly commented 5 years ago

> PostgreSQL full-text search has been around for a while now, and it seems a viable solution for a simple search application like this one.

I couldn't find any serious comparison yet, but there are references saying that PostgreSQL supports stemming, so I'll have to take your word that it is good. :)

cverna commented 5 years ago

Closing this ticket