SimpleMachines / SMF

Simple Machines Forum — SMF in short — is free and open-source community forum software, delivering professional grade features in a package that allows you to set up your own online community within minutes!
https://www.simplemachines.org/
Other
599 stars 255 forks source link

Search this topic does not deliver any results #7870

Open 7wells opened 1 year ago

7wells commented 1 year ago

Description

Search This topic does not deliver any results.

Steps to reproduce

  1. Visit a topic.
  2. Select This topic from drop-down menu.
  3. Enter a search term in the search field.
  4. Click SEARCH button.

smf

Environment (complete as necessary)

As I am not the board (https://forum.fhem.de) owner, I cannot provide details about database and PHP, sorry.

Additional information/references

None

Oldiesmann commented 1 year ago

Works fine for me on 2.1.4 at seniorsandfriends.org

Sesquipedalian commented 1 year ago

Since no one else can reproduce the issue and @7wells gave a thumbs up to @Oldiesmann's reply, I am interpreting that to mean that the issue can be closed. If that is incorrect and the issue is still occurring for you, @7wells, please feel free to reply to this with more information about how we can reproduce the problem.

isaak654 commented 7 months ago

Issue confirmed by @sbulen in https://www.simplemachines.org/community/index.php?msg=4172803

Yep, confirmed. I think this is a bug.

Guests get the not allowed message.

Regular users get a no results message.

Admins get proper results.

Oldiesmann commented 7 months ago

This affects SMF 3.0 as well.

Oldiesmann commented 7 months ago

Seems to be fixed for 3.0 now with the latest search improvements

isaak654 commented 2 months ago

@Sesquipedalian Please consider removing the “Confirmation needed” label and assigning a milestone for 2.1.x.

sbulen commented 1 month ago

From: https://www.simplemachines.org/community/index.php?msg=4181419

Pretty sure it's not tied to the php or mysql version. My test copy (php 8.3, mysql 8.4; I also tried php 7.4) shows the same behavior as production (php 8.2, mysql 8.0 percona). My forum uses a custom index.

In my forum, it doesn't always fail... But there are patterns...

  • It fails for certain terms in certain topics/boards, for non-admins & guests, in a repeatable fashion.
    • The same user may get proper search results on any other board, for any other search term.
    • The same user may get proper search results using a different search term in the same topic or board.
    • If the same user performs a forum-level search, the results for that topic/board appear to be missing... (!!!)
    • New posts with the same term in the same topic/board do not show up either... (I.e., it's not just migrated data.)
    • Guests are allowed to search, & may get a "You are not allowed to access this section" error first time thru - specifically for the problem query in the problem topic/board. If guests have otherwise passed the required captcha, they may instead just get a no matching search results message.
  • The words are indexed properly, or it would fail for the admin, too. They look fine in the db in the custom search index.
  • Board permissions match all the other public boards - nothing unique about them.
  • Clearing cache makes no difference.

Sure feels like a query/join issue. Something along the lines of one of the posts with that term is in an off-limits board for that user, and it's bubbling up & affecting the query somehow. It's possible this doesn't affect folks using 3rd party search tools, e.g., this forum uses Sphinx, IIRC...

sbulen commented 1 month ago

So... There are a number of different variants on this in the code, but search basically does a high level word search, stores the complete first-pass result set into log_search_results, then applies any filters if necessary.

I did two queries in a row, one from an admin (that works, id=130), and one from a user (that doesn't, id=131). In the first stage, they both return the same 9 results, so far, so good: image

The problem occurs in one of the subsequent filtering steps. The admin, of course, is not filtered. But the user result gets filtered based on topic approval state here: https://github.com/SimpleMachines/SMF/blob/e0b5174087ba0521ca8a6bc3fdcea5496c3bd75e/Sources/Search.php#L1855

Which looks something like this at execution time:

SELECT 3207 AS id_topic, lsr.id_msg, lsr.relevance, lsr.num_matches FROM smf_log_search_results AS lsr INNER JOIN smf_topics AS t ON (t.id_topic = lsr.id_topic) WHERE lsr.id_search = 131 AND (t.approved = 1 OR t.id_member_started = 3087)

Which looks harmless enough... But... Look carefully at the screenshot above... For some reason, for this particular path thru the logic, those aren't topic IDs in the topic ID column, they're message IDs... So this query effectively filters out most/all of the search results...

Still looking into this, but I think this is the culprit right here: https://github.com/SimpleMachines/SMF/blob/e0b5174087ba0521ca8a6bc3fdcea5496c3bd75e/Sources/Search.php#L1320

sbulen commented 1 month ago

Note 3.0 has the same code: https://github.com/SimpleMachines/SMF/blob/9fceb31b791f9c88bfcb4f1a3ef87fdd78fa5877/Sources/Search/SearchApi.php#L1781

sbulen commented 1 month ago

I suspect the quirky code above (stuffing msg IDs into the topic ID column...) is a bit of a red herring.... Mainly because the comment above that code ("Outrageous!!!") is a clue they knew they were doing something weird... I sure wish there were a better clue in the comments than that, tho...

(Great job on the 3.0 refactor, @Sesquipedalian ... It seems to have ironed out several quirks....)

sbulen commented 1 month ago

This has been bugging me like a sore tooth... I have started getting complaints on my 2.1 forum... Once they noticed, they noticed...

I have confirmed the problem exists in 3.0, 2.1, and even 2.0. (edited my comments above...)

Here is what I think is going on:

So, to reproduce in 3.0, you must enable post moderation. For some reason, post moderation is NOT enabled by default in 3.0, but it is enabled by default in 2.1. (This is why I initially thought 3.0 was OK.)

With post moderation ON, any message ID that doesn't point to a valid topic, when joined on id_msg = topic_id, is excluded from search results. Which happens a lot, actually, because you tend to have a lot more messages than topics. It will get much worse on more recent messages & topics on mature forums. Or forums with lots of replies.

The problem happens for any search engine, as this particular logic (post moderation check) is independent of the API.

Admins are not affected as they can see unapproved topics.

sbulen commented 1 month ago

In 3.x, I think the structure should change. Get rid of the workarounds. Instead of relying on INSERT IGNORE to lop off dupes, fix the GROUP BYs instead. There is so much inscrutable logic in there, with dynamically built queries that are quite difficult to read... But I think the reality is it would simplify things a lot. It is much more convoluted than it needs to be. At its core, these are just approval & permissions checks... No need to get so complicated.

...And don't stuff id_msg into an id_topic column.

live627 commented 1 month ago

I'd like to see the end result. The search area is one place I could never really figure out what all was going on.