TheFive / osmbc

Open Street Map Blog Collector
MIT License

URL search with less chars does not match #928

Closed Strubbl closed 2 years ago

Strubbl commented 2 years ago

Searching for the URL https://sammyhawkrad.github.io/OsmoseStats does not return any results: https://osmbc.openstreetmap.de/article/searchandcreate?search=https%3A%2F%2Fsammyhawkrad.github.io%2FOsmoseStats&SearchNow=

But searching for the same URL with a slash at the end, https://sammyhawkrad.github.io/OsmoseStats/, gives a result: https://osmbc.openstreetmap.de/article/searchandcreate?search=https%3A%2F%2Fsammyhawkrad.github.io%2FOsmoseStats%2F&SearchNow=

It would be nice if the first search also showed the result of the second search, to avoid duplicates.

TheFive commented 2 years ago

Looks like my search function needs an update. I have used Postgres full text search, so it may be that this search really is a word search. I have seen that Postgres can convert web search operators internally (so that you can exclude words from your search with "-", as known from search engines).

And if the request is only a URL, it CAN make sense to treat it as part of a URL, but then it will find ALL sub-URLs (which is not the case now).
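To illustrate the trade-off, here is a minimal Python sketch comparing an exact match with a "part of a URL" (prefix) match. It is not OSMBC's code: the function names and the example.org URLs are hypothetical, and a plain list stands in for the database.

```python
# Hypothetical stored links; a real search would query the database instead.
STORED_LINKS = [
    "https://example.org/project/",
    "https://example.org/project/docs",
    "https://example.org/project/docs/install",
    "https://example.org/other",
]


def exact_match(query, links):
    """Return only the links that are identical to the query."""
    return [link for link in links if link == query]


def prefix_match(query, links):
    """Return every link that starts with the query, i.e. all sub-URLs too."""
    return [link for link in links if link.startswith(query)]


query = "https://example.org/project"
print(exact_match(query, STORED_LINKS))   # no hit: the stored link has a trailing slash
print(prefix_match(query, STORED_LINKS))  # hits the link and both of its sub-URLs
```

The prefix variant does find the link despite the missing slash, but it also returns every page below it, which may be more than a duplicate check wants.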

After everyone switched to HTTPS, I tried to make the search more robust by searching for both http and https. Maybe it would be best to have a similar bridge for the closing "/".

That would lead to an OR request for:

https://sammyhawkrad.github.io/OsmoseStats/
https://sammyhawkrad.github.io/OsmoseStats
http://sammyhawkrad.github.io/OsmoseStats/
http://sammyhawkrad.github.io/OsmoseStats

(This should not be a runtime problem if the search has a word index.)
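The expansion into the four variants listed above can be sketched in Python as follows. This is only an illustration under assumptions, not the actual OSMBC implementation: `url_variants` is a hypothetical helper name, and query strings or fragments are not handled specially.

```python
def url_variants(url):
    """Return the https/http and trailing-slash variants of a URL.

    Hypothetical helper: strings that do not start with http:// or
    https:// are returned unchanged as a single-element list.
    """
    for scheme in ("https://", "http://"):
        if url.startswith(scheme):
            rest = url[len(scheme):]
            break
    else:
        return [url]

    # Normalise away any trailing slash, then add it back as one variant.
    rest = rest.rstrip("/")
    return [
        s + rest + slash
        for s in ("https://", "http://")
        for slash in ("/", "")
    ]


for variant in url_variants("https://sammyhawkrad.github.io/OsmoseStats"):
    print(variant)
```

A search would then combine these variants with OR, so the same article is found whichever form of the URL is entered.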

Strubbl commented 2 years ago

The duplicate check is affected by this change. Please have a look at the section "Links used in other Post" of the article https://osmbc.openstreetmap.de/article/26986. I am not sure whether we want to avoid this or keep it as is. I think it is impossible to avoid all false positives. So maybe one extra warning to check is better than having no warning for a real duplicate?

TheFive commented 2 years ago

Looks like a bug; it was not intended this way. Maybe the link was cut "too much" and my test cases didn't catch that.

TheFive commented 2 years ago

Fixed, your sample link is working now.