SimonHalvdansson / Harmonic-HN

Modern Android client for Hacker News
https://play.google.com/store/apps/details?id=com.simon.harmonichackernews
Apache License 2.0
611 stars, 40 forks

Set search typo tolerance to min #177

Closed (jonas-w closed 2 weeks ago)

jonas-w commented 1 month ago

The search results always seemed really irrelevant compared to searching directly on https://hn.algolia.com

I dug a bit and found out that typoTolerance is set to true by default, which often yields very irrelevant results. For example, here are the results for 'ETF':

~ curl 'https://hn.algolia.com/api/v1/search_by_date?query=ETF&tags=story' 2>/dev/null | jq '.hits[].title'
"The false reason of the cost of AI in the world run of cost reduction"
"Intel P Core vs. E Core actual advantage?"
"Who is NOT moving from C/C++ to Rust?"
"Why Chiefs of Staff Need an Effective Framework"
"Show HN: Ell – A command-line interface for LLMs written in Bash"
"LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference"
"Show HN: AI Daily Planner"
"Show HN: Test Pilot Hub – A Platform to Exchange TestFlight Beta Reviews"
"European Commission reinstates 100ml liquids rule in EU airports"
"CodeMapper: AI's Guide to Understanding Code"
"Show HN: ShortLoop – Replay Audio Calls and Test Your VoiceAI"
"Gamified community service for highschool students who want to get into college"
"SFSS: The Original Offline Communication Protocol"
"Show HN: I built a Chrome extension to easily share passwords and tokens"
"Interpreting LLM outputs reveals the \"token noise\" effect"
"Human, SQL, and S3-friendly archives?"
"Show HN: I built an app to text myself and get reminders, notes, journal, etc."
"Rapidly build efficient sites with Neat, the minimalist CSS framework"
"Show HN: Using AI to Generate Custom Sounds from Text"
"Ask HN: Is commoditization of AI finally going to burst the AI bubble/hype?"

Notice how none of these posts are related to the search 'ETF'?
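
A rough way to put a number on this, as a sketch only: count how many returned titles contain the query as a case-insensitive substring. The helper name `count_matching_titles` is hypothetical, and the sample titles are copied from the output above.

```python
def count_matching_titles(titles, query):
    """Count titles that contain the query as a case-insensitive substring."""
    needle = query.lower()
    return sum(1 for title in titles if needle in title.lower())


# A few titles copied from the default (typoTolerance=true) results above.
sample_titles = [
    "Intel P Core vs. E Core actual advantage?",
    "Show HN: Ell – A command-line interface for LLMs written in Bash",
    "European Commission reinstates 100ml liquids rule in EU airports",
]

print(count_matching_titles(sample_titles, "ETF"))  # prints 0
```

A substring check is cruder than real relevance ranking, but it is enough to show that these hits only match 'ETF' through typo tolerance.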

The parameter is documented here: https://www.algolia.com/doc/guides/managing-results/optimize-search-results/typo-tolerance/in-depth/configuring-typo-tolerance/

I opted for 'min' because it's recommended when sorting (which we do by using the search_by_date endpoint). Its description is:

min: only keeps results with the lowest number of typos. This means that if you have one or more records that match, you’ll only receive those records, but if you have no records that match, you’ll receive records with typo counts of 1 (or 2 if there are none with 1). When using a sort-by attribute, set typo tolerance to min to reduce irrelevant search results.
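
To make the quoted behaviour concrete, here is a minimal sketch of the `min` rule as I read it from the description. It is not Algolia's implementation: the `(title, typo_count)` pairs, the typo counts themselves, and the function name `keep_lowest_typo_hits` are all made up for illustration.

```python
def keep_lowest_typo_hits(hits):
    """Keep only the hits whose typo count equals the lowest count present.

    `hits` is a list of (title, typo_count) pairs. This shape is an assumption
    for illustration, not the format the Algolia API returns.
    """
    if not hits:
        return []
    lowest = min(typos for _, typos in hits)
    return [(title, typos) for title, typos in hits if typos == lowest]


# Typo counts below are hypothetical.
hits = [
    ("The ETF Innovation Black Hole", 0),
    ("Show HN: AI Daily Planner", 2),
    ("Case for Leveraged ETFs", 0),
]
print(keep_lowest_typo_hits(hits))
```

If no hit had a typo count of 0, the same function would fall back to the hits with one typo, which mirrors the fallback the documentation describes.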

Now with typoTolerance=min set, the results look way more relevant:

~ curl 'https://hn.algolia.com/api/v1/search_by_date?query=ETF&tags=story&typoTolerance=min' 2>/dev/null | jq '.hits[].title'
"Unusual Whales Subversive Democratic Trading ETF"
"ETFs are eating the bond market"
"The Case for Investing in Vanguard Total World ETF"
"Will Spot ETH ETF Continue to Pull Back Ethereum Price?"
"US spot Ether ETFs make market debut in another win for crypto industry"
"Spot Ethereum ETFs Approved to Start Trading Tomorrow"
"Spot Ethereum ETFs get final SEC sign off to begin trading Tuesday"
"Ethereum ETF Approval: New Era for Crypto Adoption"
"Ask HN: Alternatives to IEX Cloud API"
"Ask HN: What are your favorite index ETFs for Investing?"
"Ethereum ETF Launch Captivates Crypto Market"
"The ETF Innovation Black Hole"
"Ask HN: Could AI be a dot com sized bubble?"
"Case for Leveraged ETFs"
"Nvidia Surpassing Apple Market Cap Sets Up XLK ETF Rebalancing"
"Ask HN: Invest in your own bootstrapped company or in the stock market?"
"SEC's surprise blessing of Ethereum ETFs is the crypto makeover no one expected"
"I spent a year and $5,700 to see if ChatGPT can beat the market (S&P 500)"
"Spot Ether ETFs receive official approval from the SEC"
"SEC Opens the Door for Spot Ether ETFs in Big Crypto Victory"
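
For anyone who wants to reproduce the request outside of curl, here is a small sketch that builds the same URL with Python's standard library. It is an illustration only and not the code changed in this pull request; the function name `build_search_url` and its default argument are assumptions.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl commands above.
BASE_URL = "https://hn.algolia.com/api/v1/search_by_date"


def build_search_url(query, typo_tolerance="min"):
    """Build a story search URL, adding the typoTolerance parameter."""
    params = {"query": query, "tags": "story", "typoTolerance": typo_tolerance}
    return f"{BASE_URL}?{urlencode(params)}"


print(build_search_url("ETF"))
# https://hn.algolia.com/api/v1/search_by_date?query=ETF&tags=story&typoTolerance=min
```

Using `urlencode` rather than string concatenation means queries with spaces or special characters are escaped correctly.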

SimonHalvdansson commented 2 weeks ago

I've been wondering why the search results are often so bad, good catch!