SwiftPackageIndex / SwiftPackageIndex-Server

The Swift Package Index is the place to find Swift packages!
https://swiftpackageindex.com
Apache License 2.0
548 stars 43 forks source link

Sharing search results drops search terms #1592

Closed finestructure closed 2 years ago

finestructure commented 2 years ago

I searched for “toml” on the iPad, tapped he share icon and selected “Copy link”. The resulting url was

https://swiftpackageindex.com/search

i.e. it dropped the search term and therefore wouldn’t link to the actual results.

finestructure commented 2 years ago

Might be relevant that it was the iPad, although on he Mac I usually Cmd-L, Cmd-C, so it might not be apparent there even if the same bug exists.

daveverwer commented 2 years ago

I can reproduce this and it's very strange.

When shared from the iPad, other search sites include the query string so it's not a fundamental problem with iPad sharing.

However, as you say, if you paste in a search https://swiftpackageindex.com/search?query=toml to the address bar on any platform, it renders the full results so I don't think it's anything with JavaScript or anything that's happening after the page starts to load.

I don't even know where to start looking for this problem.

Sherlouk commented 2 years ago

It's been a while since I've dove into this stuff, but I believe Safari uses the canonical link which we set via a meta tag in the header.

<link rel="canonical" href="https://swiftpackageindex.com/search">
daveverwer commented 2 years ago

😱 What?!

So we need to remove the canonical meta tag for the search page and any other page where a query string is important? Maybe we should only include it on pages we know to be canonical, like package pages and the home page.

Sherlouk commented 2 years ago

Done a bit of StackOverflow diving and looks like my memory was good.

Confirmation from Apple employee: https://twitter.com/rmondello/status/910276001300488192 Confirmation from community: https://developer.apple.com/forums/thread/686701 Random hack to resolve the issue: https://stackoverflow.com/a/58787618/17582231

daveverwer commented 2 years ago

I was Googling too and this is a good document on how Google uses this (which is why we added it).

https://developers.google.com/search/docs/advanced/crawling/consolidate-duplicate-urls

Random hack to resolve the issue: https://stackoverflow.com/a/58787618/17582231

While this is a bit 🤮 , it would work well and also keep the metadata there for Google. The best alternative would be to remove that meta tag for the search page.

Edit: Reading the comments on that hack, I don't think we should use that. It's easy to forget that Google executes JavaScript and acts like a normal browser these days.

Sherlouk commented 2 years ago

Aye it makes sense to want to have the meta tag there, especially for the search page as to not clutter what Google is tracking.

The comments on that hacky workaround don't seem too positive though given that we know Google waits a small while before parsing the HTML, it's plausible that by the time it's scraped that the script to remove the tag is gone.

daveverwer commented 2 years ago

The comments on that hacky workaround don't seem too positive though given that we know Google waits a small while before parsing the HTML, it's plausible that by the time it's scraped that the script to remove the tag is gone.

Heh. I edited my reply after reading the comments too 😅

daveverwer commented 2 years ago

This is really weird behaviour. Safari really shouldn't do this.

Sherlouk commented 2 years ago

Safari really shouldn't do this.

Put that on a T-Shirt. SPI merch.

finestructure commented 2 years ago

Is Google really not stripping out query parameters when determining canonical URLs? I don't have any experience with this at all but it struck me that all the duplicate URL examples seem to refer to duplicate locations in the URL part (i.e. before ?). Just wondering if we'd incur duplicates if we didn't set canonical URL on the search page.

finestructure commented 2 years ago

In particular, in the dynamic URLs section, the examples are

- https://www.example.com/products?category=dresses&color=green
- https://example.com/dresses/cocktail?gclid=ABCD
- https://www.example.com/dresses/green/greendress.html

I wonder if it's significant that they're not

- https://example.com/products?category=dresses&color=green
- https://example.com/products?gclid=ABCD
- https://example.com/products
Sherlouk commented 2 years ago

Not detracting from Sven's point above which is likely very fair, but if we do need to keep it - we could probably move the link to a HTTP header rather than a HTML link.

https://developers.google.com/search/docs/advanced/crawling/consolidate-duplicate-urls#rel-canonical-header-method

Untested, but this might not be picked up by Safari and might work as expected. Big ifs, and as per Sven's point, might not even be necessary.

daveverwer commented 2 years ago

Do you mean to normalise the domain name? That's one use of it and we use it to make sure Google chooses https over http and without the www. Not that we even serve on http, so that's fairly academic.

I think what they're trying to demonstrate there is that you could inform Google that you'd prefer https://www.example.com/products/dresses/green over https://www.example.com/products?category=dresses&color=green and that they contain the same content.

daveverwer commented 2 years ago

We could include the query string in the canonical version. If we do that globally then we could make something that works for all pages (including future pages) without needing to remove the tag.

Sherlouk commented 2 years ago

I feel like actively including the query tag in the canonical link is more dangerous than just not setting it at all.

While we encourage you to use any of these methods, none of them are required. If you don't indicate a canonical URL, we'll identify what we think is the best version or URL.

daveverwer commented 2 years ago

is more dangerous than just not setting it at all.

Why? The canonical version of a search for TOML is https://swiftpackageindex.com/search?query=toml

We don't use query parameters frivolously. I may be missing something but I can't see any danger in including them.

Sherlouk commented 2 years ago

https://webmasters.stackexchange.com/a/114624

I take back my comment, looks like the recommendation is that if the query parameter changes the content then it should be included. If it's just for analytics (like UTM parameters) then it shouldn't be included.

daveverwer commented 2 years ago

It's worth noting that if we did this, it wouldn't ever include query parameters that other people included. For example, incoming links from iOS Dev Weekly include UTM parameters:

https://swiftpackageindex.com/pointfreeco/swift-custom-dump?utm_campaign=iOS%2BDev%2BWeekly&utm_medium=web&utm_source=iOS%2BDev%2BWeekly%2BIssue%2B541

We'd never include those parameters, as they're not necessary and not generated by us.

finestructure commented 2 years ago

Just for my understanding, if we set the canonical version to include the query parameters (on search only, right?) then it would mean from Google's perspective (and unless they strip query parameters and aggregate on the base URL regardless) these are all distinct pages, leading to a lower ranking of "the bare search URL". I.e. there's no longer a single bucket of "bare search URL" that gets all these hits.

I don't know if that's a bad thing but that's what would happen, right? Is there a scenario where this is a bad thing?

Sherlouk commented 2 years ago

That was my initial thinking Sven, by explicitly adding the query parameters we're telling Google they're all distinct pages which while technically accurate and according to the posts above seems to be 'expected' - it does devalue other pages.

Though to be fair, search doesn't come up at all in Google even if I search for "swift package index search" so not sure it's much of a loss 😂

finestructure commented 2 years ago

Though to be fair, search doesn't come up at all in Google even if I search for "swift package index search" so not sure it's much of a loss 😂

Yeah, I'm not sure there are many places out there linking to a search result page let alone suffer from being in their "own bucket" 🤷

daveverwer commented 2 years ago

and unless they strip query parameters and aggregate on the base URL regardless

They don't. Including query parameters here is a valid thing to do.

I don't know if that's a bad thing but that's what would happen, right? Is there a scenario where this is a bad thing?

I can't think of one. In fact, I can see it as a good thing for two reasons:

finestructure commented 2 years ago

~There might be another, related bug here. I'd shared a link to the Swift forums from my iPad and knowing that the search query would be stripped, I manually copied the url from Safari's url field and pasted it into the message. That url encoded the + and :. Karl spotted and fixed it.~

~Not sure if that's just a Safari or Swift forum problem but hopefully fixing the sharing could step around that.~

This is actually a Swift forums issue.

finestructure commented 2 years ago

Just got bitten by this again, this time on the phone. This time I had a really weird effect, perhaps due to Safari's tab groups or search history but the pasted url had a different search query parameter. I had searched for openapi but the pasted search was for composable.