WordPress / press-this

Press This is a little tool that lets you grab bits of the web and create new posts with ease. It will even allow you to choose from images or videos included on the page and use them in your post. Use Press This as a quick and lightweight way to highlight another page on the web.

Linked-In URLs do not parse correctly #26

Closed jpluimers closed 6 years ago

jpluimers commented 6 years ago

Parsed https://www.linkedin.com/pulse/what-your-approach-branching-tells-me-state-agile-adrian-kerry/

Expected:

Branching of code. How many of you have considered that this could be one of the biggest stumbling blocks to your agile transformation?

Actual: nothing.

Tried from https://wiert.wordpress.com/wp-admin/press-this.php

dshanske commented 6 years ago

This isn't an issue with Press This. LinkedIn is returning a 999 response code, so they are not allowing the page to be retrieved. It looks like they filter based on the user agent. If you spoof the user agent, it might work. However, I don't think spoofing the user agent for a site that doesn't want to be read is something I would suggest the plugin do.
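To check this kind of behavior outside of Press This, a minimal Python sketch like the one below can report the status code a site returns. It is an illustration only, not the plugin's code: `USER_AGENT`, `build_request`, and `fetch_status` are hypothetical names, and the user agent string is a made-up, honest identifier rather than a spoofed browser string or the one Press This actually sends.

```python
import urllib.error
import urllib.request

# Hypothetical user agent that identifies the client honestly (assumption:
# this is NOT the string Press This or WordPress sends).
USER_AGENT = "ExampleFetcher/1.0 (+https://example.com/about-fetcher)"


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that carries the identifying user agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, including error codes.

    Network-level failures (DNS, refused connection) still raise URLError.
    """
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for error responses it does not handle
        # itself; the numeric status code is available on the exception.
        return err.code
```

Calling `fetch_status` on the LinkedIn URL from this issue would show whatever code the site returns for that client and IP address at that moment; the thread suggests the result can differ between networks.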

dshanske commented 6 years ago

The code, though, should surface an error to the user when an invalid response is returned. Opening a separate issue on that.
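As a rough sketch of what surfacing an invalid response could look like, the Python below turns a status code into a message for the user. The function names and the message wording are hypothetical, not taken from Press This. It relies on the standard library's `http.HTTPStatus` enum, which has no entry for 999, to tell registered codes from non-standard ones.

```python
from http import HTTPStatus
from typing import Optional


def is_registered_status(code: int) -> bool:
    """Return True if code is a status known to Python's HTTPStatus enum."""
    try:
        HTTPStatus(code)
        return True
    except ValueError:
        return False


def failure_message(url: str, code: int) -> Optional[str]:
    """Return a user-facing message for a failed fetch, or None on success."""
    if 200 <= code < 300:
        return None  # Success: nothing to report.
    if not is_registered_status(code):
        # Hypothetical wording for codes such as LinkedIn's 999.
        return (
            f"Could not read {url}: the site answered with the "
            f"non-standard status {code} and may be blocking automated requests."
        )
    return f"Could not read {url}: HTTP {code} ({HTTPStatus(code).phrase})."
```

With this shape, a blocked fetch produces an explanation the user can see instead of an empty result.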

dshanske commented 6 years ago

@jpluimers Keep sending these examples. They are very useful to me. I am using them to test my fork of the Press This parsing code, which I hope to send upstream someday.

jpluimers commented 6 years ago

@dshanske don't worry, I will. From a testing perspective, I usually want a lot of small tests coming from practice to see where users will break your stuff. This repository is no exception (:

Shame on Linked-In BTW.

kraftbj commented 6 years ago

That's interesting about LinkedIn. Medium does the same (aiming to block WordPress pingbacks), which is why Press This sends a different, but still accurate, UA than WordPress itself would.

kraftbj commented 6 years ago

Looking around on the interwebs, this seems to be solely a LinkedIn issue that we can't resolve on our end. I wouldn't spoof the UA, but apparently they do both UA AND IP filtering, so there isn't a good solution. I'm closing this for that reason.

jpluimers commented 6 years ago

Thanks for the investigation @kraftbj.

For both curiosity and documentation purposes: any underlying links from the interwebs?

kraftbj commented 6 years ago

I should have posted them, but I don't have them handy. They were a few Stack Overflow pages of folks asking about the LinkedIn 999 response code.

jpluimers commented 6 years ago

@kraftbj I found the ones below on Stack Overflow, and the general sentiment from https://www.google.com/search?q=linkedin%20999%20response is "Using the invalid HTTP response 999, LinkedIn blocks vary over time depending on both UserAgent and IP address blocks including many hosting and cloud service providers".