alphapapa / pocket-reader.el

Emacs client for Pocket reading list (getpocket.com)
GNU General Public License v3.0

Some articles return the wrong URL #24

Closed map7 closed 1 year ago

map7 commented 4 years ago

I have a few articles in my Pocket list for which pocket-reader.el retrieves the wrong URL when I try to read them, which causes pandoc to fail.

Here is an example of an article I saved.

The correct URL, which works in the browser when logged in to getpocket: https://www.alphr.com/samsung/1005789/samsung-dex-docking-hub-galaxy-s8-note8

The same article in pocket-reader.el returns the following URL: http://alphr.com/go/1005789

This returns a 404 error.

alphapapa commented 4 years ago

Please see the customization option pocket-reader-url-priorities. Pocket returns multiple URLs for some articles, and it seems that in this case, it's returning one that is broken.

map7 commented 4 years ago

Could you please give me an example of how to use pocket-reader-url-priorities? I'm only just learning Lisp. What do I put for 'amp_url'?

alphapapa commented 4 years ago

Use the customization system. M-x customize-group RET pocket-reader RET.
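
For readers who would rather set the option from an init file, here is a minimal sketch. It assumes the option holds a list of symbols named after the URL fields mentioned in this thread (amp_url, resolved_url, given_url), tried in order; that shape is an assumption and should be checked with C-h v pocket-reader-url-priorities RET.

```elisp
;; Assumption: `pocket-reader-url-priorities' is a list of symbols named
;; after the URL fields discussed in this thread, tried first to last.
;; Check the option's documentation (C-h v) before relying on this.
(setq pocket-reader-url-priorities '(given_url resolved_url amp_url))
```

The customization interface remains the safer route, because it shows the values the option actually accepts.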

map7 commented 4 years ago

Here are my results. There is still one article (the Rails API article) which I cannot get to work, even though it works in my getpocket.

I tried each ordering of the three URLs, with different results for each:

| amp | resolved | given | Torch | DeX | API |
|-----+----------+-------+-------+-----+-----|
|   1 |        2 |     3 | y     | n   | n   |
|   1 |        3 |     2 | y     | y   | n   |
|   2 |        3 |     1 | y     | y   | n   |
|   2 |        1 |     3 | y     | n   | n   |
|   3 |        2 |     1 | y     | y   | n   |
|   3 |        1 |     2 | y     | n   | n   |

| name  | URL                                                                          |
|-------+------------------------------------------------------------------------------|
| DeX   | http://www.alphr.com/samsung/1005789/samsung-dex-docking-hub-galaxy-s8-note8 |
| Torch | https://github.com/ankane/torch.rb                                           |
| API   | https://prathamesh.tech/2020/07/28/how-i-write-tests-for-my-rails-api-apps/  |

Are there other URLs sent by the API which I have to add? Is it possible for pocket-reader.el to test each one and just return the working URL?

alphapapa commented 4 years ago

Here's the way I see it: if the "given" URL (i.e. the one you actually added to Pocket) no longer works, then that Web site is at fault.

Now it might be that the "given" URL originally redirected to the "real" (i.e. "resolved") URL, and that the redirecting URL is now broken while the "real" one still works, and when you added the URL, Pocket followed it to the "real" one and still has that one stored. In that case, the "resolved" URL should work.

And by that logic, why use anything other than the "resolved" URL? Well, maybe some web sites move the "real" URL while keeping the redirecting ("given") URL. Again, Web site at fault.

And the AMP URL...that's just Google's attempt to further monopolize Web ads, right? So who cares about that.

So if you want to write a patch to expose the different URLs per-entry, I'm willing to consider it.

I don't think it would be necessary or good to automatically try different URLs, because what qualifies as "working"? In your own table, you report "y" and "n". What do those mean? Is every "n" a 404? Do so many of those sites really play musical chairs with their URLs on a regular basis?

In conclusion, what a mess the Web has become. I'm not very interested in writing code to accommodate such a hall of mirrors. IMO, if a handful of the thousands of links I've added to Pocket over the years exhibit this problem, I should be able to find a working URL for the site using Google, otherwise the site's probably offline, anyway.

I hope you understand what I'm getting at. :)

alphapapa commented 4 years ago

BTW, your table would be much more useful if you reported the 3 types of URLs for each article. Otherwise I can't even see what the problem is. For example, the last link loads for me in a browser, so why would it always be n in your table?

alphapapa commented 4 years ago

It may also be that Pocket is broken internally. For example, I recently fixed a bug caused by Pocket returning empty strings for URLs for some entries. That makes no sense at all, but it does that sometimes.

And considering Mozilla's recent behavior, I wouldn't expect Pocket to remain usable for much longer. Be sure to back up your links regularly if you care about them. Maybe this package can be repurposed for Wallabag or one of those Pocket clones someday.

map7 commented 4 years ago

I see your point.

In the table above, 'n' means pocket-reader doesn't load the document in an Emacs buffer when I hit enter on it. The URLs I gave are the URLs of the articles when I click on them in getpocket.com and view the original document.

Wallabag does look interesting and I do like self hosting to keep my data in my control so I might look at that as an alternative one day.

How can I get access to the JSON data returned for each article, so that I have the different URLs for my table?

map7 commented 4 years ago

I worked out where I can print the returned JSON. Here is the output I got for each article:

| name  | amp URL                                                                         |
|-------+---------------------------------------------------------------------------------|
| DeX   | nil                                                                             |
| Torch | nil                                                                             |
| API   | https://prathamesh.tech/2020/07/28/how-i-write-tests-for-my-rails-api-apps/amp/ |

| name  | given URL                                                                    |
|-------+------------------------------------------------------------------------------|
| DeX   | http://www.alphr.com/samsung/1005789/samsung-dex-docking-hub-galaxy-s8-note8 |
| Torch | https://github.com/ankane/torch.rb                                           |
| API   | https://prathamesh.tech/2020/07/28/how-i-write-tests-for-my-rails-api-apps/  |

| name  | resolved URL                                                                |
|-------+-----------------------------------------------------------------------------|
| DeX   | http://alphr.com/go/1005789                                                 |
| Torch | https://github.com/ankane/torch.rb                                          |
| API   | https://prathamesh.tech/2020/07/28/how-i-write-tests-for-my-rails-api-apps/ |

It looks like the given URL should always be picked. When I set the priority to given_url, resolved_url, amp_url, only the API key doesn't work.

If I delete amp_url from the Pocket Reader Url Priorities, then all of them don't work.
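
To make the effect of the ordering concrete, here is a small self-contained sketch of priority-based selection applied to the DeX entry from the tables above. It is not pocket-reader's actual code: the function name `my/first-usable-url` and the alist shape are hypothetical, chosen for illustration. It also skips nil and empty values, since Pocket is reported in this thread to return empty strings for some entries.

```elisp
(require 'cl-lib)
(require 'subr-x)  ; for `string-empty-p' on older Emacs versions

;; Hypothetical helper, not part of pocket-reader.el.
(defun my/first-usable-url (item priorities)
  "Return the first non-empty URL string in ITEM.
ITEM is an alist keyed by URL-field symbols.  PRIORITIES is a
list of those symbols, tried in order.  Return nil if none match."
  (cl-loop for key in priorities
           for url = (alist-get key item)
           when (and (stringp url) (not (string-empty-p url)))
           return url))

;; The DeX entry, with values copied from the tables above.
(defvar my/dex-item
  '((amp_url . nil)
    (given_url . "http://www.alphr.com/samsung/1005789/samsung-dex-docking-hub-galaxy-s8-note8")
    (resolved_url . "http://alphr.com/go/1005789")))

;; amp_url is nil, so resolved_url wins: the short /go/ link.
(my/first-usable-url my/dex-item '(amp_url resolved_url given_url))

;; Putting given_url first selects the long article URL instead.
(my/first-usable-url my/dex-item '(given_url resolved_url amp_url))
```

Note that this only chooses among the values Pocket returned. It does not check whether the chosen URL actually loads.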

alphapapa commented 4 years ago

> In the table above, 'n' means pocket-reader doesn't load the document in an Emacs buffer when I hit enter on it.

There could be a number of reasons that a page doesn't load in Emacs while it might load fine in a "real" browser. You haven't said what browser function you use, or what Emacs version, so we don't know what the actual problem is. It might not even be a problem with this package, or not always with it.

> The URLs I gave are the URLs of the articles when I click on them in getpocket.com and view the original document.

Which of the three URLs (or maybe some other one, since their own frontend may use internal APIs) does that use?

> It looks like the given URL should always be picked. When I set the priority to given_url, resolved_url, amp_url, only the API key doesn't work.

I don't understand what you mean. Are you saying there's a Pocket API key error?

> If I delete amp_url from the Pocket Reader Url Priorities, then all of them don't work.

What is "them"? What does "don't work" mean? You need to be specific about actions and outcomes in order for me to understand what's happening.

map7 commented 4 years ago

I'm running Emacs 27.1 stable, compiled under Debian 10 64-bit. The problem is happening with the pandoc conversion for the following article:

amp_url:      https://prathamesh.tech/2020/07/28/how-i-write-tests-for-my-rails-api-apps/amp/
given_url:    https://prathamesh.tech/2020/07/28/how-i-write-tests-for-my-rails-api-apps/
resolved_url: https://prathamesh.tech/2020/07/28/how-i-write-tests-for-my-rails-api-apps/
Contacting host: prathamesh.tech:443
org-web-tools--html-to-org-with-pandoc: Pandoc failed

Sorry, instead of 'API key' I meant the 'Rails API app' article.

By "them" I mean the three example URLs I mentioned in my tests, and when I say they don't work I mean they get the "Pandoc failed" error above.

alphapapa commented 4 years ago

Okay, so the problem appears to be with the org-web-tools function that converts the HTML to Org with Pandoc, rather than being an issue with Pocket's returning different URLs for an article.