Jaymon / wishlist

Read an Amazon wishlist programmatically with Python

Wishlist CLI not returning data anymore because of TypeError #17

Open tym-project opened 4 years ago

tym-project commented 4 years ago

Used to work properly in a script, but has not been returning any data for a few months.

Running 0.2.2, the command line fails with:

$ wishlist dump 3L5SXXXXXBZ6K
Traceback (most recent call last):
  File "/usr/local/bin/wishlist", line 8, in <module>
    sys.exit(console())
TypeError: exit() missing 1 required positional argument: 'mod_name'

Upgraded to 0.3.1, same issue.

Jaymon commented 4 years ago

I think this is actually a problem with the older wishlist CLI using a newer captain (specifically captain 3.0.0+), you should be able to fix it by downgrading captain:

$ pip install captain==2.0.4

Alternatively, it would probably work with the latest version of captain if you modified sys.exit(console()) to look like this:

sys.exit(console(__name__))

I know the parser itself still works because I use it every day. This is a legit problem with the CLI, but I don't have time to fix it right now, so I would suggest one of the above remedies. If you modify the script and it works, I'd love a pull request :)

tym-project commented 4 years ago

Thanks for the info, I'll try to fix the CLI... but my issue is actually with a script (no specific error, but no data is returned by "Wishlist()").

I should have specified earlier that I'm running it against amazon.fr. I'll try with an amazon.com wishlist to see if it's an issue with the fr version of the site.

tym-project commented 4 years ago

No luck adding "__name__"...

Traceback (most recent call last):
  File "/usr/local/bin/wishlist", line 8, in <module>
    sys.exit(console(__name__))
  File "/usr/local/lib/python3.5/dist-packages/captain/__init__.py", line 38, in exit
    s = Script(inspect.getfile(mod), module=mod)
  File "/usr/local/lib/python3.5/dist-packages/captain/__init__.py", line 145, in __init__
    self.parse()
  File "/usr/local/lib/python3.5/dist-packages/captain/__init__.py", line 260, in parse
    raise ParseError("no main function found")
captain.exception.ParseError: no main function found

No luck downgrading captain either.

Jaymon commented 4 years ago

So I just pushed Wishlist 0.4.0 that works with Captain 3.0.0. I tested on my personal wishlist and it worked:

$ python -m wishlist dump <HASH>
1. Clean Code: A Handbook of Agile Software Craftsmanship (Robert C. Martin Series) is $29.44
2. How to Read a Book (A Touchstone Book) is $13.99
3. Hynes Eagle 38L Flight Approved Weekender Carry on Backpack is $49.99
...

But my wishlist is from amazon.com and since it's not easy for me to test an amazon.fr wishlist you're probably on your own for figuring out that issue, I'll help however I can though.

I've tried to make wishlist.core.WishlistElement very forgiving of parse errors but I'm always chasing Amazon's changes

tym-project commented 4 years ago

Thanks, I've updated to wishlist 0.4.0 (with brow 0.0.3 and captain 3.0.0). I have the same issue when I run $ wishlist dump 3U1HZP3ZCY3XA (this is an amazon.com public ID from some US non-profit):

Traceback (most recent call last):
File "/usr/local/bin/wishlist", line 8, in <module>
   sys.exit(console())
TypeError: exit() missing 1 required positional argument: 'mod_name'

Running it with your syntax returns nothing (might be an env var issue, but same thing on an amazon.fr ID) :

$ python3 -m wishlist dump 3U1HZP3ZCY3XA
Done with wishlist, 1 total items

If I try to run it from a script, no errors but no data either :

#!/usr/bin/python3.5

import os
from wishlist.core import Wishlist

os.environ["WISHLIST_HOST"]="https://amazon.com/"
lists=[]
lists.append({'name':'Test','id':'3U1HZP3ZCY3XA'})

for list in lists:
    data = Wishlist(list['id'])
    for item in data:
        print(item)

tym-project commented 4 years ago

Got it! The problem is the BeautifulSoup parser: if I force lxml via the HTML_PARSER env var, it works. It looks like bs4 can't handle too many nested tags with some parsers (https://stackoverflow.com/a/14587348).

Side issue, in wishlist/core.py, I have to force the old (?) way of getting the env var, or else it does not work (if needed I can open a separate issue for this) :

[...]
class BaseAmazon(object):
    @property
    def host(self):
        #return environ.HOST
        return os.environ.get("WISHLIST_HOST", "https://www.amazon.com")
[...]

I think this is due to my script setting the env var after the import: since HOST is a module-level variable in environ.py, it might be set too early for that use case?
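
For anyone hitting the same thing, here is a minimal, self-contained sketch of the timing problem described above. It does not use wishlist itself: the module-level HOST constant and the EagerHost and LazyHost classes are hypothetical stand-ins for environ.py and BaseAmazon, assuming HOST is read from WISHLIST_HOST once at import time.

```python
import os

# Hypothetical stand-in for a settings module such as environ.py.
# Start from a clean slate so the default applies, then read the value
# once, exactly as a module-level assignment would at import time.
os.environ.pop("WISHLIST_HOST", None)
HOST = os.environ.get("WISHLIST_HOST", "https://www.amazon.com")


class EagerHost(object):
    """Hypothetical class that returns the value captured at import time."""

    @property
    def host(self):
        return HOST


class LazyHost(object):
    """Hypothetical class that re-reads the environment on every access."""

    @property
    def host(self):
        return os.environ.get("WISHLIST_HOST", "https://www.amazon.com")


# Setting the variable after the "import" only affects the lazy lookup.
os.environ["WISHLIST_HOST"] = "https://www.amazon.fr"

eager = EagerHost().host
lazy = LazyHost().host
print(eager)
print(lazy)
```

The eager version keeps the default because the constant was already computed before the script changed the environment, which would explain why setting WISHLIST_HOST before the import (or patching the property as above) makes a difference.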

Jaymon commented 4 years ago

I think you've nailed the environment problem. I'm not sure I would change it though: if you are changing it in running code, I would just import wishlist.environ directly and modify environ.HOST there instead of setting the environment variable.

Has everything been working ok with python 3? I originally wrote it and still run it in python 2 primarily, I try and write cross version code but I didn't explicitly add python3 support in setup.py and I'm not sure if that was an oversight, or intentional, on my part because it was so long ago.

It also looks like there might be an issue with Brow because if you had installed lxml it should've auto-discovered that. In fact, that's how my current wishlist setup does it because it also seems to be using lxml to parse my wishlist.

Ugh, definitely room to make all this better, and make it easier to surface these issues to the user.
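
As a rough illustration of the parser auto-discovery mentioned above, here is a sketch of how a library might choose a BeautifulSoup parser name. This is not Brow's actual code: pick_parser is a hypothetical helper, and the assumption is that an HTML_PARSER environment variable (the one named earlier in this thread) overrides detection, with html.parser as the built-in fallback.

```python
import importlib.util
import os


def pick_parser(preferred=("lxml", "html5lib"), fallback="html.parser"):
    """Return a parser name suitable for BeautifulSoup(markup, name).

    Hypothetical helper: an explicit HTML_PARSER environment variable wins,
    otherwise the first installed module from `preferred` is used, and the
    standard library parser is the last resort.
    """
    override = os.environ.get("HTML_PARSER")
    if override:
        return override

    for name in preferred:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is not None:
            return name

    return fallback


# The result depends on what is installed and on the environment.
print(pick_parser())
```

Surfacing the chosen parser to the user (for example by logging it) would make problems like the one in this issue easier to spot.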

tym-project commented 4 years ago

Nice idea for environ.HOST, it's working... I need to up my python game :) This could be a nice trick to put in the doc maybe?

#!/usr/bin/python3.5

from wishlist.core import Wishlist
from wishlist import environ

environ.HOST="https://www.amazon.fr"
lists=[]
lists.append({'name':'Music','id':'<HASH>'})

for list in lists:
    data = Wishlist(list['id'])
    for item in data:
        print(item.title)

Regarding lxml, you are also correct, if installed brow detects and uses it... my testing methodology was flawed in this regard. Could you maybe add lxml as a dependency to wishlist ? html.parser does not seem compatible with Amazon.X anymore (or maybe it's a side effect of python3 ?).

I have no issues with python3, but I'm thinking the CLI issue could be coming from that ? I'm not planning on doing further testing as I'm not using the CLI, but if you would like me to, don't hesitate.

To be honest your work is really awesome, your code very clean and nicely documented... sure you could improve some things, but after all you're offering it to the community for free...so...thanks and don't sweat it !

Jaymon commented 4 years ago

I just added lxml as a dependency, I hadn't in the past because lxml was always a real pain to install and it had a default parser that I was able to use for the first little bit and so why add a hard to install dependency?

But those days might be over and I won't ever bother to fix the default parser myself since lxml is usually installed on my system.

I also updated the readme a bit with notes on how to manipulate the environment at runtime.

Thanks for all your help and I appreciate the kind words :)