EricJMarti / inventory-hunter

⚡️ Get notified as soon as your next CPU, GPU, or game console is in stock
MIT License

help euro € price problem amazon letters: &nbsp; #54

Closed Metus88 closed 3 years ago

Metus88 commented 3 years ago

I read about changing the Dockerfile, but there is still a problem with euro prices on Amazon France, Italy, Germany and every other country that uses the euro: the price contains strange characters. For example, 19,99 € appears in the page source as: <span id="price_inside_buybox" class="a-size-medium a-color-price"> 19,99&nbsp;€ </span>

The program then tells me that it is not able to convert the price string to a float.

I changed en_US.UTF-8 to it_IT.UTF-8 (and also tried fr and gr), but it didn't work.

Has anyone managed to make it work with €? Thanks

bibik92 commented 3 years ago

I'm having the same problem too.

Pipodi commented 3 years ago

It's not that simple. The locale must be changed to it_IT.UTF-8, yes, but you need to change some code too. I'm no Python expert, but I currently have a Docker container scraping Amazon for 2080s that are in stock, and I'm receiving the notifications on my Discord server.

You have to remove the &nbsp; HTML entity using html.unescape() (I've added it to price_str in the common.py file). Then you have to change the strings corresponding to "add to cart" to "aggiungi al carrello" in some parts of the code.

After that, rebuild the container and create your config. I've only tested it on Amazon.it; I'll try to test it on other domains tomorrow if I have some spare time, but it should work for every euro country. You do have to translate every string into every language you want to scrape, though.
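
To illustrate the html.unescape() step described above, here is a minimal sketch, not the project's actual code. The function name parse_euro_price is hypothetical, and it assumes a comma as the decimal separator and a period as the thousands separator, as in the Amazon.it prices quoted in this thread. Note that html.unescape() turns &nbsp; into the non-breaking space character \xa0, which still has to be removed before calling float().

```python
import html


def parse_euro_price(raw: str) -> float:
    """Convert a string such as '19,99&nbsp;€' to a float (hypothetical helper)."""
    text = html.unescape(raw)              # '&nbsp;' becomes '\xa0'
    text = text.replace('\xa0', ' ')       # treat the non-breaking space as a normal space
    text = text.replace('€', '').strip()   # drop the currency symbol and outer whitespace
    text = text.replace('.', '')           # assumption: '.' is a thousands separator
    text = text.replace(',', '.')          # assumption: ',' is the decimal separator
    return float(text)


print(parse_euro_price('19,99&nbsp;€'))
print(parse_euro_price(' 1.389,00\xa0€ '))
```

A string-replacement approach like this avoids depending on the container's locale, at the cost of hard-coding the separator convention.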

Metus88 commented 3 years ago

Thanks a lot! Could you share (or copy) your common.py file?

Pipodi commented 3 years ago

If you are not in a hurry to grab yourself some GPUs (believe me, I wish I could have bought a 3080 FE when I had the chance), I suggest you wait until @EricJMarti implements the internationalization framework. But if you want, I can fork the repo and "maintain" an ad-hoc solution.

Metus88 commented 3 years ago

@Pipodi If you can share it I'll be happy, because I'm trying to use html.unescape() but without success (I'm a noob).

        price_str = tag if isinstance(tag, str) else tag.text.strip()
        price_str = html.unescape(price_str)

For the "add to cart" problem I edited:

Pipodi commented 3 years ago

Ok, this is what I've changed.

Dockerfile

ENV LC_ALL it_IT.UTF-8
ENV LANG it_IT.UTF-8
ENV LANGUAGE it_IT.UTF-8

src/scraper/common.py

line 33-34

price_str = tag if isinstance(tag, str) else tag.text.strip()
price_str = html.unescape(price_str).strip()

Don't forget to add the html import: import html

Then I changed "add to cart" string to "aggiungi al carrello". Here you can concatenate or-conditions, if you want. But yesterday, when I tried this, I didn't think of that.

class GenericScrapeResult(ScrapeResult):
    def parse(self):
        # not perfect but usually good enough
        if self.has_phrase('aggiungi al carrello') or self.has_phrase('add to basket'):
            self.alert_subject = 'In Stock'
            self.alert_content = self.url
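
The or-conditions mentioned above could also be collapsed into a single check over a list of phrases. The sketch below is hypothetical: IN_STOCK_PHRASES and has_any_phrase are illustrative names that are not part of inventory-hunter, and the list only contains phrases mentioned in this thread.

```python
# Hypothetical phrase list; extend it with one entry per language you scrape.
IN_STOCK_PHRASES = (
    'add to cart',           # English
    'add to basket',         # English (UK wording)
    'aggiungi al carrello',  # Italian
)


def has_any_phrase(page_text: str, phrases=IN_STOCK_PHRASES) -> bool:
    """Return True if any phrase occurs in the page text, ignoring case."""
    lowered = page_text.lower()
    return any(phrase in lowered for phrase in phrases)


print(has_any_phrase('<input value="Aggiungi al carrello">'))
print(has_any_phrase('Attualmente non disponibile'))
```

Keeping the phrases in one tuple means supporting another language only requires adding an entry rather than editing each condition.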

src/scraper/amazon.py

line 24

if tag and 'aggiungi al carrello' in tag.text.lower():

Then rebuild the Docker image. I suggest you use another tag for your Docker image, e.g. pipodi/inventory-hunter:latest, so you won't overwrite the original Docker image:

docker build -t {image/name:tag} .

Then change the image name in the docker_run.bash file:

line 20

default_image="{image/name:tag}"

Now you can run the docker_run.bash file with your configs.

I hope I've explained the changes well, if not just ask!

Metus88 commented 3 years ago

Thanks for the help! I did what you wrote, but I still get this in the logs: W2020-12-07 15:06:48,053 [mzn_t_3] unable to convert "389,00 €" to float... caught exception: could not convert string to float: '38900\xa0€

I don't mind overwriting the original image, so I did:

docker container ls -a (to get the ID of my old container)

docker container rm 30bfdf575fe2 (to remove my old container)

I edited the files:

And I ran: docker pull ericjmarti/inventory-hunter:latest

./docker_run.bash -c ./config/amazon_rtx_3080.yaml -a discord -w https://discord.com/api/webhooks/********

After that I checked the log with docker logs -f amazon_rtx_3080, but I had the same error. So I also edited lines 18-19 in src/scraper/amazon.py:

        price_str = self.set_price(tag)
        price_str = html.unescape(price_str).strip()

but I still get the same problem. I'm a noob with Docker and html.unescape, so I must have made a mistake somewhere, but I don't understand where.

Pipodi commented 3 years ago

Because you are running the wrong Docker image.

If you do docker pull ericjmarti/inventory-hunter:latest and ./docker_run.bash -c ./config/amazon_rtx_3080.yaml -a discord -w https://discord.com/api/webhooks/*******

you are still using the original Docker image, without the ad-hoc edits we made.

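So:

  • Do the edits;
  • Run docker build -t {imagename} . (where {imagename} can be something like metus88/inventory-hunter:latest. Be sure to type the period after the image name);
  • Edit the docker_run.bash at line 20, replace ericjmarti/inventory-hunter:latest with the image name you chose at the previous point;
  • Run ./docker_run.bash -c ./config/amazon_rtx_3080.yaml -a discord -w https://discord.com/api/webhooks/*******

The reason you are still having issues, and the reason it seems not to "save" your edits, is that Docker will use the original Docker image (the one without our edits), but we want it to use our image.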

Hope this clarifies the situation.

Metus88 commented 3 years ago

That was a clear explanation! I understand now. Thanks a lot. It works for me now!

Metus88 commented 3 years ago

Is this log message normal for a product that has no price because it is not in stock? For example, for this GPU: https://www.amazon.it/GeForce-GDDR6X-dissipatore-triventola-refresh/dp/B08HN37VQK

E2020-12-07 17:57:04,927 [mzn_t_1] caught exception during request: argument of type 'NoneType' is not iterable
E2020-12-07 17:57:04,929 [mzn_t_1] scrape failed
E2020-12-07 17:57:13,216 [mzn_t_2] caught exception during request: argument of type 'NoneType' is not iterable
E2020-12-07 17:57:13,219 [mzn_t_2] scrape failed

schlep83 commented 3 years ago

Thanks for your explanation pipodi. Like Metus88, I also had problems converting the price from string to float because I am based in Singapore and the Amazon pricing looks like 'S$980'.

Therefore, what I did was to:

1) Edit the src/scraper/common.py file. I needed to replace the 'S' in front of the price with nothing so that the program can convert the price_str variable to float.
2) Build the image file again by running docker build -t {imagename} .
3) As I am running the program under Windows PowerShell, instead of editing docker_run.bash which you suggested, I edited docker_run.ps1 instead under line 6 to reflect the new imagename. The code I ran was: .\docker_run.ps1 -Config .\config\(myconfigfile).yaml -Alerter discord -Webhook https://discord.com/api/webhooks/***

The program could run. When I checked the log, I could see that the conversion from string to float is no longer a problem. However, I got the error message "[root] There was an issue sending to discord due to an invalid status code back -> 401". In other words, even though there was an item in stock, it could not send an alert to discord.

I did not have any problem sending alerts to Discord when I used the original Docker image, although the price-to-float conversion error was present with that image.

Just putting it out there for anyone who has similar issues. Any help would be appreciated.
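
For prices with a currency prefix such as 'S$980', one option is to extract only the numeric part instead of deleting specific letters. This is a sketch under stated assumptions, not the edit described above: extract_amount is a hypothetical name, and it assumes a period as the decimal separator and a comma as the thousands separator.

```python
import re


def extract_amount(price_str: str) -> float:
    """Pull the first number out of a price string (hypothetical helper)."""
    match = re.search(r'\d[\d,]*(?:\.\d+)?', price_str)
    if match is None:
        raise ValueError(f'no numeric amount found in {price_str!r}')
    # Assumption: ',' is a thousands separator, so it can simply be dropped.
    return float(match.group().replace(',', ''))


print(extract_amount('S$980'))
print(extract_amount('S$1,299.50'))
```

Because it ignores everything before the first digit, the same helper would also accept other prefixes without further changes.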

Pipodi commented 3 years ago

@Metus88 Yeah, it should be.

@schlep83 401 is Unauthorized, are you sure that you have authorization to use the webhook?

netsky84 commented 3 years ago

Same here. I made exactly all of your modifications for amazon.it and now I get the same error message.

schlep83 commented 3 years ago

@Pipodi I think so. Like I said earlier, when I use the original Docker image, the alert to Discord works fine. That would imply I have authorization to use the webhook, no? However, when I use the Docker image that I built with my modifications, it does not work.

Pipodi commented 3 years ago

@schlep83 I tried right now with my repo fork with the edits we discussed above and I receive the alerts on my Discord channel. What did you edit?

Metus88 commented 3 years ago

I'm closing the issue because the Discord problem is off topic here. However, I can confirm that the Discord webhook works great with the standard version and also with the modified version from @Pipodi.

jamesearls commented 3 years ago

@Pipodi Do your changes work for amazon.co.uk?

Pipodi commented 3 years ago

It should. Later I will try and check for fr, es and de. I will check for co.uk too.

jamesearls commented 3 years ago

Thank you. Appreciate it!

Pipodi commented 3 years ago

@jamesearls

I2020-12-08 19:34:18,379 [mzn_c_k_1] scraper initialized for https://www.amazon.co.uk/Corsair-Hydro-FOUNDERS-Water-Block/dp/B08NCSXYVX/ref=sr_1_7?dchild=1&keywords=rtx+3080&qid=1607455519&sr=8-7
I2020-12-08 19:34:18,380 [mzn_c_k_2] scraper initialized for https://www.amazon.co.uk/Gigabyte-GeForce-Graphics-GV-N3080GAMING-OC-10GD/dp/B08HLZXHZY/ref=sr_1_4?dchild=1&keywords=rtx+3080&qid=1607455519&sr=8-4
I2020-12-08 19:34:32,893 [mzn_c_k_1] now in stock at 166.64!
E2020-12-08 19:34:41,488 [mzn_c_k_2] caught exception during request: 'NoneType' object has no attribute 'text'

The second one throws an exception, probably because it has no price listed, but as you can see, it works. If you want to see my edits, check out my fork: https://github.com/Pipodi/inventory-hunter. I'm waiting for the maintainer to implement the internationalization framework (I don't code in Python, I'm a Java dev, so I don't know where to start; I'm just experimenting with changes and learning from the existing code).

Obviously, you have to edit the ENV variables of the Dockerfile to en_GB.UTF-8 and rebuild the Docker image.
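
The 'NoneType' object has no attribute 'text' exception in the log above is what Python raises when a lookup returns None and the code then reads .text from it. A minimal sketch of a guard follows; FakeTag and safe_price_text are hypothetical stand-ins for illustration and are not inventory-hunter code.

```python
from typing import Optional


class FakeTag:
    """Hypothetical stand-in for a parsed HTML tag with a .text attribute."""

    def __init__(self, text: str):
        self.text = text


def safe_price_text(tag: Optional[FakeTag]) -> Optional[str]:
    """Return the stripped tag text, or None when no price tag was found."""
    if tag is None:
        return None  # probably no price listed on the page
    return tag.text.strip()


print(safe_price_text(FakeTag(' £166.64 ')))
print(safe_price_text(None))
```

With a guard like this the caller can treat a missing price as "not in stock" rather than logging an exception.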

Vega98 commented 2 years ago

Because you are running the wrong Docker image.

If you do docker pull ericjmarti/inventory-hunter:latest and ./docker_run.bash -c ./config/amazon_rtx_3080.yaml -a discord -w https://discord.com/api/webhooks/*******

you are still using the original Docker image, without the ad-hoc edits we made.

So:

* Do the edits;

* Run `docker build -t {imagename} .` (where {imagename} can be something like metus88/inventory-hunter:latest. Be sure to type the period after the image name);

* Edit the **docker_run.bash** at line 20, replace `ericjmarti/inventory-hunter:latest` with the image name you chose at the previous point;

* Run `./docker_run.bash -c ./config/amazon_rtx_3080.yaml -a discord -w https://discord.com/api/webhooks/*******`

The reason you still having issues and it seems that it doesn't "save" your edits it's because Docker will use the original Docker image (the one without our edits), but we want it to use our image.

Hope this clarifies the situation.

For people trying to make this solution work today, DO NOT edit line 20 of docker_run.bash, edit line 21. It should be something like this:

default_image="ericjmarti/inventory-hunter:latest"
image="{your/imagename}"

If these two fields share the same value, a docker pull command will be executed, which will lead to "requested access to resource denied" errors.

At least, this is how I managed to make it work.