Closed Metus88 closed 3 years ago
I'm having the same problem too.
It's not that simple. The locale must be changed to it_IT.UTF-8, yes, but you need to change some code too. I'm no Python expert, but I currently have a Docker container scraping Amazon for 2080s that are in stock, and I'm receiving the notifications on my Discord server.
You have to remove the &nbsp; HTML entity using html.unescape() (I've added it to the price_str handling in the common.py file). Then you have to change the strings corresponding to "add to cart" to "aggiungi al carrello" in some parts of the code.
After that, re-build the container and create your config. I've tested it on Amazon.it only, I'll try and test it on other domains tomorrow if I have some spare time, but it should work for every Euro country. But you have to translate every string in every language you want to scrape.
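To make the price step concrete, here is a minimal sketch of turning an Italian-style price string into a float. It is not the project's actual code: parse_euro_price is a hypothetical helper, and it assumes "." groups thousands and "," marks decimals, which matches the Amazon.it prices quoted in this thread.

```python
import html


def parse_euro_price(raw: str) -> float:
    """Convert a price such as '389,00&nbsp;€' to a float (hypothetical helper)."""
    # html.unescape turns the '&nbsp;' entity into the character '\xa0'
    text = html.unescape(raw)
    # Drop the non-breaking space and the currency symbol
    text = text.replace('\xa0', ' ').replace('€', '').strip()
    # Assumed Italian format: '.' groups thousands, ',' marks decimals
    text = text.replace('.', '').replace(',', '.')
    return float(text)


print(parse_euro_price('389,00&nbsp;€'))
print(parse_euro_price('1.299,99&nbsp;€'))
```

Note that html.unescape alone only decodes the entity; the resulting non-breaking space and the € sign still have to be stripped before float() will accept the string.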
Thanks a lot! Could you share (or copy) your common.py file?
If you are not in a hurry to grab yourself some GPUs (believe me, I wish I could have bought a 3080 FE when I had the chance), I suggest you wait until @EricJMarti implements the internationalization framework. But if you want, I can fork the repo and "maintain" an ad-hoc solution.
@Pipodi If you can share it I would be happy, because I'm trying to use html.unescape() but without success. (I'm a noob.)
price_str = tag if isinstance(tag, str) else tag.text.strip()
price_str = html.unescape(price_str)
For the "add to cart" problem I edited the file inventory-hunter/src/scraper/common.py, adding an "or" condition:
if self.has_phrase('add to cart') or self.has_phrase('add to basket') or self.has_phrase('aggiungi al carrello'):
if tag and 'add to cart' in tag.text.lower():
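As a possible alternative to chaining or-conditions, the phrases could live in one tuple that is checked with any(). This is only a sketch: IN_STOCK_PHRASES and has_in_stock_phrase are hypothetical names, not part of inventory-hunter.

```python
# Hypothetical list of "in stock" phrases; extend it per language you scrape
IN_STOCK_PHRASES = (
    'add to cart',
    'add to basket',
    'aggiungi al carrello',
)


def has_in_stock_phrase(text: str) -> bool:
    """Return True if any known phrase occurs in the text, ignoring case."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in IN_STOCK_PHRASES)


print(has_in_stock_phrase('Aggiungi al carrello'))
print(has_in_stock_phrase('Attualmente non disponibile'))
```

Adding a new language then means adding one string to the tuple instead of editing every condition.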
Ok, this is what I've changed.
Dockerfile
ENV LC_ALL it_IT.UTF-8
ENV LANG it_IT.UTF-8
ENV LANGUAGE it_IT.UTF-8
src/scraper/common.py
line 33-34
price_str = tag if isinstance(tag, str) else tag.text.strip()
price_str = html.unescape(price_str).strip()
Don't forget to add the html import at the top of the file: import html
Then I changed "add to cart" string to "aggiungi al carrello". Here you can concatenate or-conditions, if you want. But yesterday, when I tried this, I didn't think of that.
class GenericScrapeResult(ScrapeResult):
    def parse(self):
        # not perfect but usually good enough
        if self.has_phrase('aggiungi al carrello') or self.has_phrase('add to basket'):
            self.alert_subject = 'In Stock'
            self.alert_content = self.url
src/scraper/amazon.py
line 24
if tag and 'aggiungi al carrello' in tag.text.lower():
Then rebuild the Docker image. I suggest you use a different tag for your Docker image, e.g. pipodi/inventory-hunter:latest, so you won't overwrite the original Docker image:
docker build -t {image/name:tag} .
Then change the image name in the docker_run.bash file:
line 20
default_image="{image/name:tag}"
Now you can run the docker_run.bash file with your configs.
I hope I've explained the changes well, if not just ask!
Thanks for the help! I did what you wrote, but I still have this on logs:
W2020-12-07 15:06:48,053 [mzn_t_3] unable to convert "389,00 €" to float... caught exception: could not convert string to float: '38900\xa0€
I don't mind overwriting the image, so I did:
docker container ls -a
(to get the ID of my old container)
docker container rm 30bfdf575fe2
(to remove my old container)
I edited the files, and then I ran:
docker pull ericjmarti/inventory-hunter:latest
./docker_run.bash -c ./config/amazon_rtx_3080.yaml -a discord -w https://discord.com/api/webhooks/********
After that I checked the log:
docker logs -f amazon_rtx_3080
But I had the same error.
So I edited also line 18-19 in src/scraper/amazon.py:
price_str = self.set_price(tag)
price_str = html.unescape(price_str).strip()
but I still get the same problem... I'm a noob with Docker and html.unescape, so I must have made a mistake somewhere, but I don't understand where...
Because you are running the wrong Docker image.
If you do
docker pull ericjmarti/inventory-hunter:latest
and
./docker_run.bash -c ./config/amazon_rtx_3080.yaml -a discord -w https://discord.com/api/webhooks/*******
you are still using the original Docker image, without the ad-hoc edits we made.
So:
- Do the edits;
- Run
docker build -t {imagename} .
(where {imagename} can be something like metus88/inventory-hunter:latest. Be sure to type the period after the image name);
- Edit docker_run.bash at line 20, replacing
ericjmarti/inventory-hunter:latest
with the image name you chose in the previous step;
- Run
./docker_run.bash -c ./config/amazon_rtx_3080.yaml -a discord -w https://discord.com/api/webhooks/*******
The reason you are still having issues, and why it seems that your edits are not "saved", is that Docker is using the original image (the one without our edits), but we want it to use our image.
Hope this clarifies the situation.
That was a clear explanation! I understand now. Thanks a lot. It works for me now!
Is it normal to see this message in the log for a product that has no price because it is out of stock? For example, this GPU: https://www.amazon.it/GeForce-GDDR6X-dissipatore-triventola-refresh/dp/B08HN37VQK
E2020-12-07 17:57:04,927 [mzn_t_1] caught exception during request: argument of type 'NoneType' is not iterable
E2020-12-07 17:57:04,929 [mzn_t_1] scrape failed
E2020-12-07 17:57:13,216 [mzn_t_2] caught exception during request: argument of type 'NoneType' is not iterable
E2020-12-07 17:57:13,219 [mzn_t_2] scrape failed
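For context, this message is what Python raises when the "in" operator is applied to None, for example when a page element that normally holds text is missing. Where exactly that happens inside the scraper is not shown in this thread, so the following is only a sketch of a guard, with contains_phrase as a hypothetical helper.

```python
from typing import Optional


def contains_phrase(text: Optional[str], phrase: str) -> bool:
    """Check for a phrase, treating a missing text (None) as 'not found'."""
    if text is None:
        # Without this guard, `phrase in None` raises a TypeError
        return False
    return phrase in text.lower()


# Demonstrate the unguarded failure mode
try:
    'add to cart' in None
except TypeError as exc:
    print('unguarded check failed:', exc)

print(contains_phrase(None, 'add to cart'))
print(contains_phrase('Add to Cart', 'add to cart'))
```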
Thanks for your explanation pipodi. Like Metus88, I also had problems converting the price from string to float because I am based in Singapore and the Amazon pricing looks like 'S$980'.
Therefore, what I did was to:
1) Edit the src/scraper/common.py file. I needed to replace the 'S' in front of the price with nothing so that the program can convert the price_str variable to float.
2) Build the image file again by running docker build -t {imagename} .
3) As I am running the program under Windows PowerShell, instead of editing docker_run.bash which you suggested, I edited docker_run.ps1 instead under line 6 to reflect the new imagename. The code I ran was:
.\docker_run.ps1 -Config .\config\(myconfigfile).yaml -Alerter discord -Webhook https://discord.com/api/webhooks/***
The program could run. When I checked the log, I could see that the conversion from string to float is no longer a problem. However, I got the error message "[root] There was an issue sending to discord due to an invalid status code back -> 401". In other words, even though there was an item in stock, it could not send an alert to discord.
I did not have any problem sending alerts to Discord when I used the original Docker image, although the price-to-float conversion error was present with the original image.
Just putting it out there for anyone who has similar issues. Any help would be appreciated.
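Step 1 above could look roughly like the sketch below. It is not the code that was actually used: parse_prefixed_price is a hypothetical helper, and it assumes prices such as 'S$980' or 'S$1,299.50', with "," grouping thousands and "." marking decimals.

```python
import re


def parse_prefixed_price(price_str: str) -> float:
    """Extract the numeric part of a price like 'S$980' (hypothetical helper)."""
    # Find the first run of digits, with optional ',' groups and a '.' decimal part
    match = re.search(r'\d[\d,]*(?:\.\d+)?', price_str)
    if match is None:
        raise ValueError(f'no numeric price found in {price_str!r}')
    # Assumed format: ',' is a thousands separator, so it can be dropped
    return float(match.group(0).replace(',', ''))


print(parse_prefixed_price('S$980'))
print(parse_prefixed_price('S$1,299.50'))
```

Matching the digits rather than deleting the letter 'S' means the same helper also copes with other currency prefixes.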
@Metus88 Yeah, it should be.
@schlep83 401 is Unauthorized, are you sure that you have authorization to use the webhook?
Same here... I have done exactly all your modification for amazon.it and now I have the same error message
@Pipodi I think so. Like I said earlier, when I use the original Docker image, the alert to Discord works fine. That would imply I have authorization to use the webhook, no? However, when I use the Docker image that I built with my modifications, it does not work.
@schlep83 I tried right now with my repo fork with the edits we discussed above and I receive the alerts on my Discord channel. What did you edit?
I'm closing the issue because the Discord problem is off topic here. However, I can confirm that the Discord webhook works great with the standard version and also with the modified version from @Pipodi.
@Pipodi Do your changes work for amazon.co.uk?
It should. Later I will try and check for fr, es and de. I will check for co.uk too.
Thank you. Appreciate it!
@jamesearls
I2020-12-08 19:34:18,379 [mzn_c_k_1] scraper initialized for https://www.amazon.co.uk/Corsair-Hydro-FOUNDERS-Water-Block/dp/B08NCSXYVX/ref=sr_1_7?dchild=1&keywords=rtx+3080&qid=1607455519&sr=8-7
I2020-12-08 19:34:18,380 [mzn_c_k_2] scraper initialized for https://www.amazon.co.uk/Gigabyte-GeForce-Graphics-GV-N3080GAMING-OC-10GD/dp/B08HLZXHZY/ref=sr_1_4?dchild=1&keywords=rtx+3080&qid=1607455519&sr=8-4
I2020-12-08 19:34:32,893 [mzn_c_k_1] now in stock at 166.64!
E2020-12-08 19:34:41,488 [mzn_c_k_2] caught exception during request: 'NoneType' object has no attribute 'text'
The second one throws an exception, probably because it has no price listed, but as you can see, it works. If you want to see my edits, check out my fork: https://github.com/Pipodi/inventory-hunter as I'm waiting for the maintainer to implement the internationalization framework (I do not code in Python, I'm a Java dev, so I don't know where to start. I'm just experimenting with changes and learning from what is already coded).
Obviously, you have to edit the ENV variables of the Dockerfile to en_GB.UTF-8 and rebuild the Docker image.
For people trying to make this solution work today, DO NOT edit line 20 of docker_run.bash, edit line 21. It should be something like this:
default_image="ericjmarti/inventory-hunter:latest"
image="{your/imagename}"
If these two fields share the same value, a docker pull command will be executed, which will lead to "requested access to resource denied" errors.
At least, this is how I managed to make it work.
I read about changing the Dockerfile, but with the euro there is a problem on Amazon France, Italy, Germany and every other country that uses the euro (€): the price contains strange characters. Example: 19,99 €
<span id="price_inside_buybox" class="a-size-medium a-color-price"> 19,99 € </span>
And the program tells me that it is not able to convert the price string to a float.
I changed en_US.UTF-8 to it_IT.UTF-8 (or fr or gr), but it didn't work. Has anyone managed to make it work with €? Thanks