Closed: Mervyen closed this issue 4 years ago.
I understand your issue. I will add it as an option. Until then, have you tried the `grouped=True` parameter? Results per group won't be unique.
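For reference, a minimal sketch of what that would look like (the URL and wanted_list below are placeholders, not from this thread):

```python
from autoscraper import AutoScraper

# Placeholder URL and wanted_list; substitute your own.
url = 'https://example.com/some-page'
wanted_list = ['Some product name', 'Sold Out']

scraper = AutoScraper()
# With grouped=True, build() returns a dict mapping each learned
# rule id to that rule's own result list, and (per the note above)
# results within each group are not de-duplicated.
results = scraper.build(url, wanted_list, grouped=True)
print(results)
```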
I haven't tried the grouped option yet. I will try it later. Thanks!
I tried it and it didn't work for me.
Please share your URL or HTML content and your code so we can find the problem.
The website is https://selected-cigars.com/en/partagas-serie-d-no-4 and my wanted_list is ['Partagás - Serie D No. 4 1 piece', 'Sold Out']. The 'Sold Out' item only shows up once.
The code:

```python
from autoscraper import AutoScraper

url = 'https://selected-cigars.com/en/partagas-serie-d-no-4'
wanted_list = ['Partagás - Serie D No. 4 1 piece', 'Sold Out']

scraper = AutoScraper()
result = scraper.build(url, wanted_list)
print(result)
```
The output:

```
['Partagás - Serie D No. 4 1 piece', 'Partagás - Serie D No. 4 A/T 1Pc', 'Partagás - Serie D No. 4 A/T 3pcs', 'Partagás - Serie D No. 4 10pcs, wooden Box / 1 box per Customer', 'Partagás - Serie D No. 4 25pcs, wooden Box / 1 Box per Customer', 'Partagás - Serie D No. 4 A/T special 25er Metalltube', 'Sold Out', 'small quantities available']
```
In the output, the 'Sold Out' item only shows once, but on the website those items appear more than once, about 4 times.
I recommend first using the `grouped=True` parameter. After analyzing the output, keep the desired rules with the `keep_rules` or `remove_rules` methods. Then, if you want the result list, use the `unique=False` parameter.
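A sketch of that workflow; the rule id `'rule_xxxx'` below is hypothetical, so use the actual ids printed from your own grouped output:

```python
from autoscraper import AutoScraper

url = 'https://selected-cigars.com/en/partagas-serie-d-no-4'
wanted_list = ['Partagás - Serie D No. 4 1 piece', 'Sold Out']

scraper = AutoScraper()

# Build with grouped=True to see which rule produced which results.
groups = scraper.build(url, wanted_list, grouped=True)
print(groups)  # e.g. {'rule_xxxx': [...], 'rule_yyyy': [...]}

# Keep only the rules whose results you actually want
# ('rule_xxxx' is a placeholder for a real id from the output above).
scraper.keep_rules(['rule_xxxx'])

# Re-fetch with unique=False so duplicate matches are not collapsed.
results = scraper.get_result_similar(url, unique=False)
print(results)
```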
THX
I was wondering how this works:

```python
{k: v if v != [] else '' for k, v in item.attrs.items() if k in key_attrs}
```
I'm guessing it's shorthand for something. I didn't open a new issue as wanting to know how the code works doesn't seem like one.
It's a dict comprehension: it creates a new dict from `item.attrs`, containing only the keys which are present in `key_attrs`. It also converts values of `[]` to `''`.
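A small standalone sketch of the equivalent explicit loop; the `attrs` and `key_attrs` values here are made up for illustration, not taken from autoscraper's source:

```python
# Stand-in values for item.attrs and key_attrs.
attrs = {'class': ['product', 'sold-out'], 'id': 'item-1', 'style': []}
key_attrs = {'class', 'style'}

# The comprehension...
result = {k: v if v != [] else '' for k, v in attrs.items() if k in key_attrs}

# ...is shorthand for this loop:
expanded = {}
for k, v in attrs.items():
    if k in key_attrs:              # keep only keys listed in key_attrs
        expanded[k] = v if v != [] else ''  # map empty lists to ''

assert result == expanded == {'class': ['product', 'sold-out'], 'style': ''}
```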
I'm sorry to add this issue; I don't know whether this really is an issue.
In my code I don't want the duplicate results to be removed, and I tried commenting out some code, but it didn't seem to work, so I added this issue.
Sorry for this issue again. Please tell me if this is not an issue and I will delete it.