Closed: svanschalkwyk closed this 4 years ago
Isn't this a duplicate of #683?
It works when I try. I get:
p_name = Fresh Raspberries, 12 oz
Please share your full config and a URL to reproduce.
My HTTP crawler is exhibiting strange behaviour: tags that extracted correctly a short while ago fail to extract once new extractions are added below them. Could you please take a look at what I have? https://gist.github.com/svanschalkwyk/6ae9dfe8db317495078e2b573efaa78a
Pascal, how do I parse the $1.98 price from the snippet below? My selector is not working.
<dom toField="p_price" selector=".Price__salePriceItemDetail___3zgGh" extract="outerHtml" />
out of
<span aria-hidden="true" class="Price__salePrice___3YEJa Price__salePriceItemDetail___3zgGh" data-automation-id="salePrice">$1.98</span>
I've tried span.Price__salePrice___3YEJa.Price__salePriceItemDetail___3zgGh, .Price__salePrice___3YEJa.Price__salePriceItemDetail___3zgGh, .Price__salePriceItemDetail___3zgGh, [class=... and [class^=Price__salePriceItemDetail___3zg ...
from
<div class="ProductPage__priceContainer___3WJNU">
<div class="ProductPage__groceryPriceOGContainer___iqCCL">
<span class="Price__containerPrice___qhIUd">
<span class="screenReaderOnlyText">1 dollar and 98 cents</span>
<span aria-hidden="true" class="Price__salePrice___3YEJa Price__salePriceItemDetail___3zgGh" data-automation-id="salePrice">$1.98</span>
</span>
<sub class="displayConditionPrice"></sub>
</div>
<div class="ProductPage__priceRightInfoNoOldPrice___3frai" data-automation-id="weighted-display-condition">each</div>
<div class="ProductPage__priceRightInfoNoOldPrice___3frai">
<span class="screenReaderOnlyText"></span><span class="screenReaderOnlyText">unit price is 1 dollar and 98 cents per pound</span>
<span aria-hidden="true" data-automation-id="price-per-unit" class="ProductPage__pricePerUnitOG___7fJIU">$1.98/LB</span><span aria-hidden="true" class="ProductPage__hideOnDOM___2P0GJ"></span></div></div>
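As a sanity check outside the crawler, here is a minimal Python sketch (standard library only) that looks up an element by one of its class tokens and returns its text. It is not how the Norconex collector evaluates selectors; `ClassTextExtractor`, `extract_text` and `SNIPPET` are names made up for this illustration. It only shows that the class token, written exactly with its underscores, does identify the `$1.98` span when that markup is actually present in the document.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element whose class list contains a token."""

    def __init__(self, class_token):
        super().__init__()
        self.class_token = class_token
        self.depth = 0      # > 0 while we are inside a matching element
        self.buffer = []    # text pieces of the element being collected
        self.results = []   # one entry per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.class_token in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.buffer).strip())
            self.buffer = []

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract_text(html, class_token):
    """Return the text of each element carrying class_token (hypothetical helper)."""
    parser = ClassTextExtractor(class_token)
    parser.feed(html)
    parser.close()
    return parser.results


# Part of the markup quoted in this thread.
SNIPPET = '''<span class="Price__containerPrice___qhIUd">
<span class="screenReaderOnlyText">1 dollar and 98 cents</span>
<span aria-hidden="true" class="Price__salePrice___3YEJa Price__salePriceItemDetail___3zgGh" data-automation-id="salePrice">$1.98</span>
</span>'''

print(extract_text(SNIPPET, "Price__salePriceItemDetail___3zgGh"))
```

If the same lookup returns an empty list when run against the file the crawler saved, the markup is probably missing from the downloaded HTML rather than the selector being wrong.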
Can it be related to https://github.com/Norconex/collector-http/issues/683#issuecomment-617951754?
From the Gist you attached, the HTML page does not seem to contain that HTML content, so it looks like it is being loaded dynamically via JavaScript.
It may be a PhantomJS issue. I changed the selectors to use the class names found in the downloaded files, and now most of them work. Some content that appears in the Chrome debugger does not appear in the downloaded file.
Ideally this configuration should return "Fresh Raspberries...", but I cannot get anything out.
What am I doing wrong? Some selectors seem to work, while others do not.
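One way to compare what the browser shows with what the crawler saved is to list the class tokens that actually occur in the downloaded file. The sketch below is a rough Python illustration using only the standard library; `class_tokens`, `missing_tokens`, `tokens_with_prefix` and the `DOWNLOADED` string are hypothetical names, and `DOWNLOADED` is a made-up stand-in for a saved page, not the real content.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Gather every class token that appears anywhere in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tokens = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.tokens.update(value.split())


def class_tokens(html):
    """Return the set of class tokens found in html."""
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    return collector.tokens


def missing_tokens(html, expected):
    """Return the expected class tokens that the raw HTML does not contain."""
    present = class_tokens(html)
    return [token for token in expected if token not in present]


def tokens_with_prefix(html, prefix):
    """List the class tokens that start with prefix, e.g. 'Price__'."""
    return sorted(t for t in class_tokens(html) if t.startswith(prefix))


# Made-up stand-in for a file saved by the crawler (assumption, not the real page).
DOWNLOADED = ('<div class="ProductPage__priceContainer___3WJNU">'
              '<div id="price-root"></div></div>')

expected = ["ProductPage__priceContainer___3WJNU",
            "Price__salePriceItemDetail___3zgGh"]
print(missing_tokens(DOWNLOADED, expected))
print(tokens_with_prefix(DOWNLOADED, "ProductPage__"))
```

Any token reported as missing is one the selector can never match in that file, whatever the browser's debugger shows after JavaScript has run.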