doitlikejustin / amazon-wish-lister

Retrieve Amazon Wishlist and output to JSON, XML, or PHP Array Object
http://www.justinscarpetti.com/portfolio/amazon-wish-lister/
MIT License

Only getting first page of items #31

Open chris-n-self opened 6 years ago

chris-n-self commented 6 years ago

I'm on the UK Amazon site. When I view a wishlist on the website, it loads more items as you scroll down the page. The script only returns the items that are displayed when the page first opens, not the hidden items that you have to scroll down to see.

chris-n-self commented 6 years ago

I dug through the HTML of the wishlist page a little and found this at the end of the items section:

    <script type="a-state" data-a-state="{&quot;key&quot;:&quot;scrollState&quot;}">{"showMoreUrl":

It's followed by a URL that, if I copy and paste it into my browser, takes me to the 'second page' of items. Following the chain of pages, when you reach the end of the wishlist that field is missing from the HTML. I imagine it's possible to grab that URL and go to the next page to get the next items, repeating until the field is missing. But I don't understand how phpQuery works well enough to try it myself without it taking a while.
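For anyone who wants to try the idea without phpQuery, here is a minimal sketch using PHP's built-in DOM extension. It assumes the body of the scrollState script tag is valid JSON with a "showMoreUrl" field, which I have only inferred from the snippet above. The function name findShowMoreUrl is made up for illustration and is not part of this library.

```php
<?php
// Hypothetical helper (not part of amazon-wish-lister): returns the
// "showMoreUrl" value from a wishlist page, or null when the scrollState
// block or the field is absent (assumed to mean "last page").
function findShowMoreUrl(string $html): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them,
    // since real-world pages are rarely strict HTML.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//script[@type="a-state"]') as $script) {
        // getAttribute() returns the attribute with &quot; already decoded.
        $state = json_decode($script->getAttribute('data-a-state'), true);
        if (!is_array($state) || ($state['key'] ?? null) !== 'scrollState') {
            continue;
        }
        // Assumption: the script body is a JSON object.
        $data = json_decode($script->textContent, true);
        if (is_array($data) && is_string($data['showMoreUrl'] ?? null)) {
            return $data['showMoreUrl'];
        }
        return null;
    }
    return null;
}
```

Calling findShowMoreUrl() on each fetched page and stopping when it returns null would follow the chain of pages described above.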

chris-n-self commented 6 years ago

I got this working. The "showMoreUrl" uses a URL parameter, &lek= (last evaluated key), to point to the next set of items. I can grab that from the current page with:

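    // Read the scrollState block and pull out the key for the next page.
    $next_page = pq('script[data-a-state=\'{"key":"scrollState"}\']')->html();
    // [^"]* stops at the first closing quote after the key.
    $lek = preg_match('/"lastEvaluatedKey":"([^"]*)"/', $next_page, $next_page_url)
        ? $next_page_url[1]
        : ''; // empty on the last page, where the key is missing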

The $lek values really just work like pages; the items shown on each do not overlap. So (in my local copy of the code) I've replaced the loop over page_num with:

    $lek = '';
    do {
        $contents = phpQuery::newDocumentFile("$baseurl/registry/$amazon_id?...&lek=$lek");
        ...
        // get new value of $lek here
    } while ($lek !== '');

where I've just appended &lek=$lek onto the end of the url in the phpQuery::newDocumentFile call.
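To make the loop easier to try without hitting Amazon, here is a rough, self-contained sketch of the same idea. The names extractLek and fetchAllPages are hypothetical, and the $fetchPage callable stands in for the phpQuery::newDocumentFile call, so none of this is the library's actual code. The duplicate-key check and page cap are extra safety guards I added, not something the original loop has.

```php
<?php
// Hypothetical helper: pull the lastEvaluatedKey out of a page's HTML.
// Returns '' when the key is missing, which is treated as "last page".
function extractLek(string $html): string
{
    return preg_match('/"lastEvaluatedKey":"([^"]*)"/', $html, $m) ? $m[1] : '';
}

// Hypothetical driver: $fetchPage receives the current $lek ('' for the
// first page) and returns that page's HTML as a string.
function fetchAllPages(callable $fetchPage, int $maxPages = 50): array
{
    $pages = [];
    $seen  = [];
    $lek   = '';
    do {
        $html       = $fetchPage($lek);
        $pages[]    = $html;
        $seen[$lek] = true;
        $lek        = extractLek($html);
        // Stop when the key is missing, repeats, or the page cap is hit.
    } while ($lek !== '' && !isset($seen[$lek]) && count($pages) < $maxPages);

    return $pages;
}
```

In the real script, $fetchPage would build the "$baseurl/registry/$amazon_id?...&lek=$lek" URL and return the fetched HTML; for a dry run it can simply look pages up in an array keyed by $lek.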

blobaugh commented 4 years ago

I am experiencing the same issue. I have verified that PR #34 does fix the issue for me.