dcts / opensea-scraper

Scrapes nft floor prices and additional information from opensea. Used for https://nftfloorprice.info
MIT License

Only grabs top 2 grids, would be awesome if it grabbed everything #2

Closed seanmc9 closed 2 years ago

seanmc9 commented 3 years ago

I would like to be able to scrape all of the cards on a page for ID information, but I found that, as currently implemented, this scraper only grabs the top 2-6, because of this line in OpenseaScraper.js: const cardsNodeList = document.querySelectorAll(".Asset--anchor");. This works great for finding the floor, but what if I wanted to grab all of the cards? Would this be possible? I tried a couple of other options, like calling querySelectorAll on '[role="gridcell"]', but ultimately found no improvement. Would you have any idea why it isn't able to grab the whole page's worth of cards? Thanks so much!

dcts commented 3 years ago

Hey, how many cards would you want to grab? Because the page dynamically loads the items and replaces the old ones. You can see that behavior if you inspect the DOM and scroll. Here's a screen recording of the DOM during scrolling: gif-scrolling

Hopefully you can see that the number of elements within the grid (all elements with role="gridcell") adapts dynamically: new elements get added and, more importantly, older elements get removed. So even if you scroll through the whole collection and run your query-selector operation at the end, you would unfortunately only get the last elements.

The way I would implement that is:

It's not the most difficult task, but it's also not that trivial. Hope that helps!
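(To illustrate the core of such an implementation: since the grid only ever contains a sliding window of cards, results from repeated scroll passes need to be merged into a dictionary keyed by tokenId so that cards appearing in overlapping windows are only counted once. This is a minimal sketch; mergeScrapedItems is a name of my own choosing, and the pass arrays stand in for whatever the real DOM query would return.)

```javascript
// Merge items scraped during repeated scroll passes into one dictionary,
// keyed by tokenId, so cards seen in overlapping windows are deduplicated.
function mergeScrapedItems(resultDict, visibleCards) {
  for (const card of visibleCards) {
    resultDict[card.tokenId] = card; // later passes overwrite earlier ones
  }
  return resultDict;
}

// Simulated scroll passes: the windows overlap, as they would on the real page.
const pass1 = [{ tokenId: 1, price: 7.8 }, { tokenId: 2, price: 7.9 }];
const pass2 = [{ tokenId: 2, price: 7.9 }, { tokenId: 3, price: 8.1 }];

const resultDict = {};
mergeScrapedItems(resultDict, pass1);
mergeScrapedItems(resultDict, pass2);
console.log(Object.keys(resultDict).length); // → 3, not 4: tokenId 2 is deduplicated
```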

seanmc9 commented 3 years ago

Wow, thank you for such a thorough response!! I'll definitely have to check this out, thank you so much!

Jethro87 commented 3 years ago

@seanmc9 Have you implemented this functionality? Just curious as I'm considering taking a stab at it. Thanks!

seanmc9 commented 3 years ago

Hey, sorry, no I haven't. I would love to follow along with what you do though; will you be implementing it on this repo?

Jethro87 commented 3 years ago

@seanmc9 Not sure. I want to grab both the prices and token_id of n assets, so it would have additional functionality. Still looking at options.

dcts commented 3 years ago

@Jethro87 I like the idea of returning more information. If you want to implement it, feel free; I can also help. I have implemented a scraper for the rankings with scroll functionality (if you want to fetch more than the first 5 items) that works without a MutationObserver.

I would suggest adding another function, for example OpenseaScraper.getOffers(slug, n), that returns an array of items:

const items = await OpenseaScraper.getOffers("cool-cats-nft", 3);
// and items being an array with objects, see below 👇
[
  {
    tokenId: 5265,
    price: {
      currency: "ETH",
      amount: 7.82
    }
  },
  {
    tokenId: 2878,
    price: {
      currency: "ETH",
      amount: 7.9
    }
  },
  {
    tokenId: 6218,
    price: {
      currency: "ETH",
      amount: 7.9
    }
  }
]
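(A side benefit of this shape: the existing floor-price use case becomes a one-liner over the returned array. A quick sketch, assuming the example items above:)

```javascript
// Given items in the shape proposed above, the floor price is simply
// the minimum offer amount across the array.
const items = [
  { tokenId: 5265, price: { currency: "ETH", amount: 7.82 } },
  { tokenId: 2878, price: { currency: "ETH", amount: 7.9 } },
  { tokenId: 6218, price: { currency: "ETH", amount: 7.9 } },
];

const floor = Math.min(...items.map((item) => item.price.amount));
console.log(floor); // → 7.82
```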
Jethro87 commented 3 years ago

@dcts Thanks, that's really helpful! That's precisely the functionality I'm looking for. Could you point me in the right direction on how to solve scrolling for n items? I investigated yesterday, but this is my first foray into scraping so it was very slow going. Appreciate it.

dcts commented 3 years ago

The mechanism I would recommend is the following:

Here's an idea of how this could look:

const resultDict = {};
const n = 10; // target number of items to fetch
// @todo: implement scrapeItems() and save results to resultDict
let currentScrollTop = -1; // last known scroll position, to detect when the end of the page is reached

const interval = setInterval(() => {
  console.log("another scroll... items so far = " + Object.keys(resultDict).length);
  window.scrollBy(0, 100);
  // @todo: run scrapeItems() again and update resultDict

  const endOfPageReached = document.documentElement.scrollTop === currentScrollTop;
  const enoughItemsFetched = Object.keys(resultDict).length >= n;
  if (endOfPageReached || enoughItemsFetched) {
    clearInterval(interval);
    console.log("🥳 End reached. resultDict:");
    console.log(resultDict); // show scraped items
  } else {
    currentScrollTop = document.documentElement.scrollTop; // update current scroll position
  }
}, 200);

If you copy-paste the above code into the DevTools console of any browser, on any page, you should see the page scrolling automatically. Like this: demo-scrolling
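(One way to make that snippet easier to reason about is to factor the stop condition out into a pure function; the loop stops either when the page no longer scrolls or when enough items have been collected. A sketch, with shouldStopScrolling being a name of my own choosing:)

```javascript
// Decide whether the scroll loop should terminate: either the page is stuck
// (scrollTop did not change since the last tick) or enough items were fetched.
function shouldStopScrolling(prevScrollTop, currScrollTop, itemCount, target) {
  const endOfPageReached = currScrollTop === prevScrollTop;
  const enoughItemsFetched = itemCount >= target;
  return endOfPageReached || enoughItemsFetched;
}

console.log(shouldStopScrolling(500, 500, 3, 10));  // → true  (end of page reached)
console.log(shouldStopScrolling(400, 500, 12, 10)); // → true  (enough items fetched)
console.log(shouldStopScrolling(400, 500, 3, 10));  // → false (keep scrolling)
```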

dcts commented 3 years ago

I'm working on an update, will deploy a minimal version by ~tomorrow.

Jethro87 commented 3 years ago

Awesome, looking forward to it

dcts commented 3 years ago

I published an update with more functionality as discussed here. The new functions are documented in the main README file of this repository. Let me know if it works or if anything is unclear and I'm happy to help.

Closing this issue, please open a new one if you have any issues with the new functionality :)

Jethro87 commented 3 years ago

@dcts This is great, thanks! There was a small bug that wouldn't allow me to run it that I've fixed via PR. I've also added a new function to the PR here -> https://github.com/dcts/opensea-scraper/pull/7

alex-pcln commented 2 years ago

@dcts has this functionality been removed? I find it very useful to be able to scroll down and get more results.

dcts commented 2 years ago

@alex-pcln how many items do you need from a given page? Because in the current version of the scraper you get the top 32 offers. Simply run:

let result = await OpenseaScraper.offers("cool-cats-nft");

If you need more than the top 32 offers, you might want to downgrade to v2 or similar. If you want, I can look up the exact version, if that's useful to you. But I'm wondering: in what use case would you need more than 32 offers? 🤔

alex-pcln commented 2 years ago

Thanks @dcts I downgraded to commit e01fea03e3 where it is still there.

Yes, I need more than 32; actually I need all assets of a collection that are currently on sale. Unfortunately, it seems that I cannot get that from their API, so I'm using your scraper. I just need to make some changes to detect the currency.

dcts commented 2 years ago

Well, if you need more than 32 items we should bring that functionality back, and maybe call the function offersByScrolling. If that's useful to you, I don't see any downside; I can add it back in.

alex-pcln commented 2 years ago

That would be great, thanks. It's not the most efficient way to get all assets for sale but I couldn't find any other way at the moment.

dcts commented 2 years ago

@alex-pcln which of the following methods do you need?

dcts commented 2 years ago

migrated to issue #29