dcts / opensea-scraper

Scrapes NFT floor prices and additional information from OpenSea. Used for https://nftfloorprice.info
MIT License

[BUG] `offersByScrolling()` and `offersByScrollingByUrl()` not properly working #36

Open dcts opened 2 years ago

dcts commented 2 years ago

I noticed that the functions offersByScrolling() and offersByScrollingByUrl() are not working properly. Most of the offers are not scraped: a lot of them are skipped for some reason, and approximately 75% of the offers are not saved. As a result the function gets stuck for a long time, because reaching the desired number of offers takes much longer when 75% of them are skipped.

dcts commented 2 years ago

If anyone else experiences this and relies on this function, please comment below so I know it's urgent.

SKreutz commented 2 years ago

This is exactly the problem I'm running into right now. Should I close the other issue? I haven't found a fix yet, but I'll keep looking into it. It still doesn't occur when choosing "total_volume" instead of the other options.

dcts commented 2 years ago

Oh yeah, you're right, somehow I didn't realize this is the same bug as the one you reported; I randomly noticed it during testing. Closing the other issue, #34, as it's the same.

dcts commented 2 years ago

@SKreutz do you need to scrape multiple pages, or are the first 100 sufficient? There is a way of getting the top 100 elements without scrolling; just run this script:

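// read the JSON that Next.js embeds in the page
const nextDataStr = document.getElementById("__NEXT_DATA__").innerText;
const nextData = JSON.parse(nextDataStr);
// the first relay cache entry holds the rankings query result
const top100 = nextData.props.relayCache[0][1].json.data.rankings.edges.map(obj => obj.node);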
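The same lookup can be wrapped in a small function that takes the raw `__NEXT_DATA__` string, so it fails with a clear message instead of a `TypeError` when the page looks different. This is a sketch that assumes the `props.relayCache[0][1].json.data.rankings.edges` path from the snippet above (OpenSea can change its page structure at any time); the name `extractTopRankings` is hypothetical and not part of opensea-scraper.

```javascript
// Hypothetical helper: parse the __NEXT_DATA__ JSON string and return the
// ranking nodes. Assumes the relayCache layout used in the snippet above.
function extractTopRankings(nextDataStr) {
  const nextData = JSON.parse(nextDataStr);
  const edges = nextData?.props?.relayCache?.[0]?.[1]?.json?.data?.rankings?.edges;
  if (!Array.isArray(edges)) {
    // The layout changed, or a different page (e.g. a challenge page) was served.
    throw new Error("rankings not found in __NEXT_DATA__");
  }
  return edges.map((edge) => edge.node);
}

// In the browser console on the rankings page:
// extractTopRankings(document.getElementById("__NEXT_DATA__").innerText);
```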

This is much faster and more efficient than scrolling and scraping the data from the DOM. I will integrate this into the repository soon and add the following functions:

OpenseaScraper.rankings("24h"); // https://opensea.io/rankings?sortBy=one_day_volume
OpenseaScraper.rankings("7d"); // https://opensea.io/rankings?sortBy=seven_day_volume
OpenseaScraper.rankings("30d"); // https://opensea.io/rankings?sortBy=thirty_day_volume
OpenseaScraper.rankings("total"); // https://opensea.io/rankings?sortBy=total_volume

// ❌ currently not working: scrape more than 100 items from rankings page
OpenseaScraper.rankingsByScrolling(); 

SKreutz commented 2 years ago

@dcts Yes, I only want to scrape the first 100 slugs. Where do I put the 3 lines of code you provided? Thank you for your help, I really appreciate it!

dcts commented 2 years ago

@SKreutz I added this new method and updated the repository. Just update to the latest version, 6.0.0, and then you can do:

// scrape all slugs, names and ranks from the top collections from the rankings page
// "type" is one of the following:
// "24h": ranking of last 24 hours: https://opensea.io/rankings?sortBy=one_day_volume
// "7d": ranking of last 7 days: https://opensea.io/rankings?sortBy=seven_day_volume
// "30d": ranking of last 30 days: https://opensea.io/rankings?sortBy=thirty_day_volume
// "total": scrapes all time ranking: https://opensea.io/rankings?sortBy=total_volume
const type = "24h"; // possible values: "24h", "7d", "30d", "total"
const ranking = await OpenseaScraper.rankings(type, options);
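The comments above describe a fixed mapping from the `type` argument to a rankings URL. A minimal sketch of that mapping, using only the URLs listed in those comments; `rankingsUrl` is a hypothetical name and is not meant to reflect how the library implements it internally.

```javascript
// Maps the `type` values accepted by OpenseaScraper.rankings() to the
// sortBy parameter of the rankings page, as listed in the comments above.
const SORT_BY = {
  "24h": "one_day_volume",
  "7d": "seven_day_volume",
  "30d": "thirty_day_volume",
  total: "total_volume",
};

// Hypothetical helper: build the rankings URL for a given type.
function rankingsUrl(type) {
  if (!Object.prototype.hasOwnProperty.call(SORT_BY, type)) {
    throw new Error(`unknown type "${type}", expected one of ${Object.keys(SORT_BY).join(", ")}`);
  }
  return `https://opensea.io/rankings?sortBy=${SORT_BY[type]}`;
}

console.log(rankingsUrl("7d")); // https://opensea.io/rankings?sortBy=seven_day_volume
```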

SKreutz commented 2 years ago

@dcts your fix seems to work fine! I really appreciate your help, and it's even a lot faster than before. This bug can be closed.

mlarcher commented 2 years ago

Why has the issue been closed? Has the OpenseaScraper.offersByScrolling() method been fixed?

It seems to me that the issue first described in this ticket is still happening, and that you found a workaround for the rankings case. Am I misinterpreting something?

mlarcher commented 2 years ago

Not sure it is the same issue, but when running our script locally we get "stats":{"totalOffers":416} even though the offers field only contains 410 elements after calling scraper.offersByScrolling. In production on GCP, we get an empty result that looks like this:

offers: []
stats: {}

Something is definitely wrong with this method... What can we do to help investigate the issue?

dcts commented 2 years ago

@mlarcher I just checked and yes, you are absolutely right, the issue was never resolved. Thanks for reporting!

I need to take a closer look at the code; something happened that broke it.

SKreutz commented 2 years ago

I just tried to reproduce the issue. For example, "slotienft" currently has 390 items on "Buy Now", and the offers method works fine for it:

=== actions ===
new page created
opening url https://opensea.io/collection/slotienft?search[sortAscending]=true&search[sortBy]=PRICE&search[toggles][0]=BUY_NOW
🚧 waiting for cloudflare to resolve...
extracting wired variable
closing browser...
extracting offers and stats from wired variable
total Offers: 390
top 3 Offers [ { name: 'Slotie #4606', tokenId: '4606', displayImageUrl: 'https://lh3.googleusercontent.com/6YxBtVI9cA4Y2kEMujrGodnXk55lEiJXRCdLDnGbwQRmpBI26Va7_BU7tmBvWYJz1YQz1lwGRuCZP_UtKHndL14Zj4qXwpy-Jfc8', assetContract: '0x5fdb2b0c56afa73b8ca2228e6ab92be90325961d', offerUrl: 'https://opensea.io/assets/0x5fdb2b0c56afa73b8ca2228e6ab92be90325961d/4606', floorPrice: { amount: 0.685, currency: 'ETH' } . . .

Scraping offers by scrolling also works fine for me.

✅ === OpenseaScraper.offersByScrolling(slug, 40) ===
=== scraping started ===
Scraping Opensea URL: https://opensea.io/collection/slotienft?search[sortAscending]=true&search[sortBy]=PRICE&search[toggles][0]=BUY_NOW

=== options ===
debug          : false
logs           : true
browserInstance: default

=== actions ===
new page created
🚧 waiting for cloudflare to resolve
expose all helper functions
scrape offers until target resultsize reached or bottom of page reached
closing browser...
total Offers: 390
all scraped offers (max 40): [

I also tried different collections, and everything works fine for me. I am using macOS Monterey 12.0.1 and Node v16.13.1, and I just downloaded the latest version of opensea-scraper.

Let me know if you need further information

mlarcher commented 2 years ago

Here's what I get:

server_1       | 2022-03-17T22:00:43.174Z debug: Start scraping prices
server_1       | === scraping started ===
server_1       | Scraping Opensea URL: https://opensea.io/collection/chumbivalleyofficial?search[sortAscending]=true&search[sortBy]=PRICE&search[toggles][0]=BUY_NOW
server_1       |
server_1       | === options ===
server_1       | debug          : false
server_1       | logs           : true
server_1       | browserInstance: default
server_1       |
server_1       | === actions ===
server_1       | new page created
server_1       | 🚧 waiting for cloudflare to resolve
server_1       | expose all helper functions
server_1       | scrape offers until target resultsize reached or bottom of page reached
server_1       | closing browser...
server_1       | 2022-03-17T22:11:17.853Z debug: Prices scraping done [{"foundOffersCount":408,"stats":{"totalOffers":412}}]

I'm on macOS Monterey 12.3, in a Docker container running node:16.14.0-alpine3.14.

dcts commented 2 years ago

@mlarcher I published a fix. Can you test it and let me know if it works now? Be sure to use version 6.0.2 :)

dcts commented 2 years ago

@SKreutz thanks for testing! I think it may have looked like everything worked on your end, but in fact a lot of the offers were missing when using the offersByScrolling method. The bug was that 80% of the offers were skipped and only ~20% got scraped. This is particularly bad because sometimes it seemed that everything worked when it actually did not, and other times it just broke.

But now it should be fixed; at least the demo is working again (for me), with all relevant offers scraped. You can test it with:

npm run demo

mlarcher commented 2 years ago

@dcts it's @SKreutz who said "Scraping offers by scrolling also works fine for me", not me... I just tested version 6.0.2 and got [{"foundOffersCount":412,"stats":{"totalOffers":413}}], so one offer is still missing from the offers array. I'm running it a second time to be sure, but I see 413 on OpenSea right now, so there's probably still something going on.

mlarcher commented 2 years ago

The second run got me [{"foundOffersCount":405,"stats":{"totalOffers":413}}], so we're not good yet :/

mlarcher commented 2 years ago

Also, is there any chance it works on GCP with the current version, or is the empty result I get in production an unrelated problem?

dcts commented 2 years ago

@mlarcher can you post which collection you scraped to get these results?


mlarcher commented 2 years ago

Here it is, @dcts:

server_1 | Scraping Opensea URL: https://opensea.io/collection/chumbivalleyofficial?search[sortAscending]=true&search[sortBy]=PRICE&search[toggles][0]=BUY_NOW
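The logs in this thread all show the same URL pattern for a slug: the collection page sorted by ascending price and filtered to "Buy Now" listings. A sketch that rebuilds that URL from a slug, which can be handy when calling offersByScrollingByUrl directly; it assumes the pattern seen in the logs stays stable, and `offersUrl` is a hypothetical name, not a library function.

```javascript
// Hypothetical helper: rebuild the URL seen in the logs for a collection slug.
// The bracketed query keys are left unencoded, exactly as they appear in the logs.
function offersUrl(slug) {
  const query = [
    "search[sortAscending]=true",
    "search[sortBy]=PRICE",
    "search[toggles][0]=BUY_NOW",
  ].join("&");
  return `https://opensea.io/collection/${encodeURIComponent(slug)}?${query}`;
}

console.log(offersUrl("chumbivalleyofficial"));
```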

dcts commented 2 years ago

When I run the following:

const res = await OpenseaScraper.offersByScrolling("chumbivalleyofficial", 40, options);

I get correct results; in fact, they are identical to running OpenseaScraper.offers("chumbivalleyofficial", options).

Can you try to run it locally (not on GCP)?


To answer your GCP question: yes, it's an unrelated problem that has nothing to do with the scraper but with the environment. Cloud setups for scraping are always difficult because you don't have full control over the environment, IPs, etc. Services like Cloudflare can also detect a cloud environment (through IP lists) and handle it differently (block it). See issues #40 and #39. If I find a solution for the cloud I will certainly share it, but as of now I don't plan to work on that. I do encourage everybody to share working cloud setups, though, because a lot of people would certainly like one.

mlarcher commented 2 years ago

@dcts thanks for the information.

GCP is not at stake here, as we get no results at all there (even though it used to work at some point). I'll check whether I can do anything to change the script's external IP.

The results I posted come from a Docker container on my machine.

Your test got me thinking, so I tried directly on the host machine with no Docker container involved and got the same issue: [{"foundOffersCount":419,"stats":{"totalOffers":422}}]

In your test you are limiting the results to 40, which avoids the issue, but we want a much larger result set: there are about 420 items on sale, not 40... Maybe you could try on your machine with a limit of 500?

Please let me know what else we can do to help investigate the issue.

dcts commented 2 years ago

@mlarcher I tried the same with 500 and could reproduce the inconsistency. Here are my results:

const res = await OpenseaScraper.offersByScrolling("chumbivalleyofficial", 500, options);
console.log(res.offers.length); // => 420
console.log(res.stats.totalOffers); // => 428

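So yes, there's still an issue. But can you confirm that you at least get the algorithm running and get most of the offers (even if not all of them)? You got 419 offers out of 422, is that right? 🤔

I think some offers don't get fetched because of how the scraping algorithm is designed:

  • the algorithm keeps scrolling as long as possible
  • scrolling triggers fetching of new data, which changes the DOM
  • then the algorithm gets the data from the DOM

This is obviously not a great design, as it's very error prone. What if the DOM is checked before the data has been inserted? Or what if the fetching fails? In those cases the algorithm would simply skip those offers and continue.

I am sure there is a better solution, and I agree it would be great to have, but I have not yet come up with an idea of how to solve this problem better.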
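The scroll, fetch, read-the-DOM loop described above can be sketched as a loop that keys every offer by `tokenId` (so offers read twice are not double counted) and only stops after several scrolls in a row bring nothing new. This is not the library's code: `scrollOnce` and `readVisibleOffers` are hypothetical callbacks standing in for the Puppeteer calls, and `maxIdleRounds` is an assumed tuning knob.

```javascript
// Sketch of a scroll-and-collect loop. The browser work is abstracted into two
// hypothetical async callbacks so the control flow can be read on its own:
//   scrollOnce()        -> scrolls the page once, resolves when done
//   readVisibleOffers() -> resolves to the offers currently rendered in the DOM
async function collectOffers({ scrollOnce, readVisibleOffers, target, maxIdleRounds = 3 }) {
  const offers = new Map(); // tokenId -> offer, so re-rendered items are not counted twice
  let idleRounds = 0;

  while (offers.size < target && idleRounds < maxIdleRounds) {
    const before = offers.size;
    for (const offer of await readVisibleOffers()) {
      if (!offers.has(offer.tokenId)) offers.set(offer.tokenId, offer);
    }
    // Only give up after several consecutive rounds without new offers, instead
    // of stopping the first time the DOM has not caught up with the fetch yet.
    idleRounds = offers.size === before ? idleRounds + 1 : 0;
    await scrollOnce();
  }
  return [...offers.values()].slice(0, target);
}
```

Note that deduplication and a tolerant stop condition do not recover offers whose DOM nodes were never read; they only avoid double counting and premature exits.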

SKreutz commented 2 years ago

I also think it's not possible to fetch 100% because of the way OpenSea displays the items and, as you mentioned, the DOM changes. When scrolling manually and looking at the HTML, you can see the DOM change and add the elements as they appear. Sometimes OpenSea is very slow, or the NFTs are GIFs instead of JPEGs, which take even longer to load, and I think that's why some items are skipped.

The only way to "fix" this, in my opinion, would be to add a sleep of a few seconds after each "scroll" so the items have more time to display. But I don't know exactly how the code works, and even that would not be a nice solution; it would make the code slow.
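Instead of a fixed sleep after every scroll, one option is to poll the number of rendered items and continue only once it has stopped changing for a few checks. A sketch, assuming a hypothetical `countItems` callback (in Puppeteer this could be a `page.evaluate` that counts the item cards); the default interval, number of stable checks and timeout are guesses to tune, not measured values.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls countItems() every intervalMs and resolves with the count once it has
// stayed the same for `stableChecks` consecutive polls, or when timeoutMs passes.
async function waitForStableCount(countItems, { intervalMs = 500, stableChecks = 3, timeoutMs = 15000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  let last = await countItems();
  let stable = 0;
  while (stable < stableChecks && Date.now() < deadline) {
    await sleep(intervalMs);
    const current = await countItems();
    stable = current === last ? stable + 1 : 0;
    last = current;
  }
  return last;
}
```

The wait is then only as long as the page actually needs, rather than a fixed few seconds per scroll.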

mlarcher commented 2 years ago

Yes, I get most of the offers (419 out of 422), both when run locally and in the Docker container on my home machine. On GCP I get no results at all, but as we saw, that's not the same issue.

As for adding a sleep after each scroll: perhaps a timeout only after the last scroll, somehow?

I'll check if there is a better way to know when the DOM is "stabilized"...

mlarcher commented 2 years ago

perhaps you could use something like https://developer.mozilla.org/fr/docs/Web/API/MutationObserver to monitor dom changes, scroll, and debounce an ending function until nothing moves anymore ?
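A browser-side sketch of that idea: a `MutationObserver` resets a timer on every DOM change, and the returned promise resolves once the observed subtree has been quiet for `quietMs`. It is meant to run inside the page (for example via Puppeteer's `page.evaluate`); the `quietMs` and `timeoutMs` defaults are assumed values, and which element to observe is left to the caller.

```javascript
// Resolves with "quiet" once `target` has had no DOM mutations for `quietMs`
// milliseconds, or with "timeout" after `timeoutMs` at the latest.
// Runs in the browser, where MutationObserver is available.
function waitForDomQuiet(target, { quietMs = 1000, timeoutMs = 20000 } = {}) {
  return new Promise((resolve) => {
    let quietTimer;
    const finish = (reason) => {
      observer.disconnect();
      clearTimeout(quietTimer);
      clearTimeout(hardTimer);
      resolve(reason);
    };
    const observer = new MutationObserver(() => {
      // Debounce: every mutation restarts the quiet period.
      clearTimeout(quietTimer);
      quietTimer = setTimeout(() => finish("quiet"), quietMs);
    });
    observer.observe(target, { childList: true, subtree: true });
    quietTimer = setTimeout(() => finish("quiet"), quietMs);
    const hardTimer = setTimeout(() => finish("timeout"), timeoutMs);
  });
}
```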

dcts commented 2 years ago


@mlarcher Yes, this is a good idea. I tried it at some point but could not make it work; maybe it's worth revisiting.

What could be even more efficient is to scroll and simply monitor Puppeteer's network activity, like this:

// taken from => https://stackoverflow.com/a/55478226/6272061
// matches content types that contain "javascript" or "html" (defined here so the snippet runs)
const textRegex = /javascript|html/;

page.on('response', (response) => {
    const headers = response.headers();

    // example test: check if content-type contains javascript or html
    const contentType = headers['content-type'] || '';
    if (textRegex.test(contentType)) {
        console.log(response.url());
    }
});
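Applied to the GraphQL idea, the listener can keep the parsed JSON bodies instead of logging URLs. A sketch that only relies on `page.on("response")`, `response.url()` and `response.json()` from Puppeteer; it assumes the API requests can be recognized by "graphql" in their URL, and `collectJsonResponses` is a hypothetical name.

```javascript
// Collects the parsed JSON bodies of responses whose URL contains `urlPart`.
// `page` is a Puppeteer Page (anything with an EventEmitter-style .on works).
function collectJsonResponses(page, urlPart = "graphql") {
  const bodies = [];
  const pending = new Set();
  page.on("response", (response) => {
    if (!response.url().includes(urlPart)) return;
    const task = response
      .json()
      .then((body) => bodies.push(body))
      .catch(() => {}) // e.g. redirects or non-JSON bodies: skip them
      .finally(() => pending.delete(task));
    pending.add(task);
  });
  // Call done() after the last scroll to wait for bodies still being read.
  const done = async () => {
    await Promise.all([...pending]);
    return bodies;
  };
  return { bodies, done };
}
```

Registering the listener before the first scroll means every page of results is captured from the network rather than read back from the DOM.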

Once new data needs to be fetched, the GraphQL API is called, and when we intercept its response we get the data in this format:

{
    "node": {
        "assetCount": null,
        "imageUrl": "https://lh3.googleusercontent.com/seJEwLWJP3RAXrxboeG11qbc_MYrxwVrsxGH0s0qxvF68hefOjf5qrPSKkIknUTYzfvinOUPWbYBdM8VEtGEE980Qv2ti_GGd86OWQ=s120",
        "name": "DeadFellaz",
        "slug": "deadfellaz",
        "isVerified": true,
        "id": "Q29sbGVjdGlvblR5cGU6OTM2MTIx",
        "description": "10,000 undead NFTs on the Ethereum blockchain. Join the horde.\n\nAdditional official collections:\n\n[Halloween S1](https://opensea.io/collection/deadfellaz-infected-s1) | [Nifty Gateway Betty Pop Horror](https://opensea.io/collection/betty-pop-horror-by-deadfellaz) | [Deadfrenz Lab Access Pass](https://opensea.io/collection/deadfrenz-lab-access-pass) | [Deadfrenz Collection](https://opensea.io/collection/deadfrenz-collection)"
    }
}

[Screenshot from 2022-03-19 11-39-36]

I think that's a nice solution, and it should be fairly easy to develop 🎉 Added it to the roadmap!

Side note: at that point it might be worth trying to use the OpenSea GraphQL API, but I could never make it work, and I've heard from people that it's a pain to use.

dcts commented 2 years ago

Oops, I just realized that I posted the collection information above. The information for every single item (offer) looks like this:

{
  "assetContract": {
    "address": "0x2acab3dea77832c09420663b0e1cb386031ba17b",
    "chain": "ETHEREUM",
    "id": "QXNzZXRDb250cmFjdFR5cGU6MzAyOTQ1",
    "openseaVersion": null
  },
  "collection": {
    "isVerified": true,
    "relayId": "Q29sbGVjdGlvblR5cGU6OTM2MTIx",
    "id": "Q29sbGVjdGlvblR5cGU6OTM2MTIx",
    "displayData": {
        "cardDisplayStyle": "CONTAIN"
    },
    "imageUrl": "https://lh3.googleusercontent.com/seJEwLWJP3RAXrxboeG11qbc_MYrxwVrsxGH0s0qxvF68hefOjf5qrPSKkIknUTYzfvinOUPWbYBdM8VEtGEE980Qv2ti_GGd86OWQ=s120",
    "slug": "deadfellaz",
    "isAuthorizedEditor": false,
    "name": "DeadFellaz"
  },
  "relayId": "QXNzZXRUeXBlOjM2Nzg2ODY0",
  "tokenId": "3036",
  "backgroundColor": null,
  "imageUrl": "https://lh3.googleusercontent.com/RQlR9mw-oJyhrj_GtwRZfRJdqk-fjtbJK4tElqpas4R1XksLXqnklhvnbw40LHsVliYoDO3z9rWE7OczRKp_qhDqSS_ZNzyRa9kG",
  "name": "DeadFellaz #3036",
  "id": "QXNzZXRUeXBlOjM2Nzg2ODY0",
  "isDelisted": false,
  "animationUrl": null,
  "displayImageUrl": "https://lh3.googleusercontent.com/RQlR9mw-oJyhrj_GtwRZfRJdqk-fjtbJK4tElqpas4R1XksLXqnklhvnbw40LHsVliYoDO3z9rWE7OczRKp_qhDqSS_ZNzyRa9kG",
  "decimals": 0,
  "favoritesCount": 23,
  "isFavorite": false,
  "isFrozen": false,
  "hasUnlockableContent": false,
  "orderData": {
    "bestAsk": {
      "relayId": "T3JkZXJWMlR5cGU6MzUyMjU2ODkzMQ==",
      "orderType": "BASIC",
      "maker": {
        "address": "0x28705f64c07079822c7afd66e43975b7c6095ef6",
        "id": "QWNjb3VudFR5cGU6MTQ1NjA1MTQy"
      },
      "closedAt": "2022-04-05T05:44:18",
      "dutchAuctionFinalPrice": null,
      "openedAt": "2022-03-17T21:48:42",
      "priceFnEndedAt": null,
      "quantity": "1",
      "decimals": null,
      "paymentAssetQuantity": {
        "quantity": "2690000000000000000",
        "asset": {
          "decimals": 18,
          "imageUrl": "https://openseauserdata.com/files/6f8e2979d428180222796ff4a33ab929.svg",
          "symbol": "ETH",
          "usdSpotPrice": 2946.32,
          "assetContract": {
            "blockExplorerLink": "https://etherscan.io/address/0x0000000000000000000000000000000000000000",
            "chain": "ETHEREUM",
            "id": "QXNzZXRDb250cmFjdFR5cGU6MjMzMQ=="
          },
          "id": "QXNzZXRUeXBlOjEzNjg5MDc3"
        },
        "id": "QXNzZXRRdWFudGl0eVR5cGU6Mjg3MDE4NzA3OTcyNTgyMjM1NjM1NTg1MDc0MTcxNjgyNzE3ODc4",
        "quantityInEth": "2690000000000000000"
      }
    },
    "bestBid": {
      "orderType": "BASIC",
      "paymentAssetQuantity": {
        "asset": {
          "decimals": 18,
          "imageUrl": "https://openseauserdata.com/files/accae6b6fb3888cbff27a013729c22dc.svg",
          "symbol": "WETH",
          "usdSpotPrice": 2946.32,
          "assetContract": {
            "blockExplorerLink": "https://etherscan.io/address/0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2",
            "chain": "ETHEREUM",
            "id": "QXNzZXRDb250cmFjdFR5cGU6MjMzOA=="
          },
          "id": "QXNzZXRUeXBlOjQ2NDU2ODE="
        },
        "quantity": "1502841336452599400",
        "id": "QXNzZXRRdWFudGl0eVR5cGU6MjEzNTc0NjA3Mzk2MzM3NzU2NjY4MTkxMzczOTUxNTUwMzAwMDE0"
      }
    }
  },
  "isEditable": {
    "value": false,
    "reason": "Unauthorized"
  },
  "isListable": true,
  "ownership": null,
  "creator": {
    "address": "0xe9d30eddd11dea8433cf6d2b2c22e9cce94113dc",
    "id": "QWNjb3VudFR5cGU6NjEyNTkxNTA="
  },
  "ownedQuantity": null,
  "assetEventData": {
    "lastSale": {
      "unitPriceQuantity": {
        "asset": {
          "decimals": 18,
          "imageUrl": "https://openseauserdata.com/files/6f8e2979d428180222796ff4a33ab929.svg",
          "symbol": "ETH",
          "usdSpotPrice": 2946.32,
          "assetContract": {
            "blockExplorerLink": "https://etherscan.io/address/0x0000000000000000000000000000000000000000",
            "chain": "ETHEREUM",
            "id": "QXNzZXRDb250cmFjdFR5cGU6MjMzMQ=="
          },
          "id": "QXNzZXRUeXBlOjEzNjg5MDc3"
        },
        "quantity": "1300000000000000000",
        "id": "QXNzZXRRdWFudGl0eVR5cGU6MjQxMDUyNDMxOTA1OTU2ODY0MDMxNjQ3MTYzMjQyMzYyNTQ4MTkw"
      }
    }
  }
}
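To reuse such objects, each one could be mapped to the offer shape that OpenseaScraper.offers() prints earlier in this thread (name, tokenId, displayImageUrl, assetContract, offerUrl, floorPrice). A sketch, assuming the listing price comes from `orderData.bestAsk.paymentAssetQuantity` and that `quantity` is an integer string in the payment asset's base units (its `decimals` field); `toOffer` is a hypothetical name.

```javascript
// Hypothetical mapper from one intercepted asset object (shape as in the JSON
// above) to the offer format printed by OpenseaScraper.offers() in this thread.
function toOffer(asset) {
  const ask = asset.orderData?.bestAsk;
  const payment = ask?.paymentAssetQuantity;
  const contract = asset.assetContract.address;
  return {
    name: asset.name,
    tokenId: asset.tokenId,
    displayImageUrl: asset.displayImageUrl,
    assetContract: contract,
    offerUrl: `https://opensea.io/assets/${contract}/${asset.tokenId}`,
    // quantity is an integer string in base units (wei for ETH); Number() can
    // lose precision for very large values, which is acceptable for display.
    floorPrice: payment
      ? { amount: Number(payment.quantity) / 10 ** payment.asset.decimals, currency: payment.asset.symbol }
      : null,
  };
}
```

For DeadFellaz #3036 above, the bestAsk quantity of 2690000000000000000 with 18 decimals gives an amount of 2.69 ETH.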

mlarcher commented 2 years ago

@dcts hooking into the GraphQL API sounds like a wonderful idea. It could drastically improve performance and avoid some DOM-related pitfalls 👍

mlarcher commented 2 years ago


Using the API would be nice, but from what I've heard they don't hand out API keys very easily, and even with an API key you would face some limits/restrictions.

Also, it seems the query they use on the site (AssetSearchQuery) is not documented, and it requires an API key and a CSRF token that changes on every call, so I can see why it could be a pain to use...

Using page.on('response', ...) sounds great, though, as it would combine the best of both worlds. Any idea when you'll have time to give it a go?

dcts commented 2 years ago

@mlarcher I'm working on it right now, but I'm not sure when it will be done; depending on how long it takes to implement, it could be today or maybe next weekend. Obviously no guarantees. ^^

mlarcher commented 2 years ago

Great to read 👍 I'm looking forward to seeing it. Let me know if I can do anything to help.

dcts commented 2 years ago

@mlarcher I just found out that OpenSea has a bug in how it displays the number of offers: the number shown on the page does not match the NFTs actually displayed. For example, check this page: https://opensea.io/collection/deadfellaz?search[sortAscending]=true&search[sortBy]=PRICE&search[stringTraits][0][name]=Background&search[stringTraits][0][values][0]=Blue&search[stringTraits][1][name]=Body%20Grade&search[stringTraits][1][values][0]=Fresh&search[toggles][0]=BUY_NOW

OpenSea says that there are 76 items for sale, but if you count the NFTs by scrolling down the page you will find that there are only 75 (obviously this can change, but I'm pretty confident it is a consistent bug).

So I think the scraping currently works as it should, although scraping by scrolling is not very efficient.

dcts commented 2 years ago

(side note: I'm still going to publish a v7 very soon with more efficient scrolling, as I've already built it and like the architecture much better)

mlarcher commented 2 years ago


I'm looking forward to trying it out!!

About your other point: the collection currently says 78 items and actually lists them all, but I believe there can be a bug on their side. There was never a big offset, so I'm fine leaving it at that 👍

mlarcher commented 2 years ago

Any ETA for the new version by any chance? I'm eager to try it 😊

dcts commented 2 years ago

I have a working implementation of the new algorithm, but it's not stable, so I won't publish it. I can share my work in a separate dev branch if you like.

mlarcher commented 2 years ago

I'd be interested in taking a look at it. Also, what's not stable? Is there anything I can do to help?

mlarcher commented 2 years ago

@dcts Any news? FYI, our scraping job on GCP now gets stuck at "scrape offers until target resultsize reached or bottom of page reached" and never ends... I'd like to check whether the new implementation works any better there.

dcts commented 2 years ago

@mlarcher if you like, check out the branch dev-improve-offersByScrolling. The new implementation sometimes works, but as I mentioned it's not stable; the autoscrolling part needs improvement. You can test the new version by running git fetch and then git checkout dev-improve-offersByScrolling on your local machine, and then use it with:

const result = await OpenseaScraper.offersByScrolling("deadfellaz", 100, options);

When you run the scraper on GCP, do the other functions work (for example, can you run OpenseaScraper.offers())? If so, it would be awesome if you could share your setup; I think a lot of people would be interested in that :)

mlarcher commented 2 years ago

I tried offers() in production on GCP and got

TypeError: Cannot read properties of undefined (reading 'split')
    at _parseWiredVariable (/app/node_modules/opensea-scraper/src/functions/offers.js:105:49)
    at offersByUrl (/app/node_modules/opensea-scraper/src/functions/offers.js:90:21)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async Object.offers (/app/node_modules/opensea-scraper/src/functions/offers.js:37:10)

This happens right after extracting the __wired__ variable. Is there a way to dump the HTML content for debugging?

dcts commented 2 years ago

I would argue that if OpenseaScraper.offers() does not work on GCP, there's no way that OpenseaScraper.offersByScrolling() will work there either. So there are 2 problems here:

  1. making OpenseaScraper run on GCP
  2. making OpenseaScraper.offersByScrolling() work

Before tackling 2 you need to figure out 1; otherwise there's no way to debug properly. The topic of this issue is 2, though.

dcts commented 2 years ago

You can get the HTML content from Puppeteer with the content() method:

const html = await page.content();
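To inspect the result, the HTML can be written to a file right where the parsing fails, for example to check whether a Cloudflare challenge page was served instead of the collection page. A minimal sketch using Puppeteer's `page.content()` and Node's `fs/promises`; `dumpHtml` and the default file name are hypothetical.

```javascript
const fs = require("node:fs/promises");

// Writes the current page HTML to a file for manual inspection and returns
// the number of characters written.
async function dumpHtml(page, file = "debug-page.html") {
  const html = await page.content();
  await fs.writeFile(file, html, "utf8");
  return html.length;
}
```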

dcts commented 2 years ago

Let's move this conversation to issue #40 (I moved your content over there).

zolmine commented 1 year ago

Hello y'all, here's an updated version of the offersByScrollingByUrl function: