Dionakra / arenavision-scraper

Little ArenaVision Scraper to get the sports events
https://www.npmjs.com/package/arenavision-scraper

error in last version #5

Closed hannibal1986 closed 4 years ago

hannibal1986 commented 5 years ago

Thanks for your work

Dionakra commented 5 years ago

Hi,

Yes, Arenavision has just put Cloudflare in front of the site, and it now detects non-human interaction. I have to research how to bypass this. The easiest (and most costly) way would be scraping it with Puppeteer or something similar, but that consumes a lot of resources because it launches a Google Chrome instance on your machine (and I have things running with this on a $5 VPS, so that's not an option).

This weekend I will look into what can be done. It looks like Arenavision is protecting itself a little from us!

And thank you for your comment!

Dionakra commented 5 years ago

Hi again,

I have been testing the Puppeteer approach today and it works, but Puppeteer launches a whole instance of Google Chrome to do its job, so doing the exact same thing takes a while.

So, I think I will modify the library and publish the changes, but I am also thinking of doing the information extraction myself and providing an API with the data already extracted. That way anyone could query it from any language, and I already store the information in a DB, so maybe that's the solution.
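
For reference, a minimal sketch of what a Puppeteer-based fetch could look like. It is not the code used in the library: the function name `fetchRenderedHtml`, the injectable `launch` parameter, and the `networkidle2` wait condition are assumptions made for illustration, and waiting for the network to go idle may not be enough to get past every Cloudflare check.

```javascript
// Hypothetical helper, not part of arenavision-scraper.
// `launch` is injectable so the function can be exercised without a real browser;
// by default it starts headless Chrome through Puppeteer (npm install puppeteer).
async function fetchRenderedHtml(
  url,
  launch = () => require('puppeteer').launch({ headless: true })
) {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    // Wait until the network is mostly idle so scripts on the page have had time to run.
    await page.goto(url, { waitUntil: 'networkidle2' });
    // Full HTML of the page after JavaScript execution.
    return await page.content();
  } finally {
    // Always close the browser: a leaked Chrome process is expensive on a small VPS.
    await browser.close();
  }
}
```

Each call starts and stops a whole browser, which is where the extra time and memory go compared with a plain HTTP request.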

Anyway, we will see this weekend.

hannibal1986 commented 5 years ago

thanks, great job

hannibal1986 commented 5 years ago

Today the guide is back to how it was in the past, with no img 🤣

Dionakra commented 5 years ago

Yeah, maybe they figured that we would get the info anyway, so they just switched to Cloudflare to protect themselves from this kind of thing.

At least it is fun researching how to bypass all the protections they put in place.

Dionakra commented 5 years ago

I have uploaded a new version after the holidays.

Now it works, but I think it depends on your IP. It works on my laptop, but neither on my server nor on TravisCI. Maybe they have banned some source IPs, I don't know. I will try to deploy it on a Now.sh server, just to see if I can access the site from there.

@hannibal1986, could you install the new version and execute npm run test locally, just to check whether it works somewhere other than my laptop? The new version is 1.0.31.

Thank you.

hannibal1986 commented 5 years ago

The new version works fine, thanks for this big job.

hannibal1986 commented 5 years ago

It worked fine for 2 days, and today I get this:

UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'attribs' of undefined
    at fetch.then.then.res (arenavision-scraper/src/getGuide.js:132:33)
    at processTicksAndRejections (internal/process/task_queues.js:86:5)
(node:29456) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:29456) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

Dionakra commented 5 years ago

Yep, just fixed in version 1.0.32.

How long will it last? I don't know.
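
The fix itself is not shown in this thread. As a minimal sketch of how this class of error can be avoided, the snippet below guards the property access and attaches a `.catch()` to the promise chain. The names `extractHref` and `safeGetGuide`, and the shape of the parsed node, are assumptions for illustration, not the code in `getGuide.js`.

```javascript
// Hypothetical helper: `node` stands for a parsed HTML element that keeps
// its attributes in an `attribs` object, as in the stack trace above.
function extractHref(node) {
  // When the page layout changes, a selector may match nothing and `node` is undefined.
  if (!node || !node.attribs) {
    return null;
  }
  return node.attribs.href || null;
}

// Hypothetical wrapper: `fetchNodes` is any function returning a promise of parsed nodes.
function safeGetGuide(fetchNodes) {
  return fetchNodes()
    .then((nodes) => nodes.map(extractHref).filter((href) => href !== null))
    .catch((err) => {
      // Handling the rejection here avoids the UnhandledPromiseRejectionWarning.
      console.error('Could not extract the guide:', err.message);
      return [];
    });
}
```

With this shape, a layout change on the site yields an empty result and a logged message instead of an unhandled rejection.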

hannibal1986 commented 5 years ago

Thank you very much for everything.

Dionakra commented 5 years ago

I have uploaded a new version, 1.0.33, which just removes everything related to the image-processing library, as it was causing problems when deploying the library to Firebase Functions.

With that version of the library I have scheduled a function in Firebase Functions to extract everything, and it is working. It takes 5 minutes or so because there is only 256 MB of RAM available in a Firebase Function, but it does the trick and removes the need for a server and, maybe, they don't block Google IPs.

hannibal1986 commented 5 years ago

Is it possible to run arenavision-scraper with a User-Agent so as not to be banned by Cloudflare?

Dionakra commented 5 years ago

That was the change I introduced, together with the Referer header, to make it work, but I can try, of course. I have been banned on 3 different servers to date.
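
As a rough sketch of what sending a User-Agent and a Referer header looks like with Node's built-in `https` module. The `buildRequestOptions` and `fetchPage` names and any header values passed in are placeholders, not what the library actually sends, and matching a browser's headers does not guarantee that Cloudflare will let the request through.

```javascript
const https = require('https');

// Hypothetical helper: builds options for https.get with browser-like headers.
function buildRequestOptions(userAgent, referer) {
  return {
    headers: {
      'User-Agent': userAgent,
      Referer: referer,
    },
  };
}

// Hypothetical helper: resolves with the status code and body of a GET request.
function fetchPage(url, options) {
  return new Promise((resolve, reject) => {
    https
      .get(url, options, (res) => {
        let body = '';
        res.setEncoding('utf8');
        res.on('data', (chunk) => { body += chunk; });
        res.on('end', () => resolve({ status: res.statusCode, body }));
      })
      .on('error', reject);
  });
}
```

A call would look like fetchPage(url, buildRequestOptions(someUserAgent, someReferer)), where the status code in the result shows whether the request was served or rejected.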

Dionakra commented 5 years ago

I cannot test removing the User-Agent anywhere, because every place where I have it deployed has been banned, so I think I will give up for now.

I will disable all the cron jobs and, if I can fix it within a month, I will, but after two years struggling with the Arenavision guys (who are doing the right thing) I am giving up for the moment.

hannibal1986 commented 5 years ago

Thanks, thank you very much for this great job.

Dionakra commented 4 years ago

Closing this issue as the library is going to be deprecated. Please refer to #6