stash86 opened this issue 8 years ago (status: open).
Currently, this project is not under active development. The tl;dr is that I was unable to find a good way around TCGplayer's restrictions on scraping, and I ended up abandoning the project for now due to time constraints. I may pick it up again in the future, but for now it's halted.
I had a working TCGplayer price scraper before they changed their website, and I want to update it. What are these restrictions you're talking about? Thanks.
From TCGplayer's policy page:
Don’t abuse our pricing information by using a site scraper, code, script, or other way of repeatedly taking shots at our servers. Respect the information and it will always flow freely. Try to corrupt it or take it for your own profit and use, and we’ll frown at you. Hard.
This is enforced (last time I checked) by CAPTCHA lockouts if you make more than X requests per Y. I'm not sure what X and Y are, but the limits appear to be applied per IP address.
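The exact thresholds were never determined, but per-IP lockouts like this are commonly built on a sliding-window counter. The sketch below is a hypothetical illustration of that general mechanism, not TCGplayer's actual implementation; the class name `PerIpRateLimiter` and the limit of 3 requests per 60 seconds are made-up stand-ins for the unknown X and Y. Timestamps are passed in explicitly so the behaviour is easy to trace.

```python
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Hypothetical sliding-window limiter: at most `max_requests` per `window_seconds` per IP."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Timestamps of recently allowed requests, keyed by client IP.
        self._history = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        """Return True if the request may proceed, False if the client would be locked out."""
        history = self._history[ip]
        # Drop timestamps that have fallen out of the current window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_requests:
            return False  # over quota: a site might serve a CAPTCHA here
        history.append(now)
        return True


# Made-up limits: 3 requests per 60 seconds.
limiter = PerIpRateLimiter(max_requests=3, window_seconds=60.0)
print([limiter.allow("203.0.113.7", t) for t in (0, 10, 20, 30)])  # [True, True, True, False]
print(limiter.allow("198.51.100.2", 30))  # True: a different IP has its own quota
print(limiter.allow("203.0.113.7", 61))   # True: the request at t=0 has aged out
```

Because the quota is tracked per IP, a single client making frequent requests trips the limit quickly, which matches the behaviour described above.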
Actually, it goes further than that. They also filter out requests from automated HTTP clients by checking headers, etc., and they require a JavaScript function to execute before the data will display. (These are at least my working theories, based on the work I had done.) I believe a method using true browser emulation, e.g. Selenium, might work, but it would be slow and costly, and overall I don't think it's worth investigating.
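To make the "checking headers" theory concrete, here is a deliberately naive, hypothetical heuristic of the kind a bot filter might apply. It is not Incapsula's actual logic, which uses far more signals, and it does not model the JavaScript-execution check at all; `looks_automated` and the signature list are invented for illustration.

```python
from typing import Mapping

# Hypothetical User-Agent fragments that commonly identify non-browser HTTP clients.
KNOWN_CLIENT_SIGNATURES = ("python-requests", "curl/", "wget/", "go-http-client", "okhttp")


def looks_automated(headers: Mapping[str, str]) -> bool:
    """Naive heuristic: flag requests whose headers don't resemble a typical browser's."""
    # HTTP header names are case-insensitive, so normalise them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "").lower()
    if not user_agent:
        return True  # no User-Agent at all
    if any(sig in user_agent for sig in KNOWN_CLIENT_SIGNATURES):
        return True  # self-identifies as a scripting library or CLI tool
    # Browsers normally send Accept-Language; many scripted clients do not.
    return "accept-language" not in normalized


print(looks_automated({"User-Agent": "python-requests/2.31.0"}))  # True
print(looks_automated({
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0",
    "Accept-Language": "en-US,en;q=0.5",
}))  # False
```

A check like this is cheap to run on every request, which is why it tends to be layered underneath heavier measures such as a JavaScript challenge.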
I haven't investigated the eBay endpoint yet, but I'd imagine that one is at least fixable. However, I'm not sure how useful it is; the TCGplayer endpoint seemed to be far and away the most popular.
Selenium doesn't work, actually; I tried it.
(Email reply to bedoherty's comment of Thu, Mar 9, 2017 at 4:18 PM.)
Well, that's a bummer. I suppose I could always reach out to TCGplayer and see if they'd be interested in letting me bake their API into this project, although I'm not sure how well that would be received, as I imagine projects like this are exactly the reason they put those expensive security measures in place.
I tried that before (asking for access), and their reply was something like "Unless you can benefit us financially, we won't grant you access"
My code doesn't work anymore either. I also tried Selenium, with no success. They're using Incapsula now, so I'll turn to another site for price scraping. It's a shame, because I usually only used it once a month to evaluate my collection and help with my trading.
Which site is easier to scrape now?
(Email reply to Akarius's comment of Thu, Mar 16, 2017 at 5:07 PM.)
It looks like CFB prices are still scrapable, and the API should still work for those. As for eBay, I'm unsure; I'd have to look at it and would most likely need to update things.
I do know someone is writing Python code to scrape prices from MTGGoldfish; we use it for the Penny Dreadful format. I haven't looked at the code yet because I'm not familiar with Python. Maybe someone who knows Python could take a look?
Can we revive this project?
I think it depends on the capacity in which we want to revive it. Honestly, if I were going to continue this project, I'd probably skip the App Engine Python-based backend and move to a NodeJS-based backend or something similar. Also, as far as features and functionality go, if you're looking to revive it with the full original feature set, that's unlikely. People have become averse to services like this hitting their servers, for obvious reasons, and have purchased very expensive countermeasures.
Basically what I see as reasonable:
That being said, there's truly nothing stopping you from forking your own version of this repo, making it bigger and better, and maybe even opening a pull request to update the original.
I still have a working scraper built in Ruby...
@jrkarnes And it works on the TCGplayer website?
Well, in the end I wrote a PHP script to scrape prices from MTGGoldfish. It's working fine so far.
@Akarius Yes. The scraper works through browser automation and is not headless. I let the browser process all the JavaScript and AJAX, so that we only have to export the DOM and then walk through it with a parser.
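The render-then-parse approach splits the work cleanly: the browser deals with scripts, and the parser only sees static HTML. jrkarnes's scraper is in Ruby; the following is a Python sketch of only the final step, walking an already-exported DOM with the standard-library `html.parser`. The `price` CSS class, the `PriceExtractor` name and the sample markup are hypothetical, not TCGplayer's real page structure, and the sketch assumes price elements contain plain text with no nested tags.

```python
from html.parser import HTMLParser


class PriceExtractor(HTMLParser):
    """Collect text from elements carrying a (hypothetical) "price" CSS class."""

    def __init__(self) -> None:
        super().__init__()
        self.prices = []  # price strings, in document order
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if "price" in classes:
            self._in_price = True

    def handle_endtag(self, tag):
        # Assumes price elements have no nested tags, so any end tag closes them.
        self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(data.strip())


# Made-up markup standing in for a DOM exported from the automated browser.
rendered_dom = """
<table>
  <tr><td class="name">Lightning Bolt</td><td class="price">$1.99</td></tr>
  <tr><td class="name">Counterspell</td><td class="price market">$0.75</td></tr>
</table>
"""
parser = PriceExtractor()
parser.feed(rendered_dom)
parser.close()
print(parser.prices)  # ['$1.99', '$0.75']
```

Keeping the parsing step separate from the browser automation means the parser can be tested against saved HTML files without launching a browser at all.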
First of all, thanks a lot for your API; it's great. I recently used it in my Android app to retrieve card images and CFB prices.
Just wondering: is this API still in development, or on hiatus? Thanks.