Closed: trivius86 closed this issue 1 year ago
I worked around the problem by enabling this setting: USER_AGENT = 'sofifa (+http://www.yourdomain.com)'
In practice the script was telling the sofifa site that it was a scraper, and it was being blocked at the first page... Now, though, it returns another error after a certain number of players have been downloaded (roughly between 800 and 1,200). I'm attaching the error output:
2023-02-07 09:19:55 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): sofifa.com:443
2023-02-07 09:19:55 [urllib3.connectionpool] DEBUG: https://sofifa.com:443 "GET /team/111144/seattle-sounders/?r=230012 HTTP/1.1" 429 None
2023-02-07 09:19:55 [scrapy.core.scraper] ERROR: Spider error processing <GET https://sofifa.com/players?set=true&col=oa&sort=desc&showCol%5B0%5D=pi&showCol%5B1%5D=ae&showCol%5B2%5D=hi&showCol%5B3%5D=wi&showCol%5B4%5D=pf&showCol%5B5%5D=oa&showCol%5B6%5D=pt&showCol%5B7%5D=bo&showCol%5B8%5D=bp&showCol%5B9%5D=gu&showCol%5B10%5D=jt&showCol%5B11%5D=le&showCol%5B12%5D=vl&showCol%5B13%5D=wg&showCol%5B14%5D=rc&showCol%5B15%5D=ta&showCol%5B16%5D=cr&showCol%5B17%5D=fi&showCol%5B18%5D=he&showCol%5B19%5D=sh&showCol%5B20%5D=vo&showCol%5B21%5D=ts&showCol%5B22%5D=dr&showCol%5B23%5D=cu&showCol%5B24%5D=fr&showCol%5B25%5D=lo&showCol%5B26%5D=bl&showCol%5B27%5D=to&showCol%5B28%5D=ac&showCol%5B29%5D=sp&showCol%5B30%5D=ag&showCol%5B31%5D=re&showCol%5B32%5D=ba&showCol%5B33%5D=tp&showCol%5B34%5D=so&showCol%5B35%5D=ju&showCol%5B36%5D=st&showCol%5B37%5D=sr&showCol%5B38%5D=ln&showCol%5B39%5D=te&showCol%5B40%5D=ar&showCol%5B41%5D=in&showCol%5B42%5D=po&showCol%5B43%5D=vi&showCol%5B44%5D=pe&showCol%5B45%5D=cm&showCol%5B46%5D=td&showCol%5B47%5D=ma&showCol%5B48%5D=sa&showCol%5B49%5D=sl&showCol%5B50%5D=tg&showCol%5B51%5D=gd&showCol%5B52%5D=gh&showCol%5B53%5D=gk&showCol%5B54%5D=gp&showCol%5B55%5D=gr&showCol%5B56%5D=tt&showCol%5B57%5D=bs&showCol%5B58%5D=wk&showCol%5B59%5D=sk&showCol%5B60%5D=aw&showCol%5B61%5D=dw&showCol%5B62%5D=ir&showCol%5B63%5D=pac&showCol%5B64%5D=sho&showCol%5B65%5D=pas&showCol%5B66%5D=dri&showCol%5B67%5D=def&showCol%5B68%5D=phy&r=230012&offset=780> (referer: https://sofifa.com/players?set=true&col=oa&sort=desc&showCol%5B0%5D=pi&showCol%5B1%5D=ae&showCol%5B2%5D=hi&showCol%5B3%5D=wi&showCol%5B4%5D=pf&showCol%5B5%5D=oa&showCol%5B6%5D=pt&showCol%5B7%5D=bo&showCol%5B8%5D=bp&showCol%5B9%5D=gu&showCol%5B10%5D=jt&showCol%5B11%5D=le&showCol%5B12%5D=vl&showCol%5B13%5D=wg&showCol%5B14%5D=rc&showCol%5B15%5D=ta&showCol%5B16%5D=cr&showCol%5B17%5D=fi&showCol%5B18%5D=he&showCol%5B19%5D=sh&showCol%5B20%5D=vo&showCol%5B21%5D=ts&showCol%5B22%5D=dr&showCol%5B23%5D=cu&showCol%5B24%5D=fr&showCol%5B25%5D=lo&showCol%5B26%5D=bl&showCol%5B27%5D=to&showCol%5B28%5D=ac&showCol%5B29%5D=sp&showCol%5B30%5D=ag&showCol%5B31%5D=re&showCol%5B32%5D=ba&showCol%5B33%5D=tp&showCol%5B34%5D=so&showCol%5B35%5D=ju&showCol%5B36%5D=st&showCol%5B37%5D=sr&showCol%5B38%5D=ln&showCol%5B39%5D=te&showCol%5B40%5D=ar&showCol%5B41%5D=in&showCol%5B42%5D=po&showCol%5B43%5D=vi&showCol%5B44%5D=pe&showCol%5B45%5D=cm&showCol%5B46%5D=td&showCol%5B47%5D=ma&showCol%5B48%5D=sa&showCol%5B49%5D=sl&showCol%5B50%5D=tg&showCol%5B51%5D=gd&showCol%5B52%5D=gh&showCol%5B53%5D=gk&showCol%5B54%5D=gp&showCol%5B55%5D=gr&showCol%5B56%5D=tt&showCol%5B57%5D=bs&showCol%5B58%5D=wk&showCol%5B59%5D=sk&showCol%5B60%5D=aw&showCol%5B61%5D=dw&showCol%5B62%5D=ir&showCol%5B63%5D=pac&showCol%5B64%5D=sho&showCol%5B65%5D=pas&showCol%5B66%5D=dri&showCol%5B67%5D=def&showCol%5B68%5D=phy&r=230012&offset=720)
Traceback (most recent call last):
File "/Users/alessandroagostinelli/opt/anaconda3/lib/python3.9/site-packages/scrapy/utils/defer.py", line 132, in iter_errback
yield next(it)
File "/Users/alessandroagostinelli/opt/anaconda3/lib/python3.9/site-packages/scrapy/utils/python.py", line 354, in next
return next(self.data)
File "/Users/alessandroagostinelli/opt/anaconda3/lib/python3.9/site-packages/scrapy/utils/python.py", line 354, in next
return next(self.data)
File "/Users/alessandroagostinelli/opt/anaconda3/lib/python3.9/site-packages/scrapy/core/spidermw.py", line 66, in _evaluate_iterable
for r in iterable:
File "/Users/alessandroagostinelli/opt/anaconda3/lib/python3.9/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
for x in result:
File "/Users/alessandroagostinelli/opt/anaconda3/lib/python3.9/site-packages/scrapy/core/spidermw.py", line 66, in _evaluate_iterable
for r in iterable:
File "/Users/alessandroagostinelli/opt/anaconda3/lib/python3.9/site-packages/scrapy/spidermiddlewares/referer.py", line 342, in
For now I've resolved it by removing this line: "league_name": self.parse_team(team_url),
But I don't understand the cause of the error.
Hello @trivius86
File "/Users/alessandroagostinelli/Documents/scraper-main/src/sofifa/spiders/sofifa.py", line 90, in parse_team
league_name = resp.css(".info a::text").get()[:-4]
TypeError: 'NoneType' object is not subscriptable
This suggests that the league name cannot be extracted like that. At the moment I can't help you debug the issue, but I can tell you that, from what I'm seeing there, MLS has a different name scheme: [United States] Major League Soccer, instead of what the script assumes.
Perhaps you can extend the script so that MLS is scrapable too 😉
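For reference, a more forgiving version of the extraction on line 90 could guard against the selector returning None and normalize both label styles. A hedged sketch; the exact label formats (a trailing "(1)" division marker versus a "[United States]" country prefix) are my assumption, inferred from the [:-4] slice and the MLS example above:

```python
import re


def extract_league_name(resp):
    """Pull the league label from a team page, tolerating both naming schemes."""
    raw = resp.css(".info a::text").get()
    if raw is None:  # selector matched nothing: blocked page or unexpected layout
        return None
    raw = raw.strip()
    raw = re.sub(r"^\[[^\]]+\]\s*", "", raw)  # drop a "[Country]" prefix (MLS style)
    raw = re.sub(r"\s*\(\d+\)$", "", raw)     # drop a trailing "(1)" division marker
    return raw
```

Returning None instead of slicing blindly at least keeps the spider alive when a page comes back blocked or in an unexpected shape.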
OK, thank you so much for your help. Indeed, by removing the league name from the scrape I was able to proceed (I didn't need the league names anyway), but this way you've shown me where the problem might be, and I can run some tests. Thanks a lot!
Anyway, congratulations: your code is absolutely the best for producing a clean, tidy, and precise CSV.
Thank you 😄
@trivius86 could you please share how you fixed the "Overall" value? Thank you
This is the link to the topic with the solution
The script works great, and the missing Overall value has been fixed as well. But now, for some reason, when I launch the scrape it suddenly downloads only the first 60 players, basically just the first page. I'm attaching the terminal output, which will surely be clearer to you than it is to me.
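For anyone landing here later: the logged listing URLs paginate with an offset parameter in steps of 60, so a crawl that stops at 60 players either never yields the next-offset request or is being refused by the server, which the USER_AGENT change at the top of the thread addresses. An illustrative offset-following sketch, not the repo's actual spider (the CSS selectors are placeholders):

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

import scrapy


class PlayersSpider(scrapy.Spider):
    """Illustrative only: follows sofifa's offset pagination in steps of 60."""

    name = "sofifa_players"
    start_urls = ["https://sofifa.com/players?col=oa&sort=desc&r=230012&set=true"]

    def parse(self, response):
        rows = response.css("table tbody tr")  # placeholder selector
        for row in rows:
            yield {"name": row.css("a::text").get()}

        if rows:  # only continue while the current page actually had players
            parts = urlparse(response.url)
            query = parse_qs(parts.query)
            offset = int(query.get("offset", ["0"])[0]) + 60
            query["offset"] = [str(offset)]
            next_url = urlunparse(parts._replace(query=urlencode(query, doseq=True)))
            yield scrapy.Request(next_url, callback=self.parse)
```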