Open kryptc opened 6 years ago
Hi @kryptc , I will try to update the readme this evening.
Cheers
I just cloned the repo again and installed from the beginning.
I followed these steps to make it work:
First of all, I installed the requirements for the repo. I had some problems with it, but I think they were only related to my machine. At this point I have a virtualenv with Django and Scrapyd installed.
Once you have everything installed, you will need to set up the database for Django. You can do this by executing the following in the terminal:
python manage.py migrate
In order to access the data in the database, you will also need to create a superuser. You can do it by typing this in the terminal:
python manage.py createsuperuser
It is time to start Django and Scrapyd.
To run Django:
python manage.py runserver
To run Scrapyd:
cd scrapy_app
scrapyd
At this point you will have both services running.
The terminal on the left is running Django; the one on the right runs Scrapyd. By default Django runs at http://127.0.0.1:8000/admin/ and Scrapyd runs at http://127.0.0.1:6800/ .
At this point the Scrapyd API is ready to accept jobs, and you can schedule a spider with:
curl http://localhost:6800/schedule.json -d project=default -d spider=toscrape-css
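The same scheduling call can be made from Python instead of curl. A minimal sketch using only the stdlib (the project name "default" and spider name "toscrape-css" are the ones from the curl command above; the response shape follows Scrapyd's documented schedule.json API):

```python
# Sketch: schedule a spider through Scrapyd's schedule.json endpoint.
# Adjust project/spider names if your scrapy_app uses different ones.
import json
from urllib import parse, request

def schedule_spider(project, spider, base_url="http://localhost:6800"):
    """POST to schedule.json and return the parsed JSON response as a dict."""
    data = parse.urlencode({"project": project, "spider": spider}).encode()
    with request.urlopen(f"{base_url}/schedule.json", data=data) as resp:
        return json.load(resp)

def job_ok(response):
    """Scrapyd answers {"status": "ok", "jobid": "..."} on success."""
    return response.get("status") == "ok" and "jobid" in response
```

You would call `job_ok(schedule_spider("default", "toscrape-css"))` with Scrapyd running; checking the `status` field is how you catch a misspelled project or spider name early.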
Once the spiders have executed, the data will be saved in the Django models. Remember that you can see the data at http://127.0.0.1:8000/admin/ using the superuser you created before. It will look something like this:
At this phase you are ready to go! I really hope this helps you a little. If you have more questions, just ask them in this ticket.
Cheers
Thanks very much! I figured out there was something wrong with my sqlite. It works now.
Thanks for the boilerplate. It was helpful in figuring out how the two frameworks interact.
Hi,
First of all, thanks for sharing your work.
I keep getting an error message on the last curl step. Do you have any clue what could be wrong? I followed all the steps as described, but can't start the spider. Any help is appreciated.
Just to be sure, both instances are running and I can access them on port 8000 and 6800.
Feedback from scrapyd
Edit: Well, let me correct this. The spider is running and quotes are being stored in the database. It is just not possible to view the 'jobs' page in Scrapyd:
Getting following error when I run the curl command:
2020-05-17T09:59:13+0000 [_GenericHTTPChannelProtocol,11,127.0.0.1] Unhandled Error
Traceback (most recent call last):
File "/home/ubuntu/test/venv/lib/python3.6/site-packages/twisted/web/http.py", line 2284, in allContentReceived
req.requestReceived(command, path, version)
File "/home/ubuntu/test/venv/lib/python3.6/site-packages/twisted/web/http.py", line 946, in requestReceived
self.process()
File "/home/ubuntu/test/venv/lib/python3.6/site-packages/twisted/web/server.py", line 235, in process
self.render(resrc)
File "/home/ubuntu/test/venv/lib/python3.6/site-packages/twisted/web/server.py", line 302, in render
body = resrc.render(self)
---
2020-05-17T09:59:13+0000 [twisted.web.server.Request#critical]
Traceback (most recent call last):
File "/home/ubuntu/test/venv/lib/python3.6/site-packages/twisted/web/http.py", line 1755, in dataReceived
finishCallback(data[contentLength:])
File "/home/ubuntu/test/venv/lib/python3.6/site-packages/twisted/web/http.py", line 2171, in _finishRequestBody
self.allContentReceived()
File "/home/ubuntu/test/venv/lib/python3.6/site-packages/twisted/web/http.py", line 2284, in allContentReceived
req.requestReceived(command, path, version)
File "/home/ubuntu/test/venv/lib/python3.6/site-packages/twisted/web/http.py", line 946, in requestReceived
self.process()
---
From what I can see, the error is:
File "/home/ubuntu/test/venv/lib/python3.6/site-packages/twisted/web/http_headers.py", line 40, in _sanitizeLinearWhitespace
return b' '.join(headerComponent.splitlines())
builtins.AttributeError: 'int' object has no attribute 'splitlines'
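For context, the failing operation can be reproduced in isolation. Twisted's header sanitization joins the result of `.splitlines()`, a method that exists on bytes and str but not on int, so somewhere a header value is being handed to Twisted as a plain integer instead of bytes. A minimal sketch (the `sanitize` function below is a standalone imitation of `_sanitizeLinearWhitespace`, not an import of it):

```python
# Imitation of twisted.web.http_headers._sanitizeLinearWhitespace:
# it collapses embedded newlines by joining the value's lines with spaces.
# This assumes the header component is bytes; an int has no .splitlines().
def sanitize(header_component):
    return b" ".join(header_component.splitlines())

print(sanitize(b"Content-Length: 42"))  # bytes work fine
try:
    sanitize(42)  # an int does not
except AttributeError as exc:
    print(exc)  # 'int' object has no attribute 'splitlines'
```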
I have no clue what is going on, to be honest. All the dependencies are pinned, so it should not be a package version problem.
This repository is based on the article https://medium.com/@ali_oguzhan/how-to-use-scrapy-with-django-application-c16fabd0e62e .
I really recommend asking there whether someone else has had this problem. I am sorry, but I am not able to reproduce the error.
I ran your code according to the instructions in the readme. I can view the responses in the logs directory in scrapy_app, but when I open my sqlite3 prompt, there are no tables or databases that have been created, despite an SQLite database being configured. How do you access and manipulate the scraped data? Currently, I can't verify whether data has been added to my database or not.
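For reference, one quick way to check whether Django actually created its tables is to list them with the stdlib sqlite3 module. A minimal sketch: the demo below uses an in-memory database with a hypothetical `main_quote` table to show the query (Django names model tables `<app_label>_<modelname>`); against the real project you would open a connection to your `db.sqlite3` file, at whatever path your Django DATABASES setting points to, instead:

```python
import sqlite3

def list_tables(conn):
    """Return the names of all tables in an open SQLite connection."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]

# Demo: in-memory DB with one table named the way Django names model
# tables (<app_label>_<modelname>); 'main_quote' is a hypothetical name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main_quote (id INTEGER PRIMARY KEY, text TEXT)")
print(list_tables(conn))  # ['main_quote']
```

If `list_tables` comes back empty on the real database file, `python manage.py migrate` has likely not been run against it (or sqlite3 was opened on a different file than the one Django writes to).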