elliotgao2 / gain

Web crawling framework based on asyncio.
GNU General Public License v3.0

Some Suggestions #20

Closed · wisecsj closed this 7 years ago

wisecsj commented 7 years ago
  1. Add a `cookies` field to the Spider class, because some websites require login.

  2. In the parser.py file there is `await item.save()`, a function used to store results, usually in a local file (the user can override it). As far as I'm concerned, code like

    
    async def save(self):
        # Synchronous open()/write() block the event loop while they run.
        with open('scrapinghub.txt', 'a+') as f:
            f.write(str(self.results) + '\n')

is blocking, because local filesystem access is blocking. Therefore the event loop (thread) is blocked as well.
Especially when we want to store results that are several MB in size in a local file, it would slow down the whole application.

So, is it possible to use **aiofiles** (file support for asyncio, https://github.com/Tinche/aiofiles), or to use `loop.run_in_executor` so that the save function runs in another thread when the file is large?
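
For illustration, a minimal sketch of both options (the `Item` wrapper, the filename, and the `results` attribute mirror the snippet above rather than gain's actual API):

    import asyncio

    import aiofiles


    class Item:
        results = 'parsed data'  # placeholder; filled in by the parser in practice

        # Option 1: aiofiles performs the file I/O in a worker thread under the
        # hood, so awaiting it never stalls the event loop.
        async def save(self):
            async with aiofiles.open('scrapinghub.txt', 'a+') as f:
                await f.write(str(self.results) + '\n')

        # Option 2: hand a plain blocking write to the default ThreadPoolExecutor.
        def _save_sync(self):
            with open('scrapinghub.txt', 'a+') as f:
                f.write(str(self.results) + '\n')

        async def save_in_executor(self):
            loop = asyncio.get_event_loop()
            await loop.run_in_executor(None, self._save_sync)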
elliotgao2 commented 7 years ago
  1. Putting the cookies into the headers is better (see the sketch below).
  2. I agree with you; `loop.run_in_executor` is better.
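
For example (a hypothetical sketch, assuming your gain version's Spider class exposes a `headers` attribute that is sent with each request; the URLs, selector, and cookie value are placeholders):

    from gain import Css, Item, Parser, Spider


    class Post(Item):
        title = Css('h1')

        async def save(self):
            print(self.title)


    class AuthedSpider(Spider):
        start_url = 'https://example.com/'
        # Cookie header copied from an authenticated browser session.
        headers = {'Cookie': 'sessionid=abc123'}
        parsers = [Parser(r'/post/\d+', Post)]


    AuthedSpider.run()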
wisecsj commented 7 years ago

Got it... No need to add a `cookies` field, then.

georgedorn commented 7 years ago

An example of getting cookies from a login and setting them in the header would be helpful. Should I just use the requests library to do the login, then extract the appropriate cookie and set it accordingly?
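
Something like this is what I had in mind (a sketch only; the login URL and form fields are placeholders for whatever the target site expects):

    import requests

    # Log in once with a regular requests session.
    session = requests.Session()
    session.post('https://example.com/login',
                 data={'username': 'me', 'password': 'secret'})

    # Collapse the resulting cookie jar into a single Cookie header value.
    cookie_header = '; '.join(
        '{}={}'.format(name, value) for name, value in session.cookies.items()
    )
    headers = {'Cookie': cookie_header}  # then pass this to the spider's headers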

elliotgao2 commented 7 years ago

@georgedorn Copying cookies from the browser is the right way.