Giglium / vinted_scraper

A very simple Python package that scrapes the Vinted website to retrieve information about its items.
MIT License

Improve performance with async requests #30

Open Giglium opened 8 months ago

Giglium commented 8 months ago

What is the current behavior? And why doesn't it suit you?

For now, I'm using threads to simulate async requests, but of course they don't perform as well as real async requests. For this reason, I'm thinking of implementing async requests in the future. I was considering httpx, only because it offers both sync and async requests, so it would be easier for those who are new to asynchronous programming. On the other hand, aiohttp has better performance, which is the thing I normally care about more.
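To make the thread-based workaround concrete, here is a minimal sketch that bridges a blocking call into asyncio using the standard library's `asyncio.to_thread` (Python 3.9+). The `blocking_search` function is a hypothetical stand-in for a synchronous scraper call and is not part of vinted_scraper. The work still runs on threads, so this is only a stopgap compared with a natively async client such as httpx or aiohttp.

```python
import asyncio
import time


def blocking_search(topic: str) -> dict:
    """Hypothetical stand-in for a synchronous, network-bound search call."""
    time.sleep(0.1)  # simulate network latency
    return {"topic": topic, "items": []}


async def search_all(topics: list[str]) -> list[dict]:
    # Each blocking call runs in a worker thread; gather() preserves input order.
    tasks = [asyncio.to_thread(blocking_search, topic) for topic in topics]
    return await asyncio.gather(*tasks)


results = asyncio.run(search_all(["shoes", "jacket", "bag"]))
```

The three simulated searches overlap instead of running back to back, which is the same effect the thread approach gives today.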

Describe the solution you'd like

Based on the usage of the package by the community I will decide on which package to use. Feel free to comment!

Other information

Initial discussion #17

Ousret commented 8 months ago

How about trying Niquests instead?

Giglium commented 8 months ago

Niquests looks very interesting. In particular, it is effortless to integrate and we can try it without changing anything. But I cannot find any comparison with the other modules. The README says:

Fastest(*) performance measured when leveraging a multiplexed connection with or without uses of any form of concurrency as of november 2023. The research compared httpx, requests, aiohttp against niquests.

But I cannot find this research, did you find it?

Ousret commented 8 months ago

Glad the first impression of Niquests was satisfying.

But I cannot find this research, did you find it?

You can find more details about this at https://blog.devgenius.io/10-reasons-you-should-quit-your-http-client-98fd4c94bef3

The research was conducted internally by Niquests to determine whether or not this was worth it. More will come, especially in terms of performance.

You can also find some examples at https://replit.com/@ahmedtahri4/Python#main.py

If you have any concerns about it, I can answer or even help.

Relax594 commented 5 months ago

@Giglium is there any progress or plans to implement this?

Giglium commented 5 months ago

Hi @Relax594, I'm sorry, but I'm swamped at the moment and haven't had a chance to tackle this task yet. I don't have immediate plans to start on it, but I'll keep you updated if anything changes.

For now, the best choice would be Niquests with multiplexing. It is effortless to integrate, and we can try it without changing anything. My only concern is the conversion of the JSON response, but I think that can be handled easily with a response hook (see: https://niquests.readthedocs.io/en/latest/user/advanced.html#event-hooks). I just haven't had time to run any tests.

Relax594 commented 5 months ago

Hi @Giglium, thanks for the quick update!

I wanted to add async support to this lib with aiohttp myself, but failed because I lack experience in that area. I've read that you used threads to add some kind of async behaviour in the meantime. Could you show an example of how you implemented it?

If you find the time to look more into this, I would highly appreciate it.

Thanks for this lib!

Giglium commented 5 months ago

This is copied from part of my code with a lot removed, so treat it as pseudocode:

import threading

from vinted_scraper import VintedScraper


# The original class extended AbstractEvaluator, which is not shown in this excerpt.
class VintedEvaluator:
    def __init__(self, topic: str):
        self.topic = topic
        self.vinted = VintedScraper("https://www.vinted.com")

    def perform(self):
        params = {"search_text": self.topic}
        items = self.vinted.search(params)
        # Do further processing or evaluation of the search results


def main():
    topics = [...]  # List of topics for which searches will be performed
    threads = []

    # Create and start a thread for each topic
    for topic in topics:
        evaluator = VintedEvaluator(topic)
        thread = threading.Thread(target=evaluator.perform)
        threads.append(thread)
        thread.start()

    # Wait for all threads to finish
    for thread in threads:
        thread.join()


if __name__ == "__main__":
    main()

With this code, you can perform multiple searches at the same time; you can do the same with the item function. Take care: if you perform too many requests to Vinted, you will receive a 429 status code and will have to wait a little before retrying.
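One way to reduce the risk of hitting the 429 limit is to cap the number of worker threads and back off when a request is rate-limited. The following is a rough sketch under stated assumptions: the thread does not show how vinted_scraper surfaces a 429, so `RateLimited` is a hypothetical exception, and `search` is any callable you pass in (for example a wrapper around the scraper's search method that raises `RateLimited` on a 429).

```python
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimited(Exception):
    """Hypothetical error raised by a search callable on an HTTP 429 response."""


def search_with_backoff(search, params, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call search(params), waiting longer after each rate-limit error."""
    for attempt in range(retries + 1):
        try:
            return search(params)
        except RateLimited:
            if attempt == retries:
                raise  # give up after the last allowed retry
            sleep(base_delay * 2 ** attempt)  # base_delay, then 2x, 4x, ...


def search_topics(search, topics, max_workers=4, **backoff):
    """Run one search per topic on a small, bounded thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(search_with_backoff, search, {"search_text": topic}, **backoff)
            for topic in topics
        ]
        # Collect results in the same order as the input topics.
        return [future.result() for future in futures]
```

Compared with starting one thread per topic, the pool keeps at most `max_workers` requests in flight, and the exponential delay gives the server time to recover before the next attempt. The right values for `max_workers`, `retries`, and `base_delay` are not known here and would need to be tuned by experiment.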