Giglium / vinted_scraper

A very simple Python package that scrapes the Vinted website to retrieve information about its items.
MIT License

Modify the number of items returned #21

Open PeDiot opened 1 year ago

PeDiot commented 1 year ago

I'm just starting to use your library, which seems awesome, by the way. I wanted to know how to change the number of items returned by the `search` method of `VintedScraper`. I've tried adding a `page-size` param to the search params, but it did not work. Thanks for your help!

Giglium commented 1 year ago

Hi, thanks for the compliment. Sorry, I closed the issue by mis-click. I can't find a param that controls the number of items returned right now. For now, I loop through the pages, since the search API is paginated.

For example:

```python
from vinted_scraper import VintedScraper

def main():
    scraper = VintedScraper("https://www.vinted.com")
    params = {
        "search_text": "board games"
        # Add other query parameters like the pagination and so on
    }
    for i in range(1, 11):  # pages are 1-indexed
        params["page"] = i
        items = scraper.search(params)

if __name__ == "__main__":
    main()
```
PeDiot commented 1 year ago

Thanks for the tip!

lo1gr commented 2 months ago

Strangely, I'm getting the same issue. There is a mismatch between the number of items I am able to extract and the number of items that appear for the same request on the website. Have you experienced this? Adding pagination did not help. My code:

```python
import json
from vinted_scraper import VintedScraper

# Helper function to convert objects to dictionaries
def to_serializable(obj):
    if isinstance(obj, list):
        return [to_serializable(i) for i in obj]
    elif hasattr(obj, "__dict__"):
        return {key: to_serializable(value) for key, value in obj.__dict__.items()}
    else:
        return obj

def main():
    # Initialize the scraper with the base URL
    scraper = VintedScraper("https://www.vinted.fr")

    # Define search parameters with page 1 as the starting point
    params = {
        "search_text": "padel racket",
        "brand_ids": [48801, 372642, 14, 15453, 689757],
        "price_from": 20,
        "currency": "EUR",
        "order": "newest_first",
        "page": 1  # Start with the first page
    }

    all_items = []

    while True:
        # Perform the search
        items = scraper.search(params)

        if not items:
            print(f"No items found on page {params['page']}. Ending search.")
            break  # Exit loop if no more items are found

        print(f"Page {params['page']}: Found {len(items)} items.")

        all_items.extend(items)  # Add found items to the all_items list
        params["page"] += 1  # Move to the next page

    if all_items:
        # Convert the items to a JSON-serializable format
        serializable_items = [to_serializable(item) for item in all_items]

        # Save the results to a JSON file
        with open('vinted_search_results.json', 'w', encoding='utf-8') as f:
            json.dump(serializable_items, f, ensure_ascii=False, indent=4)

        print(f"Saved {len(all_items)} items to 'vinted_search_results.json'")
    else:
        print("No items were found during the search.")

if __name__ == "__main__":
    main()
```

Giglium commented 2 months ago

Checking the browser network tab when I change pages on the site, I see that Vinted uses the param per_page=96. I don't have time right now, but we can investigate whether this parameter controls the number of items the API returns. Could that be the reason you see two different results?
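If someone wants to try it, here is a minimal sketch, assuming the library forwards unrecognized params straight to the search API (the `build_search_params` helper is hypothetical, and whether the API honors `per_page` is unverified):

```python
# Sketch: pass the per_page value observed in the browser's network tab.
# per_page=96 is what the Vinted site itself sends; whether the library
# forwards it and whether the API honors it are assumptions to verify.

def build_search_params(search_text, per_page=96, page=1):
    """Build a search params dict including the hypothetical per_page knob."""
    return {
        "search_text": search_text,
        "per_page": per_page,  # value observed in the browser network tab
        "page": page,
    }

# usage (requires network access and the vinted_scraper package installed):
#   from vinted_scraper import VintedScraper
#   scraper = VintedScraper("https://www.vinted.com")
#   items = scraper.search(build_search_params("board games"))
```

Comparing `len(items)` with and without `per_page` should tell us whether the parameter is respected.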

@lo1gr I suggest using VintedWrapper instead of VintedScraper: the VintedWrapper object returns the JSON of the API without converting it to the model, so you don't have to transform it back again.
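As a rough sketch of that approach, the serialization helper above becomes a plain `json.dump` over the raw response (assuming the response dict exposes the result list under an `"items"` key; that shape is an assumption, not confirmed from the source):

```python
# Sketch: with the raw API JSON (a dict) there is no model to unwrap,
# so the items can be written to disk directly.
# The "items" key is an assumption about the response shape.
import json

def save_raw_items(response: dict, path: str) -> int:
    """Dump the item list from a raw search response to a JSON file."""
    items = response.get("items", [])
    with open(path, "w", encoding="utf-8") as f:
        json.dump(items, f, ensure_ascii=False, indent=4)
    return len(items)

# usage (requires network access and the vinted_scraper package installed):
#   from vinted_scraper import VintedWrapper
#   wrapper = VintedWrapper("https://www.vinted.fr")
#   response = wrapper.search({"search_text": "padel racket"})
#   save_raw_items(response, "vinted_search_results.json")
```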