driscoll42 / ebayMarketAnalyzer

Scrape all eBay sold listings to determine average/median pricing, plot listings over time with trend lines, and extract to excel

Bug on UK version #59

Closed Maximo1491 closed 3 years ago

Maximo1491 commented 3 years ago

Hi there, love the project!

main.py::371 item_date = datetime.strptime(orig_item_datetime, '%d %b %Y')

for UK this needs to be '%b %d %Y'

didn't want to make a whole fork/branch just for this!

Edit: Looks like there were a fair few changes to this last night, actually, so maybe it will be worth me branching and having a look at the UK side.
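One way to avoid hard-coding a single date layout per site is to try several format strings in turn. The snippet below is only a minimal sketch of that idea: `parse_sold_date` and `DATE_FORMATS` are hypothetical names and not part of the project, and it assumes the date text uses English month abbreviations.

```python
from datetime import datetime

# Candidate layouts: day-first ("12 Mar 2021") and month-first ("Mar 12 2021").
# Both are taken from the format strings quoted in this issue.
DATE_FORMATS = ('%d %b %Y', '%b %d %Y')


def parse_sold_date(text):
    """Parse a sold-date string by trying each known format in order.

    Hypothetical helper for illustration; the project itself may handle
    this differently.
    """
    # Drop commas so "Mar 12, 2021" and "Mar 12 2021" are treated alike.
    cleaned = text.replace(',', '').strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue  # Wrong layout, try the next one.
    raise ValueError(f"Unrecognised date format: {text!r}")


item_date = parse_sold_date('12 Mar 2021')
```

Because the abbreviated month name cannot be read as a day number (and vice versa), only one of the two formats can match a given string, so the order of `DATE_FORMATS` does not change the result.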

driscoll42 commented 3 years ago

Thanks for the updates! I have neglected the UK branch as I've been focusing on a number of other fixes. I'll run some UK tests on my end today to see if I missed any other edge cases.

driscoll42 commented 3 years ago

I found another bug while testing the UK branch and pushed a fix for it. I also updated run_uk.py to bring it up to the new standards, so switching between UK and USA should now be far easier when running the code.

Again, I really appreciate you letting me know about the bugs. If you see any more, or have feature requests, don't hesitate to open another issue!

Maximo1491 commented 3 years ago

Thanks a lot for pushing out more fixes! I'm just going through how it all works :D Impressive stuff! What settings give the best performance? I'm only really looking for the average price of sold items for each query

driscoll42 commented 3 years ago

For best performance, set feedback=False and quantity_hist=False and you can set sleep_len to something low like 0.5 or 1. It's mostly about how much detail you need:

With quantity_hist=False, you'll only get the latest sale when a listing covers multiple items. For example, this listing (https://www.ebay.co.uk/itm/MSI-NVIDIA-GeForce-RTX-3060-12GB-GAMING-X-TRIO-Graphics-Card/124623510690?hash=item1d0423dca2:g:rpUAAOSwLEJgSNyj) has 4 sold, but with quantity_hist=False you'll only see the latest one, as the code doesn't go to the sale history.

With feedback=False, you'll lose the seller, seller_feedback, city, state, country, and whether it's a store. I don't think you care as much about that.

Unfortunately, you can't set quantity_hist=True with feedback=False: to find the quantity_hist link, the code has to go through the item_page link, which is what feedback uses.

However, only a fraction of sales are multi-listings, usually under 5-10%, which is almost never enough to dramatically affect the average price. You're likely fine setting both to False.
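The dependency between the two flags can be written down as a small check. This is only a sketch: `check_settings` is a hypothetical helper, not a function from the project, and it simply encodes the rule described above using the parameter names from this thread.

```python
def check_settings(feedback, quantity_hist, sleep_len):
    """Validate a combination of scraper settings (hypothetical helper).

    Encodes the rule from this thread: the sale-history link is only
    reachable via the item page, which is fetched when feedback=True.
    """
    if quantity_hist and not feedback:
        raise ValueError("quantity_hist=True requires feedback=True")
    if sleep_len < 0:
        raise ValueError("sleep_len must not be negative")
    return {
        'feedback': feedback,
        'quantity_hist': quantity_hist,
        'sleep_len': sleep_len,
    }


# The fast configuration suggested above: no item-page visits, short sleep.
fast_settings = check_settings(feedback=False, quantity_hist=False, sleep_len=0.5)
```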

sleep_len is a sleep timer applied before every call to eBay. If it's too low (like 0-0.3), eBay will eventually just terminate your connection. If you keep it in the 0.5-3 range, eBay won't terminate your connection but will start throwing up CAPTCHAs, though only on the multi-listing sale history page. So if you set quantity_hist=False, there's no reason not to make sleep_len very low. Those changes will all make the process dramatically faster.
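The idea of a sleep timer before every call can be sketched as a thin wrapper around whatever function fetches a page. The names `make_throttled` and `fake_fetch` are hypothetical and used purely for illustration; the project's own request code is not shown in this thread and may be structured differently.

```python
import time


def make_throttled(fetch, sleep_len):
    """Wrap `fetch` so it sleeps for `sleep_len` seconds before every call."""
    def throttled_fetch(url):
        time.sleep(sleep_len)  # Pause first, then make the request.
        return fetch(url)
    return throttled_fetch


def fake_fetch(url):
    """Stand-in for a real HTTP request, so the sketch runs offline."""
    return f"<html>{url}</html>"


# With quantity_hist=False, a short delay such as 0.5 s is the suggestion above.
get_page = make_throttled(fake_fetch, sleep_len=0.5)
```

Keeping the delay in one wrapper means the value can be changed in a single place when switching between a fast run and a more cautious one.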

Of course, it also helps to make your query as specific as possible: use query_exceptions to specify "-" fields (-pics, -photos, etc.), set sacat if you can so that only a particular eBay category is searched, and set the price ranges. All of these make it faster.
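To make the narrowing concrete, here is a rough sketch of how exclusions, a category, and a price range could be combined into a sold-listings search URL. `build_sold_search_url` is a hypothetical function, and the URL parameters (`_nkw`, `_sacat`, `_udlo`, `_udhi`, `LH_Sold`, `LH_Complete`) are an assumption based on what eBay's public search pages show in the address bar, not something confirmed in this thread.

```python
from urllib.parse import urlencode


def build_sold_search_url(query, query_exceptions=(), sacat=0,
                          min_price=None, max_price=None,
                          base='https://www.ebay.co.uk/sch/i.html'):
    """Build a sold-listings search URL (hypothetical helper, assumed params)."""
    # Each exception becomes a "-term" appended to the keywords.
    excluded = ['-' + term.lstrip('-') for term in query_exceptions]
    params = {
        '_nkw': ' '.join([query, *excluded]),
        '_sacat': sacat,      # 0 is assumed to mean "all categories"
        'LH_Sold': 1,         # sold items only
        'LH_Complete': 1,     # completed listings only
    }
    if min_price is not None:
        params['_udlo'] = min_price
    if max_price is not None:
        params['_udhi'] = max_price
    return base + '?' + urlencode(params)


# The price bounds here are placeholders, not recommendations.
url = build_sold_search_url('RTX 3060', ['pics', '-photos'],
                            min_price=100, max_price=900)
```

Exclusion terms are accepted with or without the leading "-", so `['pics', '-photos']` and `['-pics', 'photos']` produce the same keywords.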