datawhores / OF-Scraper

A completely revamped and redesigned fork, reimagined from scratch based on the original onlyfans-scraper
MIT License
701 stars 59 forks

Mismatch between number of found media and downloaded media #463

Closed lenamicha closed 2 months ago

lenamicha commented 2 months ago

Even though the script detects all the available media, not all of it was downloaded. You can see this mismatch by comparing the number of photos/videos found with the number of photos/videos that ended up being downloaded.

I've only been able to reproduce this bug when scraping creators with a huge amount of content, in this case over 10k. Appending --force-all has not solved the issue.

As can be seen in the screenshot, the script version is 3.12.4, running on Linux.

datawhores commented 2 months ago

If you provide a full log, we can see how the numbers match up and where things were being removed.

lenamicha commented 2 months ago

log.zip

Thanks. Unfortunately, the log was too big (almost 150k lines in debug mode), so I had to zip it.

datawhores commented 2 months ago

Well, the initial timeline media count with locked posts was 10424, but that was reduced to 10380 once locked posts were removed. Later the number gets reduced again to around 7000 (7627 at filter 4 in the log below) once duplicate posts are removed.

 2024-09-09 19:42:50:[timeline.process_tasks_batch:103]  Timeline Final count:  10424
 2024-09-09 19:42:51:[post.process_timeline_posts:206]  Timeline media count with locked 10380
 2024-09-09 19:42:51:[post.process_timeline_posts:207]  Timeline media count without locked 10380
 2024-09-09 19:4
..............
 2024-09-09 19:42:52:[main.filterMediaFinalDownload:62]  filter 3-> viewable media filter count: 11198
 2024-09-09 19:42:52:[main.filterMediaFinalDownload:67]  filter 4->  media dupe media_id filter count: 7627
 2024-09-09 19:
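
To illustrate why the count can drop at filter 4, here is a minimal sketch of deduplicating by media_id. It is not OF-Scraper's actual code: the `Media` class and the `dedupe_by_media_id` function are hypothetical names, and it assumes the same media item can be attached to more than one post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Media:
    # Hypothetical stand-in for a scraped media record.
    post_id: int
    media_id: int


def dedupe_by_media_id(items):
    """Keep the first item seen for each media_id, preserving order."""
    seen = set()
    unique = []
    for item in items:
        if item.media_id in seen:
            continue  # same media already queued from an earlier post
        seen.add(item.media_id)
        unique.append(item)
    return unique


found = [
    Media(post_id=1, media_id=100),
    Media(post_id=2, media_id=100),  # same media attached to another post
    Media(post_id=2, media_id=101),
]
unique = dedupe_by_media_id(found)
print(len(found), len(unique))
```

In this sketch three media entries are found but only two remain to download, which is the same kind of gap between "found" and "downloaded" reported above.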

I wouldn't recommend keeping the duplicates, because why would you want the same file twice?

But you can change custom_value in the config to:

custom_value:{ALLOW_DUPE_MEDIA:True}

Your filename format should be unique enough for this to work:

{date}_{post_id}_{media_id}_{filename}.{ext}
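
As a rough illustration of why a format like this avoids collisions once duplicates are allowed, the sketch below fills the template with Python's `str.format`. This is only an assumption about how the placeholders behave, not OF-Scraper's real rendering code; `render_name` and the sample values are hypothetical.

```python
TEMPLATE = "{date}_{post_id}_{media_id}_{filename}.{ext}"


def render_name(date, post_id, media_id, filename, ext):
    # Hypothetical helper: fills the named placeholders in the template.
    return TEMPLATE.format(
        date=date,
        post_id=post_id,
        media_id=media_id,
        filename=filename,
        ext=ext,
    )


# The same media (media_id 100) attached to two different posts.
first = render_name("2024-09-09", 1, 100, "clip", "mp4")
repost = render_name("2024-09-10", 2, 100, "clip", "mp4")
print(first)
print(repost)
```

Because the post ID is part of the name, the two copies get different filenames and do not overwrite each other.
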

lenamicha commented 2 months ago

Thank you very much