Xerbo / furaffinity-dl

FurAffinity Downloader, now with 100% more Python
BSD 3-Clause "New" or "Revised" License

Fix duplicate requests #4

Closed asl97 closed 7 years ago

asl97 commented 7 years ago

Piping the link list through uniq prevents the same link/page from being downloaded twice.


Duplicate test:

wget --user-agent="Mozilla/5.0 furaffinity-dl (https://github.com/Shnatsel/furaffinity-dl)" -O test http://www.furaffinity.net/gallery/kodardragon

Current:

>>> grep '<a href="/view/' "test" | grep -E --only-matching '/view/[[:digit:]]+/'
/view/22600924/
/view/22600924/
/view/22574940/
/view/22574940/
/view/22151236/
/view/22151236/
/view/22095888/
/view/22095888/
/view/21689965/
/view/21689965/
/view/20568393/
/view/20568393/
/view/20136394/
/view/20136394/
/view/19340999/
/view/19340999/
/view/19286533/
/view/19286533/
/view/19096066/
/view/19096066/
/view/18944301/
/view/18944301/
/view/18938416/
/view/18938416/
/view/18925533/
/view/18925533/
/view/18658316/
/view/18658316/
/view/18649506/
/view/18649506/
/view/18612461/
/view/18612461/
/view/18445520/
/view/18445520/
/view/18230007/
/view/18230007/
/view/18213557/
/view/18213557/
/view/18158944/
/view/18158944/
/view/18091650/
/view/18091650/
/view/18084564/
/view/18084564/
/view/17735190/
/view/17735190/
/view/17622519/
/view/17622519/
/view/17593385/
/view/17593385/
/view/17590449/
/view/17590449/
/view/17471498/
/view/17471498/
/view/17450332/
/view/17450332/
/view/17253741/
/view/17253741/
/view/16837899/
/view/16837899/
/view/16738146/
/view/16738146/
/view/16712199/
/view/16712199/
/view/16654774/
/view/16654774/
/view/16584270/
/view/16584270/
/view/16564784/
/view/16564784/
/view/16556555/
/view/16556555/
/view/16550788/
/view/16550788/
/view/16550356/
/view/16550356/
/view/16469642/
/view/16469642/
/view/16460997/
/view/16460997/
/view/16458439/
/view/16458439/
/view/16457738/
/view/16457738/
/view/16457713/
/view/16457713/
/view/16439519/
/view/16439519/
/view/16439376/
/view/16439376/
/view/16431418/
/view/16431418/
/view/16423216/
/view/16423216/
/view/16422537/
/view/16422537/

with uniq:

>>> grep '<a href="/view/' "test" | grep -E --only-matching '/view/[[:digit:]]+/' | uniq
/view/22600924/
/view/22574940/
/view/22151236/
/view/22095888/
/view/21689965/
/view/20568393/
/view/20136394/
/view/19340999/
/view/19286533/
/view/19096066/
/view/18944301/
/view/18938416/
/view/18925533/
/view/18658316/
/view/18649506/
/view/18612461/
/view/18445520/
/view/18230007/
/view/18213557/
/view/18158944/
/view/18091650/
/view/18084564/
/view/17735190/
/view/17622519/
/view/17593385/
/view/17590449/
/view/17471498/
/view/17450332/
/view/17253741/
/view/16837899/
/view/16738146/
/view/16712199/
/view/16654774/
/view/16584270/
/view/16564784/
/view/16556555/
/view/16550788/
/view/16550356/
/view/16469642/
/view/16460997/
/view/16458439/
/view/16457738/
/view/16457713/
/view/16439519/
/view/16439376/
/view/16431418/
/view/16423216/
/view/16422537/

Shnatsel commented 7 years ago

AFAIK you never have to chain sort -u | uniq because sort -u alone already deduplicates entries.

Downloading from newest to oldest is a feature, not a bug. FurAffinity displays images in that order, so the least surprising behaviour is to download them in display order.

Thus I would simply pipe the results through uniq and be done with it; that should eliminate the duplicate requests while still downloading in display order.
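
To illustrate the difference, here is a minimal sketch using a few IDs from the listing above. The variable names are made up for the example and are not taken from the script itself.

```shell
# Sample IDs taken from the listing above; each link appears twice in a row.
links='/view/22600924/
/view/22600924/
/view/22574940/
/view/22574940/
/view/22151236/
/view/22151236/'

# uniq collapses adjacent repeats and keeps the original (display) order.
display_order=$(printf '%s\n' "$links" | uniq)

# sort -u also deduplicates, but it reorders the lines while doing so.
sorted_order=$(printf '%s\n' "$links" | sort -u)

printf 'uniq:\n%s\n\nsort -u:\n%s\n' "$display_order" "$sorted_order"
```

Note that uniq only removes repeats that sit next to each other. That is enough here because the gallery page lists each link twice in a row, as the output above shows.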

asl97 commented 7 years ago

Removed sort -u and rebased.

Honestly though, downloading from oldest to newest is more useful when downloading through a pool/group.
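
If someone does want oldest-first, one possible approach is to reverse the deduplicated list before downloading. This is only a sketch: the variable names are hypothetical, and the script in this PR does not do this.

```shell
# Deduplicated links in display order (newest first), IDs from the listing above.
newest_first='/view/22600924/
/view/22574940/
/view/22151236/'

# Reverse the line order with awk; GNU tac would do the same where available.
oldest_first=$(printf '%s\n' "$newest_first" |
  awk '{ line[NR] = $0 } END { for (i = NR; i >= 1; i--) print line[i] }')

printf '%s\n' "$oldest_first"
```

This reverses a single page of links only; reversing a whole multi-page gallery would need all pages collected first.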

Shnatsel commented 7 years ago

Newest to oldest, on the other hand, lets you update your local collection with latest additions without walking through the entire list.
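
A rough sketch of that idea, assuming a finished download is recorded as a file named after the numeric ID in a local directory. The new_links helper and the directory layout are hypothetical, not how the script stores files.

```shell
# Hypothetical helper: prints the links that still need downloading and stops
# at the first one already present. Assumes newest-first input on stdin and
# that a finished download exists as a file named after the numeric ID in $1.
new_links() {
  done_dir=$1
  while IFS= read -r link; do
    id=${link#/view/}   # strip the leading "/view/"
    id=${id%/}          # strip the trailing "/"
    # Everything from here on is older and assumed to be downloaded already.
    [ -e "$done_dir/$id" ] && break
    printf '%s\n' "$link"
  done
}

# Usage with a temporary directory standing in for the local collection.
collection=$(mktemp -d)
touch "$collection/22151236"   # pretend this one was fetched on an earlier run

pending=$(printf '%s\n' /view/22600924/ /view/22574940/ /view/22151236/ /view/22095888/ |
  new_links "$collection")
printf '%s\n' "$pending"
rm -rf "$collection"
```

Because the list is newest-first, the loop can stop at the first known ID instead of walking through the entire gallery. This relies on the assumption that anything older than a known ID has already been fetched.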

Thanks for your contribution!