banhao / scrape-youtube-channel-videos-url

This Python script is used to scrape all the video links from a youtube channel.
MIT License
52 stars 19 forks

the output duplicates scraped links #4

Open 111100001 opened 5 months ago

111100001 commented 5 months ago

For some reason it outputs each link twice.

This is the beginning of the terminal output:

` python3 scrape-youtube-channel-videos-url.py -b chrome -u "https://www.youtube.com/@supertfVODs/videos"

DevTools listening on ws://127.0.0.1:60760/devtools/browser/9bf8d5c4-cd08-4c46-9b80-5768854f047e
[12212:12940:0603/160441.774:ERROR:device_event_log_impl.cc(195)] [16:04:41.773] USB: usb_service_win.cc:105 SetupDiGetDeviceProperty({{A45C254E-DF1C-4EFD-8020-67D146A850E0}, 6}) failed: Element not found. (0x490)
Created TensorFlow Lite XNNPACK delegate for CPU.
None
None
None
https://www.youtube.com/watch?v=eRWeV-TQEkQ
https://www.youtube.com/watch?v=eRWeV-TQEkQ
None
https://www.youtube.com/watch?v=XEQESkUchEs
https://www.youtube.com/watch?v=XEQESkUchEs
None
https://www.youtube.com/watch?v=5tOLy8wK4fw
https://www.youtube.com/watch?v=5tOLy8wK4fw
None
https://www.youtube.com/watch?v=RqF-abNqCEc
https://www.youtube.com/watch?v=RqF-abNqCEc
None
https://www.youtube.com/watch?v=J-J7fO2r5gg`

banhao commented 5 months ago

After you generate the URL list file, you can remove the duplicates yourself. On Windows, if you have Notepad++, go to the "Edit" menu and choose "Line Operations" -> "Remove Duplicate Lines". On Linux, use the command "cat url.list | sort -u > unique_url.list".
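The same cleanup can also be done in Python, which works on both Windows and Linux and, unlike `sort -u`, keeps the links in the order they were scraped. This is only a minimal sketch of the workaround, not a fix to the script itself. The function names `dedupe_lines` and `dedupe_file` are hypothetical, the file names are taken from the command above, and skipping `None` lines assumes that the placeholders seen in the terminal output may also end up in the list file.

```python
from pathlib import Path


def dedupe_lines(lines):
    """Return unique lines in first-seen order.

    Blank lines and literal 'None' placeholders are skipped
    (assumption: they carry no URL and are not wanted in the result).
    """
    seen = set()
    unique = []
    for line in lines:
        url = line.strip()
        if not url or url == "None" or url in seen:
            continue
        seen.add(url)
        unique.append(url)
    return unique


def dedupe_file(src="url.list", dst="unique_url.list"):
    """Read src, write the de-duplicated lines to dst, return their count."""
    lines = Path(src).read_text(encoding="utf-8").splitlines()
    unique = dedupe_lines(lines)
    Path(dst).write_text("\n".join(unique) + "\n", encoding="utf-8")
    return len(unique)
```

To use it, save the block as a script next to the generated list file and call `dedupe_file()`, or pass other paths, for example `dedupe_file("url.list", "unique_url.list")`.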