SilenceEagle / paper_downloader

Download papers and supplemental materials from open-access paper websites, such as AAAI, AISTATS, COLT, CORL, CVPR, ECCV, ICCV, ICLR, ICML, IJCAI, JMLR, NIPS, RSS, WACV.
MIT License
228 stars 31 forks

what is the oral.html, acting as the html_path #17

Closed rrryan2016 closed 3 years ago

rrryan2016 commented 3 years ago

Thanks for your kind sharing.

I am trying to run the code to download the ICLR 2021 papers, but I am confused about the html_path in https://github.com/SilenceEagle/paper_downloader/blob/c742c1bd51f059394cd0b294949274b30a54e87b/code/paper_download_ICLR_IDM.py#L644.

Any suggestions? Thanks in advance.

SilenceEagle commented 3 years ago

@rrryan2016 Hi, html_path is the path to an HTML file. I wrote the function this way because my home connection to OpenReview is very slow, so I first save the website's HTML source and then use it to download the ICLR papers. The attached files "oral.txt" and "spotlight.txt" contain the HTML source of the ICLR 2021 web pages. Just change their file extension from "txt" to "html", then set html_path to the full pathname of the HTML file, such as "F:\oral.html". In addition, this function can download all the "oral", "spotlight" and "poster" papers listed in the HTML file: use "oral.html" to download all ICLR 2021 oral and poster papers, and "spotlight.html" to download the ICLR 2021 spotlight papers.

Attachments: oral.txt, spotlight.txt
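The renaming step described above can also be scripted. This is only a minimal sketch: `txt_to_html` is a hypothetical helper name and not part of the repository. It assumes the attachment has already been saved locally.

```python
from pathlib import Path


def txt_to_html(txt_path):
    """Rename a downloaded .txt attachment to .html and return its full pathname.

    Hypothetical helper for illustration; not part of paper_downloader.
    """
    src = Path(txt_path)
    if not src.is_file():
        raise FileNotFoundError(f"attachment not found: {src}")
    dst = src.with_suffix(".html")
    # Path.replace overwrites an existing target on both Windows and POSIX.
    src.replace(dst)
    return str(dst.resolve())
```

For example, `html_path = txt_to_html(r"F:\oral.txt")` would give a value that can be passed as html_path. The raw string (`r"..."`) keeps Python from treating the backslash in a Windows path as an escape character.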

rrryan2016 commented 3 years ago

Thanks for the reply, it is really helpful.

But I later got stuck on another error 😄

[WinError 2] The system cannot find the file specified. This appears for every paper I try to download.

Any suggestions? Sorry for the disturbance.

SilenceEagle commented 3 years ago

Have you installed the "Internet Download Manager" (IDM) software? By default, this function calls IDM to download the papers. After installing IDM, you can change the options in the setting-->download menu to disable the pop-up download dialogs for convenience.

If you changed IDM's install path, you should also update this line of code: https://github.com/SilenceEagle/paper_downloader/blob/be6cb67aaff607c4adfaeb298498c926bfb451a7/lib/IDM.py#L13
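On Windows, [WinError 2] is usually what Python reports when it tries to launch an executable that does not exist at the given path. A quick check before starting a download run can make that failure easier to read. The sketch below is an assumption-laden illustration: `check_idm` is a hypothetical helper, and `DEFAULT_IDM_PATH` is assumed to be the common default install location, which may differ from the value in lib/IDM.py.

```python
import os

# Assumed default install location (not taken from the repository);
# adjust it if IDM was installed elsewhere.
DEFAULT_IDM_PATH = r"C:\Program Files (x86)\Internet Download Manager\IDMan.exe"


def check_idm(idm_path=DEFAULT_IDM_PATH):
    """Return idm_path if the executable exists, otherwise raise a clear error.

    Hypothetical helper for illustration; not part of paper_downloader.
    """
    if not os.path.isfile(idm_path):
        raise FileNotFoundError(
            f"IDM executable not found at {idm_path!r}; install IDM or "
            "update the path in lib/IDM.py"
        )
    return idm_path
```

Calling `check_idm()` once at startup would surface a missing or relocated IDM install immediately, instead of the same error repeating for every paper.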