bioinf-mcb / gisaid-scrapper

Scrapping tool for GISAID data regarding SARS-CoV-2
MIT License

Add bulk download option #14

Open monomeric opened 4 years ago

monomeric commented 4 years ago

Would it be possible to add an option to download the bulk file (the one you get by clicking "Download" at the bottom right of the browser), along with the Acknowledgement Table?

Otherwise: Thanks for this truly useful package!

monomeric commented 4 years ago

Fixed the above by modifying gisaid-scrapper.py. Perhaps this is of use for someone else as well.

Added a profile to the Firefox session:

import os  # at module level

profile = webdriver.FirefoxProfile()
# 2 = save to the custom directory set in browser.download.dir
profile.set_preference("browser.download.folderList", 2)
profile.set_preference("browser.download.manager.showWhenStarting", False)
# Firefox does not expand "~" itself, so pass an absolute path
profile.set_preference("browser.download.dir", os.path.expanduser("~"))
# MIME types that are saved without showing a download prompt
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/csv,application/download,application/excel,application/msword,application/octet-stream,application/pdf,application/ris,application/vnd.ms-excel,application/x-excel,application/x-msexcel,application/x-zip,application/x-zip-compressed,application/zip,image/png,text/csv,text/html,text/plain")
self.driver = webdriver.Firefox(options=options, firefox_profile=profile)
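
In case it helps, the preferences above could also be collected in one place. This is only a rough sketch: `download_prefs` and `MIME_TYPES` are hypothetical names that are not part of gisaid-scrapper, and the MIME list is shortened for readability. It resolves the download directory to an absolute path, since Firefox does not expand "~" on its own.

```python
import os

# Shortened, illustrative subset of the MIME types listed above.
MIME_TYPES = [
    "application/octet-stream",
    "application/zip",
    "text/csv",
    "text/plain",
]


def download_prefs(download_dir="~"):
    """Return Firefox preferences for unattended downloads (hypothetical helper)."""
    # Expand "~" and make the path absolute before handing it to Firefox.
    target = os.path.abspath(os.path.expanduser(download_dir))
    return {
        "browser.download.folderList": 2,  # 2 = use the custom directory below
        "browser.download.manager.showWhenStarting": False,
        "browser.download.dir": target,
        "browser.helperApps.neverAsk.saveToDisk": ",".join(MIME_TYPES),
    }
```

The result can then be applied in a loop, e.g. `for name, value in download_prefs().items(): profile.set_preference(name, value)`.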

Added simple functions for the bulk download:

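def bulk_download(self):
    # Short fixed pauses give the page time to react between clicks
    time.sleep(2)
    self._bulk_download_data()
    time.sleep(2)
    self._bulk_download_acknowledgements()
    # Give the downloads time to finish before closing the browser
    time.sleep(30)
    self.driver.quit()

def _bulk_download_data(self):
    #self.driver.execute_script(
    #    "document.getElementById('sys_curtain').remove()")
    # Index 3 is taken to be the "Download" button; this depends on the page layout
    self.driver.find_elements_by_tag_name("button")[3].click()

def _bulk_download_acknowledgements(self):
    # Remove the 'sys_curtain' element, presumably an overlay that blocks the click
    self.driver.execute_script(
        "document.getElementById('sys_curtain').remove()")
    # The Acknowledgement Table is reached through the first "here" link
    self.driver.find_elements_by_link_text("here")[0].click()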
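
The fixed `time.sleep(30)` in `bulk_download` may be too short for large files or needlessly long for small ones. A possible alternative is to poll the download directory until Firefox's temporary `*.part` files are gone. The following is a minimal sketch under that assumption; `wait_for_downloads` and its parameters are hypothetical and not part of gisaid-scrapper.

```python
import glob
import os
import time


def wait_for_downloads(download_dir, timeout=300.0, poll=1.0):
    """Wait until no '*.part' files remain in download_dir.

    Returns True once the directory holds no partial downloads,
    or False if the timeout (in seconds) expires first.
    """
    pattern = os.path.join(os.path.expanduser(download_dir), "*.part")
    deadline = time.monotonic() + timeout
    while True:
        if not glob.glob(pattern):
            return True  # no in-progress downloads left
        if time.monotonic() >= deadline:
            return False  # gave up waiting
        time.sleep(poll)
```

Note that this returns immediately if the download has not started yet, so a short pause after the click is still needed before calling it, e.g. `time.sleep(2); wait_for_downloads("~")` in place of `time.sleep(30)`.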