bioinf-mcb / gisaid-scrapper

Scraping tool for GISAID data regarding SARS-CoV-2
MIT License
41 stars, 16 forks

Request to remove repository #15

Closed: trvrb closed this issue 4 years ago

trvrb commented 4 years ago

Hello,

I'm relaying a request from GISAID to remove this repository. The use of these scrapers is negatively impacting GISAID's ability to host this critical scientific data.

Thank you.

wwydmanski commented 4 years ago

Hi @trvrb, we understand GISAID's concern about data being scraped from their website. However, we don't see how this tool could have any impact on the performance of their servers. The data is downloaded via Selenium, which simulates user-like interaction with the site. As a consequence, the download is spread over roughly 2 hours (13 samples a minute), which makes it hardly noticeable by any commonly used metric. Furthermore, GISAID's website only allows a collective download of the data: even if a user needs only the two newest samples, they still have to redownload the whole database. The scraper lets them update their data with only the missing samples, cutting the number of downloaded FASTA records from 1500 to 2.

This tool was created to simplify access to critical outbreak data for people who already have access to the GISAID database. The terms of use do not forbid using automated tools, and considering the points above, we don't believe it can cause any inconvenience to GISAID's infrastructure. We are happy to revisit this if we receive a direct removal request from GISAID.
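
For readers who want to check the figures quoted in the comment above, here is a minimal sketch of the arithmetic and of the "download only what is missing" idea. It is not the repository's code: the function names `missing_ids` and `estimated_minutes` and the sample accession IDs are assumptions made up for illustration, and the sketch performs no network access at all.

```python
import math


def missing_ids(remote_ids, local_ids):
    """Return the remote record IDs that are not stored locally yet.

    The order of the remote listing is preserved.
    """
    local = set(local_ids)
    return [acc for acc in remote_ids if acc not in local]


def estimated_minutes(n_records, records_per_minute=13):
    """Estimate download time in whole minutes at a fixed rate.

    The default of 13 records per minute is the rate quoted in the comment.
    """
    return math.ceil(n_records / records_per_minute)


# Made-up IDs for illustration only: 1500 remote records,
# of which all but the 2 newest are already stored locally.
remote = [f"EPI_ISL_{i}" for i in range(1, 1501)]
local = remote[:-2]

todo = missing_ids(remote, local)

print(len(todo))                       # number of records still to fetch
print(estimated_minutes(len(remote)))  # minutes for a full download
print(estimated_minutes(len(todo)))    # minutes for an incremental update
```

At 13 records per minute, 1500 records take about 116 minutes, which matches the "2 hours" quoted above, while fetching only the 2 missing records fits within a single minute.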