Closed trvrb closed 4 years ago
Hi @trvrb, we understand the source of GISAID's concern about scraping data from their website. However, we don't see how this tool could have any impact on the performance of their servers. The data is downloaded via Selenium, which simulates a user's interaction with the site. As a result, the download is spread over a span of 2 hours (13 samples a minute), making it hardly noticeable by any commonly used metric. Furthermore, GISAID's website allows only a collective download of the data: even if a user needs only the two newest samples, they still have to redownload the whole database. The scraper lets them update their data with only the missing samples, cutting the number of downloaded FASTA records from 1500 to 2.
This tool was created to simplify access to critical data about the outbreak for people who already have access to the GISAID database. The terms of use do not forbid using automated tools, and considering the points above, we don't believe it can cause any inconvenience to GISAID's infrastructure. We are happy to revisit this if we get a direct removal request from GISAID.
Hello,
I'm relaying a request from GISAID to remove this repository. The use of these scrapers is negatively impacting GISAID's ability to host this critical scientific data.
Thank you.