A set of helper tools for assembling the different elements of the RELECOV platform (Spanish Network for genomic surveillance of SARS-CoV-2), such as data download, processing, validation and upload to public databases, as well as analysis runs and database storage.
GNU General Public License v3.0
Include a test for file integrity somewhere in the workflow #276
Even though the md5 is checked when a file is downloaded, the file could have been corrupted from the beginning. In that case the md5 is the same before and after the transfer, so the file is not recognized as corrupted.
This test might be best implemented for ".gz" files in the download module.
Pseudocode:
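import gzip

CHUNK_SIZE = 10_000_000  # read about 10 MB at a time

def test_gzip_integrity(file_to_test):
    with gzip.open(file_to_test, "rb") as f:
        # Read to the end so the whole stream and its CRC trailer are checked
        while f.read(CHUNK_SIZE):
            pass
    return True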
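This will raise an exception if the file is not gzipped or is corrupted (in Python 3.8+, typically gzip.BadGzipFile, EOFError for a truncated file, or zlib.error for damaged compressed data).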
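As a possible follow-up to the loop above, the download module could turn those exceptions into a boolean so the caller can decide whether to skip or re-request the file. This is only a sketch: `gzip_is_intact` and its `chunk_size` parameter are hypothetical names, not existing relecov-tools functions.

```python
import gzip
import zlib

CHUNK_SIZE = 10_000_000  # bytes per read (about 10 MB); assumed value


def gzip_is_intact(path, chunk_size=CHUNK_SIZE):
    """Return True if `path` decompresses fully, False otherwise.

    Hypothetical helper, not part of the existing download module.
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Reading to EOF forces the CRC32 and length trailer check.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # OSError covers gzip.BadGzipFile (not gzip, or CRC mismatch) and
        # also plain I/O errors such as a missing file;
        # EOFError signals a truncated stream; zlib.error a damaged one.
        return False
    return True
```

Because the check only reads the local copy, it could run right after the md5 comparison and would flag files that were already corrupted at the source, which the md5 comparison alone cannot detect.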