bede closed this 8 months ago
You could possibly do a SHA-256 check on the existing file. That way you can be certain whether it's what you expect or not. I have some code for doing this in tbpore that you can use if you want to go that route.
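For illustration, here is a minimal sketch of such a check in Python. It assumes the expected hex digest is already known somehow. This is not the tbpore code mentioned above, and `sha256sum` and `is_expected` are hypothetical names:

```python
import hashlib
from pathlib import Path


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file.

    The file is read in 1 MiB chunks so that large genome/index files
    are never loaded into memory all at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_expected(path: Path, expected_hex: str) -> bool:
    """True only if the file exists and its digest matches the expected one."""
    return path.is_file() and sha256sum(path) == expected_hex.strip().lower()
```

A truncated download produces a different digest, so `is_expected` returns `False` and the file can be re-fetched instead of being handed to the aligner.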
Certainly an option – I guess I would either hardcode the checksums or put them into a manifest of some kind to check post-download. Using a newer database with an older version of Hostile could then cause checksum mismatches, so I'm tempted to just ensure that the download completes for now.
I've mitigated this in 0.1.0 by downloading indexes to a temporary file before moving (minimap2) or extracting (Bowtie2) into the destination XDG data directory. That way if the download is interrupted, the aligner won't try to use a corrupted ref.
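The temp-file-then-move pattern can be sketched roughly as follows. This is an illustration assuming a plain HTTP(S) download via the standard library, not Hostile's actual code, and `atomic_write_stream` and `download_atomically` are hypothetical names. It covers the "move" case (minimap2). For the "extract" case (Bowtie2), the same idea applies with an archive extracted into a temporary directory that is then renamed.

```python
import os
import shutil
import tempfile
import urllib.request
from pathlib import Path


def atomic_write_stream(src, dest: Path) -> Path:
    """Copy a readable binary stream to dest via a temporary file.

    The temp file lives in dest's own directory, so os.replace() is a
    same-filesystem rename. An interrupted copy never leaves a partial
    file at dest; the partial temp file is deleted before re-raising."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(
        dir=dest.parent, prefix=f".{dest.name}.", suffix=".part"
    )
    try:
        with os.fdopen(fd, "wb") as tmp:
            shutil.copyfileobj(src, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure bytes hit disk before the rename
        os.replace(tmp_name, dest)  # atomic on POSIX within one filesystem
    except BaseException:
        # Remove the partial download, then propagate the original error
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
    return dest


def download_atomically(url: str, dest: Path) -> Path:
    """Download url to dest without ever exposing a partial file at dest."""
    with urllib.request.urlopen(url) as response:
        return atomic_write_stream(response, dest)
```

Creating the temp file next to the destination, rather than in the system temp directory, matters because `os.replace` across filesystems fails instead of falling back to a copy.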
If I implement checksum validation, I guess the obvious way to avoid hardcoding would be to put checksum files with the same filename prefix as the ref/index in object storage? Any thoughts @mbhall88?
Yeah, I think the "standard" method is to put the checksum in the storage location (e.g.). As you say, this avoids hardcoding the checksums.
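A rough sketch of the sidecar-file approach follows. It assumes a hypothetical convention where a file such as `<index name>.sha256` sits next to the index in object storage and contains either a bare hex digest or `sha256sum`-style output (`<digest>  <filename>`). The suffix and function names are illustrative and are not taken from the commit linked below.

```python
import hashlib
from pathlib import Path


def sidecar_url(index_url: str, suffix: str = ".sha256") -> str:
    """Hypothetical convention: the checksum lives next to the index."""
    return index_url + suffix


def parse_checksum_file(text: str) -> str:
    """Extract the hex digest from a bare digest or sha256sum-style line."""
    fields = text.split()
    if not fields:
        raise ValueError("empty checksum file")
    digest = fields[0].lower()
    if len(digest) != 64 or any(c not in "0123456789abcdef" for c in digest):
        raise ValueError(f"not a SHA-256 hex digest: {fields[0]!r}")
    return digest


def verify_against_sidecar(path: Path, checksum_text: str) -> None:
    """Raise ValueError if the file's SHA-256 differs from the sidecar digest."""
    expected = parse_checksum_file(checksum_text)
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Stream in 1 MiB chunks to keep memory use flat for large indexes
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    actual = digest.hexdigest()
    if actual != expected:
        raise ValueError(
            f"checksum mismatch for {path.name}: expected {expected}, got {actual}"
        )
```

In use, the checksum text would be fetched from `sidecar_url(index_url)` (for example with `urllib.request.urlopen`) after the index itself is downloaded. Because each index version carries its own checksum, a newer index does not break an older client.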
Thanks @mbhall88, I've added checksum verification in https://github.com/bede/hostile/commit/1e81debf73c4279f9682de08c1edf0791b15d47f for release in v1.
Currently, if a genome/index download is abandoned, Hostile may think the file is present and correct, leading to errors. It could download to a temporary location and then move the file into $XDG_DATA_DIR, or download under a temporary name and rename, etc.