Open amitsaha opened 1 month ago
I downloaded the above files once again from GCS and ran a diff against my original copy. The images are identical, no diff. So I guess we just need to update the SHA256SUMS.txt file.
My script used for the above:
import os
import subprocess
# The sha256 checksums of these images don't match the ones reported in SHA256SUMS.txt,
# so we download them again and diff them against the local copies. If they are identical,
# the sha256 sums most likely need updating.
image_paths = [
    "files/p10/p10375986/s59475126/44685902-a2ada121-02735bc5-bf1bf167-adfd2ae5.jpg",
    "files/p11/p11131026/s59741822/08c22db9-5bef7d06-d904ec15-7bbfe57f-416dbdc1.jpg",
    "files/p11/p11607063/s58298420/235c7af4-ef2ba0dc-7dc251ea-a2571f33-d37c8185.jpg",
    "files/p11/p11785297/s58022353/3b64bf5a-021ff5ae-137c22d1-5529364f-1415c640.jpg",
    "files/p11/p11920643/s55676416/4d70ff33-43ad77af-22ff047c-19f6ceb1-aae49eea.jpg",
    "files/p13/p13283178/s55081421/026de108-3310a177-7c01791c-7eb32cff-b076122f.jpg",
    "files/p13/p13628037/s54872639/f845ad66-716c76dd-da718912-8b0ff596-b30d25cb.jpg",
    "files/p13/p13694166/s55805720/df57d48e-566984d2-fbe39e6e-0c68fc55-380f1217.jpg",
    "files/p14/p14656449/s56499991/67a4e5cd-50d441d3-42294f94-363ac071-17cfc342.jpg",
    "files/p14/p14690121/s50057475/34ad06d4-475863f1-f3712cec-783c3b99-308cf886.jpg",
    "files/p17/p17405329/s55291678/283084bb-0f4994a7-d7622b32-d7f18f75-d8dde41b.jpg",
    "files/p17/p17490145/s55463370/803fcbd8-2e38a5c7-cca96a50-ce5660cb-83ecc3a1.jpg",
    "files/p18/p18459824/s52186356/2eb68b2f-0742cb3d-b8c9db5b-9c9d74f9-69e31cc1.jpg",
    "files/p18/p18690742/s56844948/f4f63777-6a8a6b60-d6cb0718-9256537a-2ca41831.jpg"
]
for image in image_paths:
    # download to a temporary directory
    subprocess.check_output([
        "gcloud", "storage", "--billing-project", "<project-name>", "cp",
        f"gs://mimic-cxr-jpg-2.1.0.physionet.org/{image}", f"tmp-check-diff/{os.path.basename(image)}"
    ])
    # check the downloaded version against the one stored locally already
    # (check_output raises CalledProcessError if diff reports a difference)
    subprocess.check_output([
        "diff",
        f"tmp-check-diff/{os.path.basename(image)}", f"{image}"
    ])
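A possible variation on the comparison step, as a minimal sketch: the standard library's `filecmp.cmp` with `shallow=False` does a byte-by-byte comparison without shelling out to `diff`, and collecting the results reports every differing pair instead of stopping at the first one. The helper name `differing_files` is hypothetical and not part of the script above.

```python
import filecmp


def differing_files(pairs):
    """Return the (downloaded, local) path pairs whose contents differ."""
    return [
        (downloaded, local)
        for downloaded, local in pairs
        # shallow=False forces a comparison of the file contents,
        # not just of size and modification time
        if not filecmp.cmp(downloaded, local, shallow=False)
    ]
```

With the script above, `pairs` would be built from `tmp-check-diff/<basename>` and the original `image` path for each entry in `image_paths`.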
It's odd because the SHA256SUMS are calculated automatically by PhysioNet when publishing the files, so I'm not sure why they would be wrong. It could be because we had some custom workarounds for MIMIC-CXR. I will raise this with some of the PhysioNet team, thanks!
OK, I think something went wrong with our GCP upload because they were simply different on that bucket. Can you redownload them from the GCP bucket and check again? It should be fixed now.
Description
I downloaded the MIMIC-CXR-JPG dataset from Google Cloud Storage.
When I verify the sha256 sums, I find the following mismatches:
I redownloaded the above files and still get the same result.
For example, if I take the first one:
Any ideas on how to further verify what may be causing this?
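One way to narrow this down, as a minimal sketch: recompute the SHA-256 of each local file and compare it with the entry in SHA256SUMS.txt. This assumes the checksum file uses the usual `sha256sum` layout (`<hex digest> <relative path>` per line), which I have not confirmed for this dataset. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_expected_sums(sums_path):
    """Parse '<hex digest> <relative path>' lines into {path: digest}.

    Assumption: sha256sum-style layout; adjust if the file differs.
    """
    expected = {}
    for line in Path(sums_path).read_text().splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            digest, rel_path = parts
            # sha256sum marks binary-mode entries with a leading '*'
            expected[rel_path.lstrip("*").strip()] = digest.lower()
    return expected


def find_mismatches(sums_path, root="."):
    """Return (path, expected, actual) for each local file whose digest differs."""
    mismatches = []
    for rel_path, expected in load_expected_sums(sums_path).items():
        local = Path(root) / rel_path
        if not local.is_file():
            continue  # skip entries that were not downloaded
        actual = sha256_of_file(local)
        if actual != expected:
            mismatches.append((rel_path, expected, actual))
    return mismatches
```

Running `find_mismatches("SHA256SUMS.txt", root=".")` from the dataset root would list each mismatching path together with both digests, which makes it easier to tell a corrupted download from a stale checksum entry.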