manthey closed this issue 3 weeks ago.
As an added option, it might be nice to have a "quietly skip files that aren't in a format we know" flag, since in my large sample files collection (around 60,000+ tiff and dicom files), I have a mix that includes formats that we can't redact (e.g., hdmi, ImageJ, etc). We skip formats we don't recognize quietly (e.g., nd2), and it might be nice in quiet mode to skip any that raise UnsupportedFileTypeError.
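Do you have an example of a format or a file that is disruptive to a run of the program? I thought we were already skipping files that aren't supported (https://github.com/DigitalSlideArchive/ImageDePHI/blob/941488454f79025fd71307a2820704c49b142684/imagedephi/redact/redact.py#L134-L144).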
I agree this is a good idea. I can test as well on a similarly large collection.
-- David A Gutman, M.D., Ph.D., Associate Professor of Pathology, Emory University School of Medicine
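The "quietly skip files that aren't in a format we know" flag could work roughly like this. It's a minimal, stdlib-only sketch: UnsupportedFileTypeError is redefined locally as a stand-in for ImageDePHI's exception of that name, and plan_files, plan_one, and skip_unsupported are hypothetical names, not the project's actual API.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


class UnsupportedFileTypeError(Exception):
    """Local stand-in for ImageDePHI's exception of the same name (hypothetical)."""


def plan_files(
    paths: Iterable[Path],
    plan_one: Callable[[Path], None],
    skip_unsupported: bool = False,
) -> Tuple[List[Path], List[Tuple[Path, str]]]:
    """Run plan_one on each path and return (skipped, failed).

    With skip_unsupported=True, files that raise UnsupportedFileTypeError are
    set aside silently; otherwise they are reported alongside other failures.
    """
    skipped: List[Path] = []
    failed: List[Tuple[Path, str]] = []
    for path in paths:
        try:
            plan_one(path)
        except UnsupportedFileTypeError as exc:
            if skip_unsupported:
                skipped.append(path)
            else:
                failed.append((path, str(exc)))
        except Exception as exc:  # any other per-file error is kept as a failure
            failed.append((path, str(exc)))
    return skipped, failed
```

The caller can then print each (path, reason) pair in failed, so the path of every file that couldn't be planned stays visible even in quiet mode, while unsupported formats drop out of the report entirely.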
Note to self: discussed in today's meeting. This can be recreated by adding the .svs extension to a non-Aperio TIFF file.
Yeah, so I had some broken .svs files that were technically written by an Aperio scanner, but something went wrong and the file was incomplete. Many of these had a second copy of the same slide, so I can't tell if the technician rescanned it manually or the software realized there was an error and rewrote the file.
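The reproduction from the note above (a non-Aperio TIFF renamed to .svs) can be scripted so it's easy to rerun. This is a minimal, stdlib-only sketch; write_plain_tiff_as_svs is a hypothetical helper, and it assumes that a plain baseline TIFF with no ImageDescription tag (so none of the Aperio metadata) is enough to trip ImageDePHI's format handling once it carries an .svs extension.

```python
import struct
from pathlib import Path


def write_plain_tiff_as_svs(path: Path) -> Path:
    """Write a 1x1 8-bit grayscale baseline TIFF with no ImageDescription, named *.svs."""
    SHORT, LONG = 3, 4
    ifd_offset = 8  # the IFD starts right after the 8-byte header
    entries = [
        (256, SHORT, 1, 1),  # ImageWidth
        (257, SHORT, 1, 1),  # ImageLength
        (258, SHORT, 1, 8),  # BitsPerSample
        (259, SHORT, 1, 1),  # Compression: none
        (262, SHORT, 1, 1),  # PhotometricInterpretation: BlackIsZero
        (273, LONG, 1, 0),  # StripOffsets (filled in below)
        (278, SHORT, 1, 1),  # RowsPerStrip
        (279, LONG, 1, 1),  # StripByteCounts
    ]
    # Pixel data follows the IFD: entry count + 12 bytes per entry + next-IFD offset.
    pixel_offset = ifd_offset + 2 + len(entries) * 12 + 4
    entries[5] = (273, LONG, 1, pixel_offset)

    data = bytearray(b"II*\x00" + struct.pack("<I", ifd_offset))  # little-endian header
    data += struct.pack("<H", len(entries))
    for tag, typ, count, value in entries:
        # SHORT values are left-justified in the 4-byte value field.
        packed = struct.pack("<HH", value, 0) if typ == SHORT else struct.pack("<I", value)
        data += struct.pack("<HHI", tag, typ, count) + packed
    data += struct.pack("<I", 0)  # no further IFDs
    data += b"\x80"  # the single gray pixel

    svs_path = path.with_suffix(".svs")
    svs_path.write_bytes(bytes(data))
    return svs_path
```

Running dist/imagedephi plan against the directory containing the generated file should then show whether the failure is reported cleanly or disrupts the run.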
In order to test on a large collection of images, I think it would be nice to be able to do something like
dist/imagedephi plan -r -q /top/level/folder
and have a tqdm progress bar. With a single value of the quiet flag (or maybe at the default level), I'd ideally like to see just the files that can't be redacted with the current rule set. The default verbosity is too noisy: if something fails to redact, it has already scrolled away. The -q verbosity doesn't show the path of the file that fails.

As an added option, it might be nice to have a "quietly skip files that aren't in a format we know" flag, since my large sample collection (60,000+ TIFF and DICOM files) includes formats that we can't redact (e.g., hdmi, ImageJ, etc.). We skip formats we don't recognize (e.g., nd2) quietly, and it might be nice in quiet mode to also skip any that raise UnsupportedFileTypeError.

Regarding progress: when we do redactions, we convert the iter_image_files generator to a list, but for plan we don't. The list lets tqdm print the percent done, but on a large recursive file set, checking which files we can possibly read takes a long time. Should we have a progress bar while generating that initial list, i.e., a progress bar with an unknown total while collecting files? Otherwise the program just sits for a very long time.
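One way to get feedback during the slow collection step is to wrap the generator in tqdm without a total: tqdm can't call len() on a generator, so it shows a running count and rate instead of a percentage. Once the files are in a list, a second bar can show percent done, and tqdm.write prints failing paths above the bar so they don't scroll away. This is a sketch under assumptions: iter_candidate_files is a hypothetical stand-in for iter_image_files (it yields every file rather than checking readability), plan_one is a placeholder for per-file planning, and the code falls back to no progress display if tqdm isn't installed.

```python
import os
from pathlib import Path
from typing import Callable, Iterator, List

try:
    from tqdm import tqdm

    write = tqdm.write  # prints a line above any active bar instead of breaking it
except ImportError:  # tqdm not installed: run with no progress display

    def tqdm(iterable=None, **kwargs):
        return iterable

    write = print


def iter_candidate_files(root: Path) -> Iterator[Path]:
    """Hypothetical stand-in for iter_image_files: lazily yield every file under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a stable order
        for name in sorted(filenames):
            yield Path(dirpath) / name


def collect_with_progress(root: Path) -> List[Path]:
    # A generator has no len(), so tqdm shows a running count and rate, not a percentage.
    return list(tqdm(iter_candidate_files(root), desc="Collecting files", unit="file"))


def plan_with_progress(root: Path, plan_one: Callable[[Path], None]) -> List[Path]:
    files = collect_with_progress(root)
    failed: List[Path] = []
    # The list has a known length, so this bar can show percent done.
    for path in tqdm(files, desc="Planning", unit="file"):
        try:
            plan_one(path)
        except Exception as exc:  # report and keep going rather than abort the batch
            write(f"{path}: {exc}")
            failed.append(path)
    return failed
```

The trade-off is that the full list is held in memory before planning starts; for tens of thousands of paths that is small compared with the time spent probing each file.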