ImagingDataCommons / IDC-Tutorials

Self-guided notebook tutorials to help get started with using IDC
BSD 3-Clause "New" or "Revised" License

Add notebook with conversion examples #67

Open erindiel opened 3 months ago

erindiel commented 3 months ago

This notebook summarizes available conversion tools that rely on Bio-Formats for reading and writing various file formats. Specifically, it includes sample commands for converting using bfconvert and bioformats2raw, as well as a description of scenarios where one tool might be preferred over the other.

Sample data comes primarily from IDC. There are examples of both reading and writing DICOM. Are there preferred datasets or methods for getting this data other than those used here?

This notebook can be run in Google Colab or locally; however, commands like wget are not available by default on Windows, so Windows users will not be able to test some sections locally. Is this acceptable?
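
One possible way around the wget limitation is to download sample files with Python's standard library, which behaves the same in Colab and on Windows. This is only a minimal sketch, not what the notebook currently does: the `download` helper and the URL in the usage comment are hypothetical placeholders.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url: str, dest: str) -> Path:
    """Stream the resource at `url` into the local file `dest` and return its path.

    Hypothetical helper for illustration; it is not part of the notebook.
    """
    target = Path(dest)
    # Create the destination folder if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    # urlopen ships with Python, so no external tool such as wget is needed.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


# Usage (placeholder URL, replace with a real sample file location):
# download("https://example.com/sample.tiff", "data/sample.tiff")
```

Streaming through `shutil.copyfileobj` avoids loading the whole file into memory, which matters for whole-slide images.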

cc @melissalinkert @dclunie @fedorov

fedorov commented 1 month ago

@erindiel @melissalinkert I started my review and made some minor improvements to simplify access to data from IDC. You can find my edits here: https://colab.research.google.com/drive/1gkJpKr1cL5R4uEQkQtFE0UPGxtiJHXk_?usp=sharing.

Overall, the structure looks great! I have a few minor comments, but I first wanted to bring up an issue that I think is a major one. The cells corresponding to conversion from DICOM to alternative representations are extremely slow.

This one took 48 minutes for a single H&E slide on a default Google Colab CPU instance.

The next cell has been running for about the same time and is still not finished.

Can you comment on why this is so slow and what can be done about this? Is https://github.com/ome/bioformats/pull/4190 going to remedy this?

fedorov commented 1 month ago

The following one took almost 2 hours!

melissalinkert commented 1 month ago

Thanks, @fedorov. We're looking into the performance issue, as that seems to be noticeably slower than what we saw when originally testing.

https://github.com/ome/bioformats/pull/4190 is not expected to affect conversion with bioformats2raw/raw2ometiff. That set of changes expands access to "precompressed" tiles, which the bioformats2raw/raw2ometiff conversion workflow cannot currently make use of.

erindiel commented 3 weeks ago

Thanks again @fedorov for noting the conversion time issue. We confirmed that when testing the notebook locally, the conversion took less than 10 minutes, even when lowering the max worker count using --max-workers. We therefore assume that I/O on Google Colab is slower, which increases the conversion time dramatically.
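
To compare timings between environments, a conversion could be wrapped in a small timing helper like the one below. This is a rough sketch under stated assumptions: the helper names and the file paths in the usage comment are hypothetical, and the `--max-workers` spelling is taken from the comment above, so it is worth checking against `bioformats2raw --help` for the installed version.

```python
import subprocess
import time
from typing import List, Optional


def bioformats2raw_command(
    source: str, output_dir: str, max_workers: Optional[int] = None
) -> List[str]:
    """Build the argument list for a bioformats2raw conversion (illustrative helper)."""
    cmd = ["bioformats2raw", source, output_dir]
    if max_workers is not None:
        # Flag spelling as quoted in this thread; confirm it for your version.
        cmd += ["--max-workers", str(max_workers)]
    return cmd


def timed_run(cmd: List[str]) -> float:
    """Run `cmd`, raise if it exits with an error, and return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


# Usage (hypothetical paths; requires bioformats2raw on the PATH):
# seconds = timed_run(bioformats2raw_command("slide.dcm", "slide.zarr", max_workers=2))
# print(f"Conversion took {seconds / 60:.1f} minutes")
```

Running the same command locally and in Colab with identical worker counts would help separate the effect of I/O speed from the effect of available CPU cores.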

A couple of options to improve the situation:

DanielaSchacherer commented 2 weeks ago

Hi Erin, Andrey asked me to have a look at this notebook as well. I think it's a very useful notebook for anyone who might have questions about conversion tools (I looked at the version where @fedorov already made some edits). I can confirm the running times he experienced in Colab (mine were even a little longer). I would also suggest using a small slide as the example and adding a note in the text that this is not meant to be run in Colab for a whole dataset. Apart from that, I would not push the notebook to the repository with its output included (except for the two images close to the end of the notebook).