ashnilkumar / colearn

GNU General Public License v3.0

Question about the input data format #1

Open guo2004131 opened 4 years ago

guo2004131 commented 4 years ago

Hi Ashnil, thank you for your impressive work. I want to run your code on our dataset. However, as I am not familiar with TensorFlow, I do not know how to generate the inputs and outputs/labels in TFRecord format. If possible, would you be so kind as to share your data generation code? Thanks in advance.

DG

XYZach commented 4 years ago

> Hi Ashnil, Thank you for your impressive work. I want to run your code on our dataset. However, as I am not familiar with tf, I did not know how to generate the input and output/label in tf record format. If possible, would you be so kind and also share your data generation code? Thanks in advance.
>
> DG

Hi, I wonder if you have solved this problem?

MissZzhang commented 4 years ago

The .tf dataset I made from my own data was not read in correctly. Could you share your code for making the dataset?

ashnilkumar commented 3 years ago

Hi all, apologies for the belated response; 2020 was a hectic year.

The TFRecord creation depends on the format of your input images. The dataset I used was first converted from DICOM into a 3D TIFF stack (using a batch process in my lab). The batch process does a lot of the required preprocessing to align the PET and CT data. The TFRecords were written for each TIFF stack by iterating through the PET and CT slices simultaneously, and saving each 2D slice pair as a separate example.
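For readers who want a starting point, the slice-pairing step described above could be sketched roughly as follows. This is not the code from my lab's package. It assumes the PET and CT volumes are already aligned and stored as arrays of shape (slices, height, width), and the function names and feature keys ("pet", "ct", "height", "width", "slice_index") are hypothetical; they would need to match whatever the training code actually parses.

```python
from typing import Iterator, Sequence, Tuple


def iter_slice_pairs(pet: Sequence, ct: Sequence) -> Iterator[Tuple[int, object, object]]:
    """Yield (index, pet_slice, ct_slice) for two volumes with equal slice counts."""
    if len(pet) != len(ct):
        raise ValueError(f"slice count mismatch: PET has {len(pet)}, CT has {len(ct)}")
    for index, (pet_slice, ct_slice) in enumerate(zip(pet, ct)):
        yield index, pet_slice, ct_slice


def write_slice_pairs(pet, ct, out_path: str) -> int:
    """Write one tf.train.Example per PET/CT slice pair; return the number written."""
    # Imported here so the pairing helper above works without these packages.
    import numpy as np
    import tensorflow as tf

    def bytes_feature(array):
        # Store the slice as raw float32 bytes (an assumption, not a required format).
        raw = np.asarray(array, dtype=np.float32).tobytes()
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=[raw]))

    def int64_feature(value: int):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=[int(value)]))

    count = 0
    with tf.io.TFRecordWriter(out_path) as writer:
        for index, pet_slice, ct_slice in iter_slice_pairs(pet, ct):
            height, width = np.shape(pet_slice)
            # Hypothetical feature keys; adapt them to the parser you train with.
            features = tf.train.Features(feature={
                "pet": bytes_feature(pet_slice),
                "ct": bytes_feature(ct_slice),
                "height": int64_feature(height),
                "width": int64_feature(width),
                "slice_index": int64_feature(index),
            })
            example = tf.train.Example(features=features)
            writer.write(example.SerializeToString())
            count += 1
    return count
```

Storing the height and width alongside the raw bytes makes it possible to reshape each slice when the record is read back, which is a common source of errors when a dataset does not load as expected.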

The process you will need will vary based on what form your data is in, whether it is registered, the original scanning resolution, etc.

My suggestion is to consider adapting the process in Raghav Sharma's tutorial here. There is also example code on Raghav's GitHub.

BANGzys commented 2 years ago

Hi Ashnil, thank you for your impressive work. I want to know how to convert from DICOM into a 3D TIFF stack. Alternatively, could you share your code for making the dataset?

BANGzys commented 2 years ago

> The .tf dataset I made from my own data was not read in correctly. Could you share your code for making the dataset?

Hi, I wonder if you have solved this problem?

ashnilkumar commented 2 years ago

Hi there,

Firstly, you do not need the TIFF stack (it is optional). For us, the conversion to a TIFF stack was done because of some other work in my lab. I cannot share that code because it is a big package that produces other data outputs, and the code isn't as clean because it does so much. I don't recommend doing exactly that unless you really need the TIFF stack for some other purpose. It may be simpler to go from DICOM to TFRecord (with an intermediate step in numpy).

Creation of the TFRecords requires knowledge of the data source. A key part is preprocessing to align the PET and CT volumes. The preprocessing will differ depending on whether the images come from a hybrid scanner or from separate scanners. It will also depend on the scanner's spatial resolution, slice thickness, etc., all of which are specific to your own data source. My lab's code for making the TFRecords is based on our data source. It will not necessarily work unless your data source is exactly the same.

Hence, my suggestion is to write your own TFRecord creation based on your data source, while doing any preprocessing you need. The process is pretty straightforward:

  1. You can load DICOM into a numpy 3D array. You can do this with pydicom (example here, see the variable img3d).
  2. Then you can convert that numpy array into a TFRecord. There is a tutorial here - see code blocks 4, 5, and 6. You can apply that by iterating over the slices.
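
As a rough illustration of step 1, a DICOM series could be loaded into a 3D numpy array along these lines. This is only a sketch under assumptions: one series per folder, files ending in ".dcm", axial slices (so the third component of ImagePositionPatient orders them), and a uniform slice size. The function names are hypothetical, and no PET/CT alignment is done here.

```python
import os


def slice_position(ds) -> float:
    """Sort key for a slice: position along the scan axis, else the instance number."""
    position = getattr(ds, "ImagePositionPatient", None)
    if position is not None:
        # Assumes axial slices, where the third coordinate is the through-plane axis.
        return float(position[2])
    return float(ds.InstanceNumber)


def load_dicom_volume(folder: str):
    """Read every .dcm file in a folder into an array of shape (slices, rows, cols)."""
    # Imported here so the sort key above can be used without these packages.
    import numpy as np
    import pydicom

    datasets = [
        pydicom.dcmread(os.path.join(folder, name))
        for name in sorted(os.listdir(folder))
        if name.lower().endswith(".dcm")
    ]
    if not datasets:
        raise ValueError(f"no .dcm files found in {folder}")

    # File names are not a reliable slice order, so sort by spatial position.
    datasets.sort(key=slice_position)

    slices = []
    for ds in datasets:
        # Apply the rescale tags when present (e.g. to obtain Hounsfield units for CT).
        slope = float(getattr(ds, "RescaleSlope", 1.0))
        intercept = float(getattr(ds, "RescaleIntercept", 0.0))
        slices.append(ds.pixel_array.astype(np.float32) * slope + intercept)
    return np.stack(slices, axis=0)
```

Iterating over the first axis of the returned array then gives the individual 2D slices to serialise in step 2.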

BANGzys commented 2 years ago

Thank you for your reply! Your suggestion is important to me. So my next steps are:

  1. Align the PET and CT volumes.
  2. Load DICOM into a numpy 3D array.
  3. Convert that numpy array into a TFRecord.

Are these steps accurate?

BANGzys commented 2 years ago

Hi Ashnil, I have a new question: will the change in resolution caused by changing the image size affect training, and if so, how can I resolve this?

ashnilkumar commented 2 years ago

> Thank you for your reply! Your suggestion is important to me. So my next steps are:
>
> 1. Align the PET and CT volumes. 2. Load DICOM into a numpy 3D array. 3. Convert that numpy array into a TFRecord.
>
> Are these steps accurate?

Yes, this sounds correct.

ashnilkumar commented 2 years ago

> I have a new question: will the change in resolution caused by changing the image size affect training, and if so, how can I resolve this?

I am not quite sure I understand the issue. Could you perhaps open a new issue and explain the image size and the problem you are having? Thanks.