Booritas / slideio

BSD 3-Clause "New" or "Revised" License

With statement and leaking memory #1

Closed ethancohen123 closed 3 years ago

ethancohen123 commented 3 years ago

Hi, is there a way to open a file (czi) with slideio using the with statement? I get AttributeError: __enter__ when I try to do it. Also, there seems to be a CPU memory leak when calling scene.read_block: when I run it several times, the memory keeps increasing and is not freed during the process. Do you have a solution for that, please? Thank you

Booritas commented 3 years ago

Thanks a lot for reporting the problem.

I just checked and can confirm that using the "with" statement with the open_slide function currently raises an error. I will fix it ASAP.

I will check the memory freeing as well and let you know the results. Could you post a code snippet for the memory problem?

In a couple of weeks, I expect to publish a new version that should fix both problems.

ethancohen123 commented 3 years ago

Hi, thanks for the quick answer! My code is below. When I call this function, it works fine, but my CPU memory keeps increasing even when I delete the object using del or gc.collect(). Thanks :)

    import numpy as np

    def operation(scene, i, j, size):
        # Read the same region and slice range once per channel
        tmp0 = scene.read_block(rect=(i, j, size, size), channel_indices=[0], slices=(0, 3))
        tmp1 = scene.read_block(rect=(i, j, size, size), channel_indices=[1], slices=(0, 3))
        tmp2 = scene.read_block(rect=(i, j, size, size), channel_indices=[2], slices=(0, 3))

        # Average each channel over axis 2
        agreg = np.mean
        mc0 = agreg(tmp0, axis=2).astype('uint8')
        mc1 = agreg(tmp1, axis=2).astype('uint8')
        mc2 = agreg(tmp2, axis=2).astype('uint8')

        # Stack channels in reverse order; the final transpose puts the channel axis last
        out = np.stack((mc2.T, mc1.T, mc0.T)).T
        return out

ethancohen123 commented 3 years ago

Hi, any news on the issue? If the new version will be published in a couple of weeks, is there a way to install from the git repo in the meantime? I really appreciate the help. Thanks

Booritas commented 3 years ago

Hi, somehow I cannot reproduce the memory issue. I created a function that performs multiple reads from a czi file. I printed the heap before the call, inside the function (after the reads), and after the function.

Before reading

Partition of a set of 308502 objects. Total size = 40051526 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  87427  28 13182949  33  13182949  33 str
     1  71903  23  5301976  13  18484925  46 tuple
     2  20229   7  3584328   9  22069253  55 types.CodeType
     3  39864  13  3196664   8  25265917  63 bytes
     4  19364   6  2633504   7  27899421  70 function
     5   2427   1  2218144   6  30117565  75 type
     6   6094   2  1722448   4  31840013  79 dict (no owner)
     7   2427   1  1402112   4  33242125  83 dict of type
     8    691   0  1365688   3  34607813  86 dict of module
     9   4722   2   568896   1  35176709  88 list
<644 more rows. Type e.g. '_.more' to view.>

After reading, inside the function

Partition of a set of 308050 objects. Total size = 1120007911 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0    126   0 1080018796  96 1080018796  96 numpy.ndarray
     1  87429  28 13183216   1 1093202012  98 str
     2  71755  23  5291368   0 1098493380  98 tuple
     3  20227   7  3584882   0 1102078262  98 types.CodeType
     4  39860  13  3196468   0 1105274730  99 bytes
     5  19218   6  2613648   0 1107888378  99 function
     6   2427   1  2218144   0 1110106522  99 type
     7   6094   2  1722448   0 1111828970  99 dict (no owner)
     8   2427   1  1402112   0 1113231082  99 dict of type
     9    691   0  1365688   0 1114596770 100 dict of module
<644 more rows. Type e.g. '_.more' to view.>

After the function

Partition of a set of 308034 objects. Total size = 40005937 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  87427  28 13182949  33  13182949  33 str
     1  71755  23  5291368  13  18474317  46 tuple
     2  20227   7  3585399   9  22059716  55 types.CodeType
     3  39860  13  3196468   8  25256184  63 bytes
     4  19218   6  2613648   7  27869832  70 function
     5   2427   1  2218144   6  30087976  75 type
     6   6094   2  1722448   4  31810424  80 dict (no owner)
     7   2427   1  1402112   4  33212536  83 dict of type
     8    691   0  1365688   3  34578224  86 dict of module
     9   4716   2   560264   1  35138488  88 list
<642 more rows. Type e.g. '_.more' to view.>

As you can see, slideio allocated multiple numpy arrays that were freed after the function returned. I will try to reproduce the problem with different scenarios.
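The same before / inside / after comparison can be sketched with the standard library. The dumps above look like output from a heap profiler; the snippet below swaps in Python's built-in tracemalloc instead, and `read_many` is a hypothetical stand-in that allocates byte buffers rather than calling slideio.

```python
import tracemalloc


def read_many(n_blocks, block_bytes):
    """Stand-in for repeated reads: allocate buffers, measure, then let them go."""
    blocks = [bytearray(block_bytes) for _ in range(n_blocks)]
    # get_traced_memory() returns (current, peak) in bytes
    inside, _peak = tracemalloc.get_traced_memory()
    return inside  # `blocks` goes out of scope on return and is freed


tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()
inside = read_many(n_blocks=10, block_bytes=1_000_000)
after, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"before: {before} B, inside: {inside} B, after: {after} B")
```

Note that tracemalloc only sees memory requested through Python's allocators. Memory allocated by native code outside those allocators does not show up here, so a stable Python heap does not by itself rule out growth in the process's resident memory.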

Could you try to reproduce the issue with the new version? You can download the latest version from CI artifacts of my gitlab repository: https://gitlab.com/bioslide/slideio/-/pipelines . The latest artifacts are here:

https://gitlab.com/bioslide/slideio/-/jobs/939336853/artifacts/browse

Please, let me know if it works for you.

Note: in the new version, the shape of 3D volumes has changed. Now an array of 30 slices of (1000x1000) will have shape (30, 1000, 1000).
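As a rough illustration of what the new layout means for code that averages over slices, here is a sketch on a small synthetic array. It assumes that the older layout had the slices on the last axis (as the `axis=2` in the snippet posted earlier suggests); that is an assumption, not something stated in this thread.

```python
import numpy as np

# Synthetic single-channel volume in the new layout: 3 slices of 4x5
volume = np.arange(3 * 4 * 5, dtype=np.float64).reshape(3, 4, 5)

# With slices first, averaging over slices uses axis 0
projection = volume.mean(axis=0)

# Code written for slices on the last axis (assumed old layout) could
# move the slice axis to the end instead of changing every axis argument
legacy_view = np.moveaxis(volume, 0, -1)

print(projection.shape, legacy_view.shape)
```

Either approach gives the same projection; which one is more convenient depends on how much downstream code depends on the axis order.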

ethancohen123 commented 3 years ago

Okay, thank you. To be more precise, my script is composed of two for loops that take the values i and j, cut the image into pieces, and then save them. A more complete script is below. Should I use pip upgrade or pip install git to load the new version? Thank you

    import gc
    import slideio

    # operation is the function posted earlier; numbers_cut, load_set and
    # create_dir are helpers from the full script (not shown here)

    def cut_goutte_epaisse(path, size=256, pad=100):
        slide = slideio.open_slide(file_path=path, driver_id="CZI")
        scene = slide.get_scene(0)
        x_scene = scene.rect[2]
        y_scene = scene.rect[3]
        x_list, y_list = numbers_cut(pad, size, x_scene, y_scene)
        #liste=[]
        # for each y, traverse one row, then move down
        for j in x_list:
            print('j:', j)
            for i in y_list:
                #print('i:',i)
                try:
                    out = operation(scene, i, j, size=size)
                    #do something with out
                    del out
                    gc.collect()
                except RuntimeError:
                    print('bug at i=', i, 'and j=', j)
            #gc.collect()

    if __name__ == '__main__':
        path_dir = '/projects/smala/rawdata/2020_11_10_allslides'
        files = load_set(path_dir)[0]  # take the file we want
        print(files)
        new_dir = '/projects/smala/data_cut_GE/' + files[-38:-4]
        create_dir(new_dir)
        cut_goutte_epaisse(files)

Booritas commented 3 years ago

Thanks for the script. I will try it. To install the new version, download the corresponding whl file and run pip install path-to-the-file

ethancohen123 commented 3 years ago

Okay, thank you. Has the with statement also been changed, and does it work now? (I just tried with slideio.open_slide(file_path=path, driver_id="CZI") as sld: scene=sld.get_scene(0) but it doesn't work, unless I am using it incorrectly.)

Booritas commented 3 years ago

Sorry, the "with" statement issue is not fixed yet. I'll let you know when it is ready.
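For an object that does not implement `__enter__` and `__exit__`, one generic workaround is to wrap it with `contextlib.contextmanager`. The sketch below uses a hypothetical `FakeSlide` class in place of a real slide object; its `close` method and the `managed` helper are illustrative names, not part of the slideio API.

```python
from contextlib import contextmanager


class FakeSlide:
    """Stand-in for an object without __enter__/__exit__ (not the slideio API)."""

    def __init__(self, path):
        self.path = path
        self.closed = False

    def get_scene(self, index):
        return f"scene {index} of {self.path}"

    def close(self):
        # Hypothetical cleanup step for the stand-in
        self.closed = True


@contextmanager
def managed(obj, cleanup=None):
    """Yield obj to the with-block and run an optional cleanup afterwards."""
    try:
        yield obj
    finally:
        if cleanup is not None:
            cleanup(obj)


with managed(FakeSlide("image.czi"), cleanup=FakeSlide.close) as sld:
    scene = sld.get_scene(0)

print(scene, sld.closed)
```

Passing the cleanup step as a callable keeps the wrapper independent of any particular library; with `cleanup=None` it only scopes the name to the block.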

ethancohen123 commented 3 years ago

Okay, thanks. In case the script I sent wasn't clear, here is the entire .py file: https://github.com/ethancohen123/j/blob/main/cut_images_GE.py Basically, what I am trying to do is cut czi images (very large, approximately 50000 x 50000) into small images of size 256 x 256 by going iteratively through rows and columns. When I use htop (working on a remote server) to watch the memory, it increases quite fast and never gets freed. The problem seems to be the operation function (indeed, when I remove this line and run the script, everything works just fine).
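To give a sense of scale for this kind of tiling, here is a minimal sketch of a row-by-row tile grid. `tile_origins` is a hypothetical helper that assumes non-overlapping tiles and skips partial tiles at the edges; it is not the `numbers_cut` function from the linked script, whose exact behaviour (including the `pad` parameter) is not shown in this thread.

```python
def tile_origins(width, height, size):
    """Yield (x, y) top-left corners of full size-by-size tiles, row by row."""
    for y in range(0, height - size + 1, size):
        for x in range(0, width - size + 1, size):
            yield x, y


# Small example: a 1000 x 600 region cut into 256 x 256 tiles
origins = list(tile_origins(width=1000, height=600, size=256))
print(len(origins), origins[0], origins[-1])

# Number of tiles for an image of roughly the size described above
n_tiles = sum(1 for _ in tile_origins(50000, 50000, 256))
print(n_tiles)
```

Under these assumptions, a 50000 x 50000 image yields tens of thousands of tiles, so even a small amount of memory retained per read would accumulate quickly over a full run.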

ethancohen123 commented 3 years ago

PS: Unfortunately, installing the new version does not work for me. I still have memory issues.

Booritas commented 3 years ago

Still no luck reproducing the memory leak. The memory remains stable in all my attempts (I tried with your code from GitHub). Could you print the process memory on your machine after each call to the "operation" function?

    import os
    import psutil

    process = psutil.Process(os.getpid())
    print(process.memory_info().rss // 1024 // 1024)  # resident memory in MBytes
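One way to turn such per-call readings into a quick check is sketched below. `memory_growth` and `is_growing` are hypothetical helpers, and the sampler here is a fake one that simulates a leak so the example runs without extra dependencies.

```python
def memory_growth(sample_mb, step, iterations):
    """Call step() repeatedly and record the memory reading after each call."""
    readings = [sample_mb()]
    for _ in range(iterations):
        step()
        readings.append(sample_mb())
    return readings


def is_growing(readings, tolerance_mb=0):
    """True if every reading exceeds the previous one by more than tolerance_mb."""
    return all(b - a > tolerance_mb for a, b in zip(readings, readings[1:]))


# Fake sampler and leaking step, used only to exercise the helpers
leaked = []
readings = memory_growth(
    sample_mb=lambda: 100 + len(leaked),
    step=lambda: leaked.append(object()),
    iterations=5,
)
print(readings, is_growing(readings))
```

With psutil installed, the sampler could instead return `process.memory_info().rss // 1024 // 1024` as in the snippet above, and `step` could be a single call to the function under suspicion.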

To narrow down the problem, I would also need the following information:

The problem may even be related to the specific image. Depending on the image format, different functions are called (for example, decompressors). So it would be helpful to have the following information:

This information can be easily extracted with the slideio scene properties or the free Zeiss ZEN-lite software (https://www.zeiss.com/microscopy/int/products/microscope-software/zen-lite.html).

Is there any possibility of sharing such an image? I would handle it with discretion, use it only for testing purposes, and delete it after fixing the problem.

ethancohen123 commented 3 years ago

Hi, here's the screenshot for a few iterations. My operating system is

    NAME="Ubuntu"
    VERSION="16.04.4 LTS (Xenial Xerus)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 16.04.4 LTS"
    VERSION_ID="16.04"
    HOME_URL="http://www.ubuntu.com/"
    SUPPORT_URL="http://help.ubuntu.com/"
    BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
    VERSION_CODENAME=xenial
    UBUNTU_CODENAME=xenial

and I am using Python 3.8.5. I have also printed the information you asked for (for one image and one scene; there are two scenes per image). I am also going to share a Gdrive link for one image (I can't share more, since each image is ~6 GB) as soon as it is ready on my PC.

[Screenshots: capture4, Capture, Capture1, Capture2]

Booritas commented 3 years ago

Thanks, with the image it should be easy to reproduce the issue. I will make it a high priority. Please do not post the Gdrive link here. Better to send it to my email address: booritas@gmail.com

ethancohen123 commented 3 years ago

Thank you very much. I will send the email ASAP.

ethancohen123 commented 3 years ago

It's done

Booritas commented 3 years ago

Finally, I can reproduce it with your image. I'm investigating the problem. It may take some time. I will let you know as soon as I have something. Thanks for your help!

Booritas commented 3 years ago

I found the problem and fixed the memory leak. The memory consumption of your script is now stable on my computer. I need some time to update the distribution files. I expect them to be ready tomorrow, or the day after tomorrow at the latest.

Booritas commented 3 years ago

I'm still working on the distribution, but if you want to try a version with the memory fix, you can download the Linux whl files from here:

https://gitlab.com/bioslide/slideio/-/jobs/940624320/artifacts/browse/linux-py/

Please let me know if it works for you.

ethancohen123 commented 3 years ago

It works now, thank you very much for the precious help!

Booritas commented 3 years ago

The "with" statement is implemented. Wheel files can be downloaded from here: https://gitlab.com/bioslide/slideio/-/jobs/947056624/artifacts/browse/python-wheels/