aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

Jupyter notebook crashes unexpectedly during scenic+ #107

Closed benduc closed 1 year ago

benduc commented 1 year ago

Dear developers of SCENIC+, I recently tried to reproduce your PBMCs tutorial. Except for some minor bugs that I could solve, everything went fine until I ran scenicplus. The process starts fine (with 200 GB of RAM allocated to Jupyter via the config file), but at some point the Jupyter console either freezes completely or exits unexpectedly. I still have a scenicplus object created, but I don't know whether it is complete. Looking at the messages coming from Jupyter, I see:

    [I 15:29:42.653 NotebookApp] Saving file at /sample_scenicplus.ipynb
    [E 15:29:43.893 NotebookApp] Exception in callback <bound method WebSocketMixin.send_ping of ZMQChannelsHandler(xxxxxxxxxxxxxxxxxxxxxxxxx)>
    Traceback (most recent call last):
      File "/home/usr/miniconda3/envs/scenicplus/lib/python3.8/site-packages/tornado/ioloop.py", line 921, in _run
        val = self.callback()
      File "/home/usr/miniconda3/envs/scenicplus/lib/python3.8/site-packages/notebook/base/zmqhandlers.py", line 188, in send_ping
        self.ping(b'')
      File "/home/usr/miniconda3/envs/scenicplus/lib/python3.8/site-packages/tornado/websocket.py", line 445, in ping
        self.ws_connection.write_ping(data)
      File "/home/usr/miniconda3/envs/scenicplus/lib/python3.8/site-packages/tornado/websocket.py", line 1101, in write_ping
        self._write_frame(True, 0x9, data)
      File "/home/usr/miniconda3/envs/scenicplus/lib/python3.8/site-packages/tornado/websocket.py", line 1061, in _write_frame
        return self.stream.write(frame)
      File "/home/usr/miniconda3/envs/scenicplus/lib/python3.8/site-packages/tornado/iostream.py", line 540, in write
        self._write_buffer.append(data)
      File "/home/usr/miniconda3/envs/scenicplus/lib/python3.8/site-packages/tornado/iostream.py", line 157, in append
        b += data  # type: ignore
    BufferError: Existing exports of data: object cannot be re-sized

After checking online, I read that this might be due to very long outputs, such as the logs that SCENIC+ prints below the cells (285621 characters in this case, for example).

Note that the program was stuck at: "Processing: negative r2g, Top 10 region-to-gene links per gene: 16%|█▉ | 399/2532 [00:29<02:19, 15.31it/s]"

Any idea how to avoid this crash?

Thanks!

SeppeDeWinter commented 1 year ago

Hi @benduc

from "Processing: negative r2g, Top 10 region-to-gene links per gene: 16%|█▉ | 399/2532 [00:29<02:19, 15.31it/s]" I can see that the scenic+ run has almost completed.

Did you manage to save the scenicplus_obj? If so, you can just restart the run and it will carry on from where it left off.

You could consider capturing the output of SCENIC+ as described here: https://notebook.community/lifeinoppo/littlefishlet-scode/RES/REF/python_sourcecode/ipython-master/examples/IPython%20Kernel/Capturing%20Output.

This might solve the issue of long outputs.
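[Editor's note] For context, a minimal standard-library sketch of the same idea: redirecting a step's stdout into a buffer instead of the notebook cell. The function `run_noisy_step` below is a hypothetical stand-in for the chatty SCENIC+ call, not part of the scenicplus API:

```python
import contextlib
import io

def run_noisy_step():
    # Hypothetical stand-in for a SCENIC+ step that floods stdout with logs.
    for i in range(1000):
        print(f"Processing region-to-gene link {i}")

# Redirect the step's stdout into an in-memory buffer instead of the cell output.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    run_noisy_step()

# Inspect only a short summary, keeping the visible cell output small.
log = buffer.getvalue()
print(f"Captured {len(log.splitlines())} log lines")
```

In a notebook, the `%%capture` cell magic from the linked page achieves the same effect without any code changes inside the cell.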

Best,

Seppe

benduc commented 1 year ago

Hi @SeppeDeWinter

Sorry for my late reply. After digging into my data, I think the issue was instead due to the bug discussed in #87: when applying the tutorial to my own data, I had empty entries in region_sets['DARs'], which I removed using the code you proposed:

import pyranges as pr
from scenicplus.utils import region_names_to_coordinates

for DAR in markers_dict.keys():
    # Only keep regions on known chromosomes (names starting with 'chr')
    regions = markers_dict[DAR].index[markers_dict[DAR].index.str.startswith('chr')]
    # Skip DARs that end up empty, so no empty PyRanges entries are created
    if len(regions) > 0:
        region_sets['DARs'][DAR] = pr.PyRanges(region_names_to_coordinates(regions))
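[Editor's note] The filtering logic above can be illustrated with a self-contained, standard-library-only sketch; plain lists of region-name strings stand in for the marker DataFrames and PyRanges objects, and the cluster names are invented for the example:

```python
# Toy marker sets: one contains a region on an unplaced scaffold, one is empty.
markers = {
    "B_cells": ["chr1:100-500", "chr2:700-900", "GL000194.1:10-50"],
    "T_cells": ["chr3:150-300"],
    "empty_cluster": [],
}

region_sets = {}
for name, regions in markers.items():
    # Keep only regions on known chromosomes, mirroring str.startswith('chr').
    kept = [r for r in regions if r.startswith("chr")]
    # Store only non-empty sets, so downstream steps never see an empty entry.
    if kept:
        region_sets[name] = kept

print(sorted(region_sets))
```

The key point is the emptiness check: without it, a cluster with no usable regions produces an empty entry that can crash later steps, as in #87.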

Then everything ran smoothly! As for the very long outputs, I found that simply hiding the JupyterLab window and locking the screen greatly reduces the length of the output!

Thanks a lot for your help!

Ben