flatironinstitute / mountainsort5

MountainSort spike sorting algorithm, version 5
Apache License 2.0

Error With Temporary Directory #42

Closed JacobKelley101 closed 3 months ago

JacobKelley101 commented 3 months ago

Hello, new SI/MS5 user here. I have successfully installed SI and MS5 but cannot get the example code to run. The error is coming from the creation of the temporary directory. I have included my code and the error below. Any advice on how to resolve this would be greatly appreciated! It seems like a small fix, but I am not very experienced with Python. I checked the directory that the error claims is invalid, and it exists.

CODE

```python
from tempfile import TemporaryDirectory
from pathlib import Path
import numpy as np
import spikeinterface as si
import spikeinterface.preprocessing as spre
import mountainsort5 as ms5
from mountainsort5.util import create_cached_recording

file_path = Path(r"C:\Users\kelley.j\Desktop\SI_Tests_Folder\SI_BZAData.bin")
assert file_path.is_file(), f"Error: {file_path} is not a valid file. Please check the path."
sampling_frequency = 15000.0  # Adjust according to your MATLAB dataset
num_channels = 16  # Adjust according to your MATLAB dataset
dtype = "float64"  # MATLAB's double corresponds to Python's float64

recording = si.read_binary(file_paths=file_path, sampling_frequency=sampling_frequency, num_channels=num_channels, dtype=dtype)

recording_filtered = spre.bandpass_filter(recording, freq_min=300, freq_max=6000, dtype=np.float32)
recording_preprocessed: si.BaseRecording = spre.whiten(recording_filtered)

base_dir = r'C:\Users\kelley.j\Desktop\SI_Tests_Folder'
with TemporaryDirectory() as tmpdir:
    # cache the recording to a temporary directory for efficient reading
    recording_cached = create_cached_recording(recording_preprocessed, folder=tmpdir)

    # use scheme 1
    sorting = ms5.sorting_scheme1(
        recording=recording_cached,
        sorting_parameters=ms5.Scheme1SortingParameters(...)
    )
```

ERROR

```
Traceback (most recent call last)
Cell In[1], line 34
     32 with TemporaryDirectory() as tmpdir:
     33     # cache the recording to a temporary directory for efficient reading
---> 34     recording_cached = create_cached_recording(recording_preprocessed, folder=tmpdir)
     36     # use scheme 1

File ~\anaconda3\envs\spikeinterface_env\lib\site-packages\mountainsort5\util\create_cached_recording.py:32, in create_cached_recording(recording, folder, n_jobs, chunk_duration)
     25 ret = si.BinaryRecordingExtractor(
     26     file_paths=[fname],
     27     sampling_frequency=recording.get_sampling_frequency(),
    (...)
     30     dtype='float32'
     31 )
---> 32 ret.set_channel_locations(recording.get_channel_locations())
     33 return ret

File ~\Documents\GitHub\spikeinterface\src\spikeinterface\core\baserecordingsnippets.py:365, in BaseRecordingSnippets.get_channel_locations(self, channel_ids, axes)
    364 if locations is None:
--> 365     raise Exception("There are no channel locations")
    366 locations = np.asarray(locations)[channel_indices]

Exception: There are no channel locations

During handling of the above exception, another exception occurred:

PermissionError                           Traceback (most recent call last)
File ~\anaconda3\envs\spikeinterface_env\lib\shutil.py:618, in _rmtree_unsafe(path, onerror)
    617 try:
--> 618     os.unlink(fullname)
    619 except OSError:

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\kelley.j\AppData\Local\Temp\tmpk6ntajbz\recording.dat'

During handling of the above exception, another exception occurred:

PermissionError                           Traceback (most recent call last)
File ~\anaconda3\envs\spikeinterface_env\lib\tempfile.py:852, in TemporaryDirectory._rmtree.<locals>.onerror(func, path, exc_info)
    851 try:
--> 852     _os.unlink(path)
    853 # PermissionError is raised on FreeBSD for directories

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\kelley.j\AppData\Local\Temp\tmpk6ntajbz\recording.dat'

During handling of the above exception, another exception occurred:

NotADirectoryError                        Traceback (most recent call last)
Cell In[1], line 32
     29 recording_preprocessed: si.BaseRecording = spre.whiten(recording_filtered)
     31 base_dir = r'C:\Users\kelley.j\Desktop\SI_Tests_Folder'
---> 32 with TemporaryDirectory() as tmpdir:
     33     # cache the recording to a temporary directory for efficient reading
     34     recording_cached = create_cached_recording(recording_preprocessed, folder=tmpdir)
     36     # use scheme 1

File ~\anaconda3\envs\spikeinterface_env\lib\tempfile.py:878, in TemporaryDirectory.__exit__(self, exc, value, tb)
    877 def __exit__(self, exc, value, tb):
--> 878     self.cleanup()

File ~\anaconda3\envs\spikeinterface_env\lib\tempfile.py:882, in TemporaryDirectory.cleanup(self)
    880 def cleanup(self):
    881     if self._finalizer.detach() or _os.path.exists(self.name):
--> 882         self._rmtree(self.name, ignore_errors=self._ignore_cleanup_errors)

File ~\anaconda3\envs\spikeinterface_env\lib\tempfile.py:864, in TemporaryDirectory._rmtree(cls, name, ignore_errors)
    861 if not ignore_errors:
    862     raise
--> 864 _shutil.rmtree(name, onerror=onerror)

File ~\anaconda3\envs\spikeinterface_env\lib\shutil.py:750, in rmtree(path, ignore_errors, onerror)
    748     # can't continue even if onerror hook returns
    749     return
--> 750 return _rmtree_unsafe(path, onerror)

File ~\anaconda3\envs\spikeinterface_env\lib\shutil.py:620, in _rmtree_unsafe(path, onerror)
    618     os.unlink(fullname)
    619 except OSError:
--> 620     onerror(os.unlink, fullname, sys.exc_info())
    621 try:
    622     os.rmdir(path)

File ~\anaconda3\envs\spikeinterface_env\lib\tempfile.py:855, in TemporaryDirectory._rmtree.<locals>.onerror(func, path, exc_info)
    853 # PermissionError is raised on FreeBSD for directories
    854 except (IsADirectoryError, PermissionError):
--> 855     cls._rmtree(path, ignore_errors=ignore_errors)
    856 except FileNotFoundError:
    857     pass

File ~\anaconda3\envs\spikeinterface_env\lib\tempfile.py:864, in TemporaryDirectory._rmtree(cls, name, ignore_errors)
    861 if not ignore_errors:
    862     raise
--> 864 _shutil.rmtree(name, onerror=onerror)

File ~\anaconda3\envs\spikeinterface_env\lib\shutil.py:750, in rmtree(path, ignore_errors, onerror)
    748     # can't continue even if onerror hook returns
    749     return
--> 750 return _rmtree_unsafe(path, onerror)

File ~\anaconda3\envs\spikeinterface_env\lib\shutil.py:601, in _rmtree_unsafe(path, onerror)
    599     entries = list(scandir_it)
    600 except OSError:
--> 601     onerror(os.scandir, path, sys.exc_info())
    602 entries = []
    603 for entry in entries:

File ~\anaconda3\envs\spikeinterface_env\lib\shutil.py:598, in _rmtree_unsafe(path, onerror)
    596 def _rmtree_unsafe(path, onerror):
    597     try:
--> 598         with os.scandir(path) as scandir_it:
    599             entries = list(scandir_it)
    600     except OSError:

NotADirectoryError: [WinError 267] The directory name is invalid: 'C:\Users\kelley.j\AppData\Local\Temp\tmpk6ntajbz\recording.dat'
```

magland commented 3 months ago

@alejoe91

What did we decide was the simplest solution to this Windows temporary-directory cleanup issue? Should we change the SI wrapper to simply not use Python's TemporaryDirectory()?

alejoe91 commented 3 months ago

@magland this is fixed in the latest releases of spikeinterface (>=0.100.7)

The solution is to write to the output folder, with an option (true by default) to delete the cached recording when spike sorting finishes: https://github.com/SpikeInterface/spikeinterface/blob/0.100.8/src/spikeinterface/sorters/external/mountainsort5.py#L187-L206
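
For readers who hit this before upgrading, the idea behind that fix can be sketched roughly as follows. This is a simplified illustration, not the actual SpikeInterface wrapper code; the helper name and the `delete_cache` flag are made up for the example:

```python
import shutil
from pathlib import Path

import mountainsort5 as ms5
from mountainsort5.util import create_cached_recording


def sort_with_persistent_cache(recording, output_folder, delete_cache=True):
    # Hypothetical helper: cache the preprocessed recording inside the sorter
    # output folder instead of a system temporary directory.
    cache_folder = Path(output_folder) / "cache"
    cache_folder.mkdir(parents=True, exist_ok=True)
    recording_cached = create_cached_recording(recording, folder=str(cache_folder))

    sorting = ms5.sorting_scheme1(
        recording=recording_cached,
        sorting_parameters=ms5.Scheme1SortingParameters(),
    )

    if delete_cache:
        # Release the cached recording before removing the folder; on Windows
        # the memory-mapped recording.dat cannot be deleted while still open.
        del recording_cached
        shutil.rmtree(cache_folder, ignore_errors=True)

    return sorting
```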

magland commented 3 months ago

Thanks @alejoe91

@JacobKelley101 which version of SI are you using?

pip list | grep spikeinterface
JacobKelley101 commented 3 months ago

In my Jupyter Notebook I run:

```python
import spikeinterface.full as si
print(f"SpikeInterface version: {si.__version__}")
```

which prints: `SpikeInterface version: 0.101.0rc0`

alejoe91 commented 3 months ago

@JacobKelley101 try to refresh your notebook. The line triggering the error is not there in version 0.101.0rc0

https://github.com/SpikeInterface/spikeinterface/blob/0.101.0rc0/src/spikeinterface/sorters/external/mountainsort5.py#L189

JacobKelley101 commented 3 months ago

@alejoe91 I hadn't found the solution proposed in the link you attached. I will add it in and see if it works better!

alejoe91 commented 3 months ago

@JacobKelley101 my link was just to show that the released version you have should not trigger the error you're seeing, since the line `recording_cached = create_cached_recording(recording_preprocessed, folder=tmpdir)` has been removed.

Can you try to make sure you're running your notebook with the 0.101.0rc0 version?
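
One way to double-check which spikeinterface installation the notebook is actually importing (the first traceback above points at a GitHub source checkout rather than the conda site-packages) is:

```python
import spikeinterface as si

# shows the version string and the file location of the imported package
print(si.__version__)
print(si.__file__)
```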

JacobKelley101 commented 3 months ago

@alejoe91 I ran the code again and printed my imported spikeinterface version before calling the temporary directory function. MountainSort will run, but the temporary folder still can't be deleted. I attached the code below again.

### Code

```python
from tempfile import TemporaryDirectory
from pathlib import Path
import numpy as np
import spikeinterface as si
import spikeinterface.preprocessing as spre
import mountainsort5 as ms5
from mountainsort5.util import create_cached_recording

print(f"SpikeInterface version: {si.__version__}")

# Load recording object
file_path = Path(r"C:\Users\kelley.j\Desktop\SI_Tests_Folder\SI_BZAData.bin")
assert file_path.is_file(), f"Error: {file_path} is not a valid file. Please check the path."
sampling_frequency = 15000.0  # Adjust according to your MATLAB dataset
num_channels = 16  # Adjust according to your MATLAB dataset
dtype = "float64"  # MATLAB's double corresponds to Python's float64

# Load data using SpikeInterface
recording = si.read_binary(file_paths=file_path, sampling_frequency=sampling_frequency, num_channels=num_channels, dtype=dtype)

# Set channel locations (example: evenly spaced along a line)
channel_locations = np.column_stack((np.arange(num_channels), np.zeros(num_channels)))
recording.set_channel_locations(channel_locations)

# Preprocess the recording
recording_filtered = spre.bandpass_filter(recording, freq_min=300, freq_max=6000, dtype=np.float32)
recording_preprocessed = spre.whiten(recording_filtered)

# Ensure the temporary directory is used correctly
base_dir = r'C:\Users\kelley.j\Desktop\SI_Tests_Folder'
with TemporaryDirectory(dir=base_dir) as tmpdir:
    # Cache the recording to a temporary directory for efficient reading
    recording_cached = create_cached_recording(recording_preprocessed, folder=tmpdir)

    # Use scheme 1 (adjust sorting parameters as needed)
    sorting_parameters = ms5.Scheme1SortingParameters()  # Use default or specify parameters
    sorting = ms5.sorting_scheme1(recording=recording_cached, sorting_parameters=sorting_parameters)

    # Other schemes can be used similarly
    # sorting = ms5.sorting_scheme2(recording=recording_cached, sorting_parameters=ms5.Scheme2SortingParameters(...))
    # sorting = ms5.sorting_scheme3(recording=recording_cached, sorting_parameters=ms5.Scheme3SortingParameters(...))
```

### Error/Printout

```
SpikeInterface version: 0.101.0rc0

write_binary_recording: 100%  601/601 [00:05<00:00, 122.72it/s]

Number of channels: 16
Number of timepoints: 9000900
Sampling frequency: 15000.0 Hz
Channel 0: [0. 0.]
Channel 1: [1. 0.]
Channel 2: [2. 0.]
Channel 3: [3. 0.]
Channel 4: [4. 0.]
Channel 5: [5. 0.]
Channel 6: [6. 0.]
Channel 7: [7. 0.]
Channel 8: [8. 0.]
Channel 9: [9. 0.]
Channel 10: [10. 0.]
Channel 11: [11. 0.]
Channel 12: [12. 0.]
Channel 13: [13. 0.]
Channel 14: [14. 0.]
Channel 15: [15. 0.]
Loading traces
MS5 Elapsed time for load_traces: 0.000 seconds
Detecting spikes
Adjacency for detect spikes with channel radius None
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]]
m = 0 (nbhd size: 16)
m = 1 (nbhd size: 16)
m = 2 (nbhd size: 16)
m = 3 (nbhd size: 16)
m = 4 (nbhd size: 16)
m = 5 (nbhd size: 16)
m = 6 (nbhd size: 16)
m = 7 (nbhd size: 16)
m = 8 (nbhd size: 16)
m = 9 (nbhd size: 16)
m = 10 (nbhd size: 16)
m = 11 (nbhd size: 16)
m = 12 (nbhd size: 16)
m = 13 (nbhd size: 16)
m = 14 (nbhd size: 16)
m = 15 (nbhd size: 16)
Detected 179 spikes
MS5 Elapsed time for detect_spikes: 0.486 seconds
Removing duplicate times
MS5 Elapsed time for remove_duplicate_times: 0.000 seconds
Extracting 179 snippets
MS5 Elapsed time for extract_snippets: 0.004 seconds
Computing PCA features with npca=48
MS5 Elapsed time for compute_pca_features: 0.044 seconds
Isosplit6 clustering with npca_per_subdivision=10
Found 1 clusters
MS5 Elapsed time for isosplit6_subdivision_method: 0.002 seconds
Computing templates
MS5 Elapsed time for compute_templates: 0.003 seconds
Determining optimal alignment of templates
Template alignment converged.
Align templates offsets: [0]
MS5 Elapsed time for align_templates: 0.001 seconds
Aligning snippets
MS5 Elapsed time for align_snippets: 0.000 seconds
Clustering aligned snippets
Computing PCA features with npca=48
MS5 Elapsed time for compute_pca_features: 0.005 seconds
Isosplit6 clustering with npca_per_subdivision=10
MS5 Elapsed time for isosplit6_subdivision_method: 0.004 seconds
Found 1 clusters after alignment
Computing templates
MS5 Elapsed time for compute_templates: 0.000 seconds
Offsetting times to peak
Offsets to peak: [0]
MS5 Elapsed time for determine_offsets_to_peak: 0.000 seconds
Sorting times
MS5 Elapsed time for sorting times: 0.000 seconds
Removing out of bounds times
MS5 Elapsed time for removing out of bounds times: 0.000 seconds
Reordering units
MS5 Elapsed time for reordering units: 0.000 seconds
Creating sorting object
MS5 Elapsed time for creating sorting object: 0.002 seconds
```


```
PermissionError                           Traceback (most recent call last)
File ~\anaconda3\envs\spikeinterface_env\lib\shutil.py:618, in _rmtree_unsafe(path, onerror)
    617 try:
--> 618     os.unlink(fullname)
    619 except OSError:

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\kelley.j\Desktop\SI_Tests_Folder\tmp8jkh2gxo\recording.dat'

During handling of the above exception, another exception occurred:

PermissionError                           Traceback (most recent call last)
File ~\anaconda3\envs\spikeinterface_env\lib\tempfile.py:852, in TemporaryDirectory._rmtree.<locals>.onerror(func, path, exc_info)
    851 try:
--> 852     _os.unlink(path)
    853 # PermissionError is raised on FreeBSD for directories

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\kelley.j\Desktop\SI_Tests_Folder\tmp8jkh2gxo\recording.dat'

During handling of the above exception, another exception occurred:

NotADirectoryError                        Traceback (most recent call last)
Cell In[1], line 30
     28 # Ensure the temporary directory is used correctly
     29 base_dir = r'C:\Users\kelley.j\Desktop\SI_Tests_Folder'
---> 30 with TemporaryDirectory(dir=base_dir) as tmpdir:
     31     # Cache the recording to a temporary directory for efficient reading
     32     recording_cached = create_cached_recording(recording_preprocessed, folder=tmpdir)
     34     # Use scheme 1 (adjust sorting parameters as needed)

File ~\anaconda3\envs\spikeinterface_env\lib\tempfile.py:878, in TemporaryDirectory.__exit__(self, exc, value, tb)
    877 def __exit__(self, exc, value, tb):
--> 878     self.cleanup()

File ~\anaconda3\envs\spikeinterface_env\lib\tempfile.py:882, in TemporaryDirectory.cleanup(self)
    880 def cleanup(self):
    881     if self._finalizer.detach() or _os.path.exists(self.name):
--> 882         self._rmtree(self.name, ignore_errors=self._ignore_cleanup_errors)

File ~\anaconda3\envs\spikeinterface_env\lib\tempfile.py:864, in TemporaryDirectory._rmtree(cls, name, ignore_errors)
    861 if not ignore_errors:
    862     raise
--> 864 _shutil.rmtree(name, onerror=onerror)

File ~\anaconda3\envs\spikeinterface_env\lib\shutil.py:750, in rmtree(path, ignore_errors, onerror)
    748     # can't continue even if onerror hook returns
    749     return
--> 750 return _rmtree_unsafe(path, onerror)

File ~\anaconda3\envs\spikeinterface_env\lib\shutil.py:620, in _rmtree_unsafe(path, onerror)
    618     os.unlink(fullname)
    619 except OSError:
--> 620     onerror(os.unlink, fullname, sys.exc_info())
    621 try:
    622     os.rmdir(path)

File ~\anaconda3\envs\spikeinterface_env\lib\tempfile.py:855, in TemporaryDirectory._rmtree.<locals>.onerror(func, path, exc_info)
    853 # PermissionError is raised on FreeBSD for directories
    854 except (IsADirectoryError, PermissionError):
--> 855     cls._rmtree(path, ignore_errors=ignore_errors)
    856 except FileNotFoundError:
    857     pass

File ~\anaconda3\envs\spikeinterface_env\lib\tempfile.py:864, in TemporaryDirectory._rmtree(cls, name, ignore_errors)
    861 if not ignore_errors:
    862     raise
--> 864 _shutil.rmtree(name, onerror=onerror)

File ~\anaconda3\envs\spikeinterface_env\lib\shutil.py:750, in rmtree(path, ignore_errors, onerror)
    748     # can't continue even if onerror hook returns
    749     return
--> 750 return _rmtree_unsafe(path, onerror)

File ~\anaconda3\envs\spikeinterface_env\lib\shutil.py:601, in _rmtree_unsafe(path, onerror)
    599     entries = list(scandir_it)
    600 except OSError:
--> 601     onerror(os.scandir, path, sys.exc_info())
    602 entries = []
    603 for entry in entries:

File ~\anaconda3\envs\spikeinterface_env\lib\shutil.py:598, in _rmtree_unsafe(path, onerror)
    596 def _rmtree_unsafe(path, onerror):
    597     try:
--> 598         with os.scandir(path) as scandir_it:
    599             entries = list(scandir_it)
    600     except OSError:

NotADirectoryError: [WinError 267] The directory name is invalid: 'C:\Users\kelley.j\Desktop\SI_Tests_Folder\tmp8jkh2gxo\recording.dat'
```

alejoe91 commented 3 months ago

Ah I see! The error is not from SpikeInterface!

You should avoid using these lines:

```python
with TemporaryDirectory(dir=base_dir) as tmpdir:
    # Cache the recording to a temporary directory for efficient reading
    recording_cached = create_cached_recording(recording_preprocessed, folder=tmpdir)
```

and use these instead:

```python
if not recording_preprocessed.is_binary_compatible():
    recording_cached = recording_preprocessed.save(folder="some_output_folder", n_jobs=-1)  # don't use a TMP folder!!
else:
    recording_cached = recording_preprocessed
```
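
Here `"some_output_folder"` is just a placeholder. Because `save()` writes the cached traces to an ordinary folder, nothing tries to delete `recording.dat` while it is still memory-mapped, which is exactly what made the `TemporaryDirectory` cleanup fail on Windows; the folder can be removed manually once the session has released the file.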

Is there a reason why you're not running the sorting directly through spikeinterface?

JacobKelley101 commented 3 months ago

I was just trying to follow the example code provided on the MountainSort GitHub page. https://github.com/flatironinstitute/mountainsort5/tree/main

I replaced the lines as you suggested and it now runs without errors. May I ask why using the temporary directory functions is not recommended, even though they appear in the MountainSort GitHub example? Also, I thought I was running this through spikeinterface, but I could be wrong. Could you give an example of what that would look like? Apologies, this is all very new.

alejoe91 commented 3 months ago

No worries! I think that the MS5 README is not up to date ;)

Using a TMP directory has proven quite problematic on Windows: the cached recording.dat stays memory-mapped while the sorter is using it, and Windows will not let TemporaryDirectory delete a file that another handle still has open (the WinError 32 in your traceback).

To run directly in SpikeInterface is as simple as:

```python
import spikeinterface.sorters as ss

sorting_ms5 = ss.run_sorter("mountainsort5", recording=recording_preprocessed, folder="some-output-folder")
```

Caching is handled automatically!

You can find a list of supported arguments with:

```python
ss.get_default_sorter_params("mountainsort5")
```

Any of these can be passed as additional keyword arguments to the function.
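
For example (a sketch; the two extra parameters shown are meant as placeholders, so check the printed defaults for the exact names and values available in your version):

```python
import spikeinterface.sorters as ss

# inspect the available parameters and their defaults
print(ss.get_default_sorter_params("mountainsort5"))

# pass any of them as extra keyword arguments to run_sorter
sorting_ms5 = ss.run_sorter(
    "mountainsort5",
    recording=recording_preprocessed,
    folder="ms5_output",
    scheme="2",            # example parameter: which MS5 sorting scheme to use
    detect_threshold=5.5,  # example parameter: spike detection threshold
)
```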

JacobKelley101 commented 3 months ago

Awesome, thank you very much @alejoe91 and @magland! I will close this issue. At this point, any further issues I have with SI or MS5 probably will not pertain to my original query.