Closed Antho2422 closed 9 months ago
Can you try to use absolute paths instead of relative ones? Can you also paste the full script?
Hi, I tried using the absolute path instead, but I get the same issue.
Here is the code I used:

```python
@staticmethod
def run_sorting(
    base_recording: dict[int, BaseRecording | BaseExtractor],
    sorter_list: list[str],
    params: dict[str, Any],
    output_path: str,
):
    """
    Run the sorters with all parameters set in the config
    and save the results.
    """
    logger.info(f'Running {sorter_list}')
    start = time.time()
    sorting = ss.run_sorters(
        sorter_list,
        base_recording,
        working_folder=os.path.join(output_path, 'sorting'),
        mode_if_folder_exists='overwrite',
        sorter_params=params,
        verbose=True,
    )
    end = time.time()
    logger.info(f'Sorting done. The whole process took {end - start:.1f} s')
    return sorting
```
I only have this issue when I try to save the results on a different disk from the one the data is on, I guess.
@alejoe91 I think this is because of this: https://github.com/SpikeInterface/spikeinterface/blob/892c93dedcab574b932dcf9fba19e84417062ae0/src/spikeinterface/sorters/basesorter.py#L144
Same here. It's not clear to me what I should do if I want to save the data in a different location from the source.
@alejoe91 @h-mayorquin @vigji @Antho2422 we switched to always saving the recording file in relative mode. But on Windows, with drive letters, this is a bad idea. Maybe we should make this OS-dependent here: https://github.com/SpikeInterface/spikeinterface/blob/892c93dedcab574b932dcf9fba19e84417062ae0/src/spikeinterface/sorters/basesorter.py#L144
```python
# sketch of the proposed OS-dependent behavior
if platform.system() == 'Linux':
    recording.dump_to_json(rec_file, relative_to=output_folder)
elif platform.system() == 'Windows':
    recording.dump_to_json(rec_file)
```
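A self-contained version of that dispatch that can be exercised off-Windows (`dump_recording` is a hypothetical helper, not spikeinterface code; the `system` parameter exists only so both branches can be tested):

```python
import platform


def dump_recording(recording, rec_file, output_folder, system=None):
    """Dump the recording, using relative paths only where they are safe.

    `recording` is anything exposing spikeinterface's `dump_to_json`.
    """
    system = system or platform.system()
    if system == 'Windows':
        # Drive letters make cross-drive relative paths impossible,
        # so fall back to absolute paths on Windows.
        recording.dump_to_json(rec_file)
    else:
        recording.dump_to_json(rec_file, relative_to=output_folder)
```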
Any volunteer for a patch?
I can give it a try (I'll probably be slow and come back with questions, if that's not an issue!)
I think that having different, untested behavior across OSes is a bad idea. We should either test both in CI or at least keep the same behavior on every OS.
Hi, I am having a similar error on Windows when trying to run a sorter, and I was wondering if there is any update on this? Please let me know! Thanks a lot! :)
@zm711
I made an attempt at running this, but I am not sure I can reproduce the error the OP reported, since they never shared the full script.
Can you share a script that makes you get an error?
I did something like this, trying to use two different drives (plus the Kilosort drive):
```python
save_path = Path(r'D:\data')
rec = si.read_xxx(r'E:\path\to\file')  # not sure what files you have locally to test with, so read whatever
# I also set the probe on the recording, but again you could probably try whatever
ss.set_kilosort3_path(r'C:\zm\documents\kilosort-main')
ss.run_sorter('kilosort3', rec, output_folder=save_path / 'sorter_kilo')
```
or something like this:
```python
save_path = Path(r'E:\path\to\file')
rec = si.read_xxx(save_path)
ss.set_kilosort3_path(r'C:\zm\documents\kilosort-main')
ss.run_sorter('kilosort3', rec, output_folder=save_path / 'sorter_kilo')
```
This gave the error above. So I just moved the read, the sorter path, and the `output_folder` to all be on the C drive, and it worked after that. So to be honest, I'm not sure whether the problem is that the sorter location is on a different drive, or whether the problem is that the raw file and the save path are on different drives. If you can't reproduce it, I know @alejoe91 did some updates in #2141 (so maybe that helped with Windows too? Not sure).
Does that make sense for a minimal test, @h-mayorquin?
The guilty lines are these: https://github.com/SpikeInterface/spikeinterface/blob/f725a406f3cc16452636f5bf0652a5d541571be2/src/spikeinterface/sorters/basesorter.py#L139
On Windows, when different drives are involved, this does not work. I think we recently forced the `relative_to` when saving the recording.json (or pickle) inside the sorter folder. That is generally a good idea when the data get moved around, but for Windows drive letters it is clearly a bad choice. Maybe we should change this with an explicit or implicit parameter.
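The cross-drive failure is easy to reproduce even off-Windows with the stdlib `ntpath` module, which implements the Windows rules behind `os.path.relpath` (a minimal illustration, not spikeinterface code; the paths are made up):

```python
import ntpath

# Same drive: a relative path exists.
print(ntpath.relpath(r'C:\data\recording.json', start=r'C:\sorting'))
# prints: ..\data\recording.json

# Different drives: no relative path exists, so relpath raises.
try:
    ntpath.relpath(r'D:\sorting\recording.json', start=r'C:\data')
except ValueError as err:
    print(err)  # path is on mount 'D:', start on mount 'C:'
```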
Okay, so I figured out another interesting issue related to this. Running MountainSort5 from my storage drive (`D:/zach`) worked fine. But if I tried to do it from our storage server, it failed. The issue is that using `Path` returns something like `z:/server_name/zach/data`, but when spikeinterface internally tries to unpack the path, it returns `\\server_name\zach\data` and says that the former is not equal to the latter.
The call triggering the different file name is `func = lambda p: os.path.relpath(str(p), start=relative)`.
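The mapped-drive vs UNC mismatch can be seen directly with `pathlib.PureWindowsPath`, which applies the Windows comparison rules on any OS (the paths are made up for illustration):

```python
from pathlib import PureWindowsPath

mapped = PureWindowsPath(r'Z:\server_name\zach\data')
unc = PureWindowsPath(r'\\server_name\zach\data')

# The drive component differs between the two forms...
print(mapped.drive)   # Z:
print(unc.drive)      # \\server_name\zach
# ...so the two paths never compare equal, even if they name the same location.
print(mapped == unc)  # False
```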
Sorry for not updating earlier, @zm711. This has been a very busy week for me. I came across that issue in my last weekend attempt.
Local paths are handled fine on Windows (to both my and @samuelgarcia's surprise, but to @alejoe91's credit, who I think built this).
But there are two problems:

1. Mounting volumes in Docker when the paths are local but not on the same drive.
2. The problem with relative paths for network drives that you mentioned.

So there are four cases to test and sort out. I will make another attempt at this by the end of the week. Do you know how to mock network paths? I don't have access to this functionality, but maybe it can be done easily (I've never had to deal with this before).
I think I figured part of it out. If I input the file using a UNC path instead, so

```python
unc_path = r'//server_name/zach/data'
dir_path = Path(unc_path)
```

instead of

```python
dir_path = Path(r'z:/zach')
```

then the paths match up. I'm currently sorting on a network drive with no problems (other than speed, but who's counting).
So for network drives, we either tell Windows users via documentation that they must enter the network path as UNC rather than as the mapped drive letter, or we need a helper function that fixes it for them automatically.
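As a rough illustration of the helper-function option: a sketch that rewrites a mapped drive letter into its UNC root, assuming the letter-to-share mapping is already known (`to_unc` and the `drive_map` argument are made up for this sketch; on a real system the mapping would have to come from `net use` or the win32 networking APIs):

```python
from pathlib import PureWindowsPath


def to_unc(path, drive_map):
    r"""Replace a mapped drive letter with its UNC root, if known.

    `drive_map` maps lowercase drive letters to UNC roots, e.g.
    {'z:': r'\\server_name\share'}.
    """
    p = PureWindowsPath(path)
    root = drive_map.get(p.drive.lower())
    if root is None:
        return p  # not a mapped drive we know about; leave unchanged
    # Keep everything after the drive anchor, re-rooted at the UNC share.
    return PureWindowsPath(root).joinpath(*p.parts[1:])
```

For example, `to_unc(r'Z:\zach\data', {'z:': r'\\server_name\share'})` yields `\\server_name\share\zach\data`, while paths on unmapped drives pass through untouched.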
> Do you know how to mock network paths? I don't have access to this functionality but maybe it can be done easily (never had to deal with this before)

Not a clue. I do my testing on Ubuntu for GitHub Actions and locally for Windows, so I never need to mock Windows drives.
Do I understand correctly that on Windows, network paths might appear as local paths and the user might not know about it? I was hoping we could solve this automatically by using the `root` or `anchor` attributes in pathlib, but a shallow preliminary reading indicates that might not be the case.
Yeah, based on my reading, most users "map their drives", so the network path would be something like `//0.0.0.0/name/something`, but Windows lets you map it to a drive letter so that the above appears as `z:/name/something`. But if the user puts in the path with `//`, it has to be forward slashes, based on this Stack Overflow post. So if the user puts in the actual network path rather than the mapped drive, SpikeInterface doesn't fail when sorting over a network drive. Does that make sense?
We are going to have dueling Stack Overflow links.
I'll read yours. I had just finished typing when you hit enter.
When I switch to WSL, I can only see my local drives mounted under /mnt.
I cannot see my network drives from there. But in the Windows file explorer, my network drive is easy to find under its drive alias.
> So if the user puts in the correct network drive rather than the mapping then SpikeInterface doesn't fail when sorting over a network drive. Does that make sense?
This makes sense and is already a solution. I hope we can automate it by using subprocess to call some command available on Windows, like `net use`, so users don't even have to think about it.
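A sketch of the parsing half of that automation, with the column layout of `net use` output assumed from typical examples (the `parse_net_use` name and the regex are mine, not anything in spikeinterface; on Windows the text would come from something like `subprocess.run(['net', 'use'], capture_output=True, text=True).stdout`):

```python
import re


def parse_net_use(text):
    r"""Extract {drive_letter: unc_root} pairs from `net use`-style output.

    The assumed column layout is something like:
        OK           Z:        \\server_name\zach       Microsoft Windows Network
    """
    mapping = {}
    # A drive letter followed by whitespace and a \\server\share token.
    for match in re.finditer(r'([A-Za-z]:)\s+(\\\\\S+)', text):
        drive, unc = match.groups()
        mapping[drive.lower()] = unc
    return mapping
```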
That is for normal relative paths, though. I am not sure they will work from within Docker, as it is not clear WSL will be able to mount the volumes.
Sorry, yeah, I don't do any container stuff, so my understanding of getting Docker/Apptainer to work is non-existent. Sorry, I was only thinking of the local case.
No worries. This is already a solution for your problem, right? That's what I was hoping we could solve in the first place :D
The rest is a bonus.
This fixes network to network drives, but the OP's question was actually about going from C: to a network drive. I just tested my solution and it doesn't work for that.
And even local drive to local drive doesn't work for me. If you try

```python
rec = read_xxx(r'c:/data')
save_path = Path(r'd:/save')
ss.run_sorter(..., working_folder=save_path)
```

it fails for me, even local drive to local drive:

```
ValueError: path is on mount 'C:', start on mount 'D:'
```
What did you and Sam get to work locally specifically?
My summary is:
- C -> C: fine
- D -> D: fine (even if running MATLAB from C)
- C -> D: fails
- C -> network drive: fails
- network -> network: fine
No, I haven't tested C -> D. My intuition was that it would fail. Alessio was more confident.
I tested C to D. I will double-check.
One thing we should double-check: I'm running my tests on Windows 10. Are you on 10 or 11? Windows changed some things with connectivity (like COM ports, etc.) between the versions, so they might actually behave differently enough that we might not be doing the same thing.
I am running on Windows 10. C -> D works fine on my system (when D is local, not a network drive, and without Docker).
We are discussing two issues here. Sam is suggesting that there might be an error in the function that transforms relative paths. My claim is that there is not, because the following works on my system:
```python
from pathlib import Path

from spikeinterface.core.datasets import download_dataset
from spikeinterface.extractors.neoextractors.mearec import MEArecRecordingExtractor
from spikeinterface.core import load_extractor

local_file_path = download_dataset(remote_path="mearec/mearec_test_10s.h5")
recording = MEArecRecordingExtractor(local_file_path)

file_path = "C:\\Users\\heberto\\spikeinterface_datasets\\ephy_testing_data\\mearec\\mearec_test_10s.h5"
recording = MEArecRecordingExtractor(file_path=file_path)
```

```
MEArecRecordingExtractor: 32 channels - 32.0kHz - 1 segments - 320,000 samples - 10.00s
                          float32 dtype - 39.06 MiB
  file_path: C:\Users\heberto\spikeinterface_datasets\ephy_testing_data\mearec\mearec_test_10s.h5
```

```python
folder_path = Path("D:\\folder_to_write_stuff")
folder_path.is_dir(), folder_path.anchor, folder_path.root
recording.save(folder=folder_path, overwrite=True)
```

```
write_binary_recording with n_jobs = 1 and chunk_size = 32000
write_binary_recording: 100%|##########| 10/10 [00:02<00:00, 3.67it/s]
BinaryFolderRecording: 32 channels - 32.0kHz - 1 segments - 320,000 samples - 10.00s
                       float32 dtype - 39.06 MiB
```

```python
import os

folder_path = "D:\\folder_to_write_stuff"
print(os.listdir(folder_path))
```

```
['provenance.json', 'probe.json', 'properties', 'traces_cached_seg0.raw', 'binary.json', 'si_folder.json']
```
The error that you are seeing is a problem of mounting volumes in Docker. It is a problem with the Docker mechanism that we use to mount volumes, not with the machinery for resolving relative paths.
Let's open another issue with your (extended) table, as this might be too large a scope, and it would be good to discuss exactly what we want to support and why.
Writing a normal extractor (as above) to/from:
1. C -> C
2. D -> D
3. C -> D
4. C -> network drive
5. network -> network

Writing from Docker to/from:
6. C -> C: fine
7. D -> D: fine (even if running MATLAB from C)
8. C -> D: fails
9. C -> network drive: fails
10. network -> network: fine
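The five non-Docker rows above can be spot-checked on any OS with the stdlib `ntpath` module, which implements the Windows path rules; this only exercises the `os.path.relpath` primitive that the dumping code relies on, not `run_sorter` itself (the paths are made up):

```python
import ntpath

# (src, dst, expected_to_work) for the five non-Docker cases
cases = [
    (r'C:\data', r'C:\out', True),                      # 1. C -> C
    (r'D:\data', r'D:\out', True),                      # 2. D -> D
    (r'C:\data', r'D:\out', False),                     # 3. C -> D
    (r'C:\data', r'\\server\share\out', False),         # 4. C -> network drive
    (r'\\server\share\a', r'\\server\share\b', True),   # 5. network -> network
]


def relpath_works(path, start):
    # A relative path exists only when both paths share a drive/share root.
    try:
        ntpath.relpath(path, start=start)
        return True
    except ValueError:
        return False


for src, dst, expected in cases:
    assert relpath_works(src, dst) == expected
```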
@h-mayorquin, happy to open a new issue, but I think we are discussing slightly different things. The function I'm testing is `run_sorter`, not `save`.
When you get a moment could you run/try:
```python
from pathlib import Path

from spikeinterface.core.datasets import download_dataset
from spikeinterface.extractors.neoextractors.mearec import MEArecRecordingExtractor
from spikeinterface.core import load_extractor
from spikeinterface.sorters import run_sorter

local_file_path = download_dataset(remote_path="mearec/mearec_test_10s.h5")
recording = MEArecRecordingExtractor(local_file_path)

file_path = "C:\\Users\\heberto\\spikeinterface_datasets\\ephy_testing_data\\mearec\\mearec_test_10s.h5"
recording = MEArecRecordingExtractor(file_path=file_path)

folder_path = Path("D:\\folder_to_write_stuff")
folder_path.is_dir(), folder_path.anchor, folder_path.root

run_sorter(****, recording=recording, working_folder=folder_path)  # not sure which sorter you want to test locally
```
Thanks guys for looking into this annoying issue!
I agree that `run_sorter` is more critical, because inside it we dump to JSON with `relative_to`. I believe that is what's causing the issue.
@zm711 @alejoe91
What I am saying is that the fact that the save functionality works implies that the problem is with Docker's volume-mounting mechanism and NOT with the machinery for resolving relative paths.
That is, I know that the script you shared with me above fails. I am saying it does not fail for the reason Sam described above, which is the `_make_paths_relative` function. Otherwise, a normal `save` would not work.
@h-mayorquin I see your point. I misunderstood: I thought you were either saying it did work or that you were only testing `save`. But you were using `save` to probe the function. Cool.
Could you re-explain? If I run `run_sorter` locally, it is still failing. I'm not using any Docker, and C -> D fails. There is currently no Docker on my system, so something is failing with the local handling of the drives. So you think there are still two separate problems: 1) Windows local, and 2) Docker mounting on Windows? I just want all the info if I'm going to open new, more specific issues.
Ah, that's good. I was meaning to test `run_sorter` without Docker as my next step. Thanks for doing that.
I will take a look.
I have meetings the rest of the morning (over here), but I will open new issues this afternoon. I see there being 3 for us to explore (maybe 2 and 3 are the same, but I will open separate issues and link them):
1. Network drive mapping in general (as we discussed above)
2. Local drive to local drive in `run_sorter`
3. Local drive to Docker in `run_sorter`
This error has been fixed with the new dumping check mechanism. There is a new problem with pickling that can occur when using plain `run_sorter`, but `run_sorter_by_property` works fine. I'll link this issue to a new issue that needs to be worked on, but this specific error should no longer occur.
Hi all,
I'm currently trying to run spykingcircus2 via the function `run_sorters()`.
I'm running the script on a Windows PC and saving the results on another drive. I get this error message:
Here is the full error message:
Any clue as to why I have this issue? I was doing the same thing (I believe) on version 0.98.0.dev0 and it was working correctly.
Thank you,
Anthony