SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License

Error run_sorters() : ValueError: path is on mount 'C:', start on mount #1891

Closed. Antho2422 closed this issue 9 months ago.

Antho2422 commented 1 year ago

Hi all,

I'm currently trying to run spykingcircus2 via the function run_sorters().

I'm running the script on a Windows PC and saving the results on another drive. I get this error message:

Failed: [undefined]ValueError: path is on mount 'C:', start on mount '\\\\l2export\\iss02.charpier'

Here is the full error message:

..\..\..\AppData\Local\pypoetry\Cache\virtualenvs\priam-5spsvreu-py3.9\lib\site-packages\spikeinterface\core\base.py:942: in <lambda>
    func = lambda p: os.path.relpath(str(p), start=relative)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

path = 'C:\\Users\\anthony.pinto\\Documents\\git\\priam\\data\\raw\\test_model_v0.1\\03046_2021-07-09_12-02\\base_recording'
start = '\\\\l2export\\iss02.charpier\\priam\\test_suite\\outputs\\session_test\\pat_03046_1482\\eeg\\test_sorting\\sorting\\0\\spykingcircus2'

    def relpath(path, start=None):
        """Return a relative version of a path"""
        path = os.fspath(path)
        if isinstance(path, bytes):
            sep = b'\\'
            curdir = b'.'
            pardir = b'..'
        else:
            sep = '\\'
            curdir = '.'
            pardir = '..'

        if start is None:
            start = curdir

        if not path:
            raise ValueError("no path specified")

        start = os.fspath(start)
        try:
            start_abs = abspath(normpath(start))
            path_abs = abspath(normpath(path))
            start_drive, start_rest = splitdrive(start_abs)
            path_drive, path_rest = splitdrive(path_abs)
            if normcase(start_drive) != normcase(path_drive):
>               raise ValueError("path is on mount %r, start on mount %r" % (
                    path_drive, start_drive))
E                   ValueError: path is on mount 'C:', start on mount '\\\\l2export\\iss02.charpier'

C:\Program Files\Python39\lib\ntpath.py:703: ValueError

Any clue as to why I have this issue? I was doing the same thing (I believe) on version 0.98.0.dev0 and it was working correctly.

Thank you,

Anthony
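
For context, this ValueError comes straight from Python's ntpath.relpath: os.path.relpath cannot express a path on one Windows mount relative to a folder on another mount. A minimal illustration of the same failure, with shortened stand-ins for the paths in the traceback above:

import ntpath  # the Windows flavour of os.path, importable on any OS

# a local C: path and a UNC network share are on different "mounts",
# so no relative path between them exists and relpath raises
ntpath.relpath(
    'C:\\data\\base_recording',
    start='\\\\l2export\\iss02.charpier\\outputs\\spykingcircus2',
)
# ValueError: path is on mount 'C:', start on mount '\\\\l2export\\iss02.charpier'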

alejoe91 commented 1 year ago

Can you try to use absolute paths instead of relative ones? Can you also paste the full script?

Antho2422 commented 1 year ago

Hi, I tried using the absolute path instead, but I get the same issue.

Here is the code I used:

    @staticmethod
    def run_sorting(
        base_recording: dict[int, BaseRecording | BaseExtractor],
        sorter_list: list[str],
        params: dict[str, Any],
        output_path: str,
    ):
        """
        This function run a sorter with all parameters set in config
        and save them
        """
        logger.info(f'Running {str(sorter_list)}')
        start = time.time()
        sorting = ss.run_sorters(
            sorter_list,
            base_recording,
            working_folder=os.path.join(
                output_path,
                'sorting',
            ),
            mode_if_folder_exists='overwrite',
            sorter_params=params,
            verbose=True,
        )
        end = time.time()
        logger.info(f'Sorting done. The whole process took {str(end - start)}')
        return sorting

Antho2422 commented 1 year ago

I have this issue only when I'm trying to save the results on a different disk from the one the data is coming from, I guess.

samuelgarcia commented 1 year ago

@alejoe91 I think this is because of this line: https://github.com/SpikeInterface/spikeinterface/blob/892c93dedcab574b932dcf9fba19e84417062ae0/src/spikeinterface/sorters/basesorter.py#L144

vigji commented 12 months ago

Same here. It's not clear to me what I should do if I want to save data in a different location from the source.

samuelgarcia commented 12 months ago

@alejoe91 @h-mayorquin @vigji @Antho2422 we switched to always saving the rec file in relative mode. But on Windows with drive letters this is a bad idea. Maybe we should make this OS-dependent here: https://github.com/SpikeInterface/spikeinterface/blob/892c93dedcab574b932dcf9fba19e84417062ae0/src/spikeinterface/sorters/basesorter.py#L144

import platform

if platform.system() == "Linux":
    recording.dump_to_json(rec_file, relative_to=output_folder)
elif platform.system() == "Windows":
    recording.dump_to_json(rec_file)

Any volunteer for a patch?

vigji commented 12 months ago

I can give it a try (I'll probably be slow and come back with questions, if that's not an issue!)

h-mayorquin commented 12 months ago

I think that having different and untested behavior for this on different OSes is a bad idea. We should either test both in CI or at least keep the same behavior across OSes.
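
One way to keep the behavior identical on every OS (a sketch, not SpikeInterface code) would be to pass relative_to only when a relative path is actually expressible, i.e. when both locations share the same drive or UNC root. The check below mirrors exactly what ntpath.relpath tests before it raises:

import os

def same_mount(path_a, path_b):
    # hypothetical helper: True when both paths resolve to the same drive
    # or UNC share; on POSIX both drives are '' so this is always True
    drive_a = os.path.splitdrive(os.path.abspath(path_a))[0]
    drive_b = os.path.splitdrive(os.path.abspath(path_b))[0]
    return os.path.normcase(drive_a) == os.path.normcase(drive_b)

# basesorter could then do something along the lines of:
# if same_mount(recording_file_path, output_folder):
#     recording.dump_to_json(rec_file, relative_to=output_folder)
# else:
#     recording.dump_to_json(rec_file)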

brijeshmodi12 commented 11 months ago

Hi, I am having a similar error on Windows when trying to run a sorter. I was wondering if there is any update on this? Please let me know! Thanks a lot! :)

h-mayorquin commented 10 months ago

@zm711

I made an attempt at running this, but I am not sure I can reproduce the error that the OP posted, as they never shared the full script.

Can you share a script that makes you get an error?

zm711 commented 10 months ago

I either did something like this, trying to use two different drives (+ the Kilosort drive):

from pathlib import Path
import spikeinterface.full as si  # assuming the usual si / ss aliases
import spikeinterface.sorters as ss

save_path = Path(r'D:\data')
rec = si.read_xxx(r'E:\path\to\file') # not sure what files you have locally that you want to test, so you could read whatever
# also I set the probe to the recording, but again you could probably try whatever
ss.set_kilosort3_path(r'C:\zm\documents\kilosort-main')
ss.run_sorter('kilosort3', rec, output_folder=save_path / 'sorter_kilo')

or something like this:

save_path = Path(r'E:\path\to\file')
rec = si.read_xxx(save_path)
ss.set_kilosort3_path(r'C:\zm\documents\kilosort-main')
ss.run_sorter('kilosort3', rec, output_folder = save_path / 'sorter_kilo')

This gave the error above. So I just moved the read, sorter_path, and output_folder to all be on the C drive, and it worked after that. So to be honest I'm not sure if the problem is that the sorter location is on a different drive or whether the problem is that the raw file and the save path are on different drives. If you can't reproduce it, I know @alejoe91 did some updates in #2141 (so maybe that helped with Windows too? Not sure).

Does that make sense for a minimal test, @h-mayorquin?

samuelgarcia commented 10 months ago

The guilty lines are these: https://github.com/SpikeInterface/spikeinterface/blob/f725a406f3cc16452636f5bf0652a5d541571be2/src/spikeinterface/sorters/basesorter.py#L139

On Windows, when the drives are different, this does not work. I think we recently forced the "relative_to" to save the recording.json (or pickle) inside the sorter folder. This is globally a good idea when the data are moved, but for Windows drive letters this is clearly a bad choice. Maybe we should change this with explicit or implicit params.
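
Concretely (with illustrative paths, not taken from the issue): relative_to keeps the dumped recording.json portable as long as the raw file and the sorter folder sit on the same drive; across drives there is simply no relative spelling, which is the case that blows up:

import ntpath

# same drive: a relative path exists, so the dumped JSON stays portable
ntpath.relpath('D:\\raw\\recording.bin', start='D:\\sorting\\sorter_output')
# -> '..\\..\\raw\\recording.bin'

# different drives: no relative spelling exists, so relpath raises
ntpath.relpath('C:\\raw\\recording.bin', start='D:\\sorting\\sorter_output')
# ValueError: path is on mount 'C:', start on mount 'D:'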

zm711 commented 10 months ago

Okay, so I figured out another interesting issue related to this. Running mountainsort5 from my storage drive worked fine (D:/zach). But if I tried to do it from our storage server, it failed. The issue is that using Path returns something like

r'z:/server_name/zach/data'

but when spikeinterface internally tries to unpack the path, it returns

'\\\\server_name\zach\data'

and says that the former is not equal to the latter.

The call triggering the different file name is

func = lambda p: os.path.relpath(str(p), start=relative)

(https://github.com/SpikeInterface/spikeinterface/blob/f725a406f3cc16452636f5bf0652a5d541571be2/src/spikeinterface/core/base.py#L980-L983)

h-mayorquin commented 10 months ago

Sorry for not updating earlier, @zm711. This has been a very busy week for me. I came across that issue in my attempt last weekend.

Local paths are handled fine on Windows (to both my and @samuelgarcia's surprise, but to @alejoe91's credit, who I think built this).

But there are two problems:

1. Mounting volumes in Docker when they are local paths but not on the same drive.
2. The problem with relative paths for network drives that you mentioned.

I will make another attempt at this by the end of the week. Do you know how to mock network paths? I don't have access to this functionality but maybe it can be done easily (I have never had to deal with this before).

h-mayorquin commented 10 months ago

So there are 4 cases to test and sort out:

zm711 commented 10 months ago

But there are two problems: mounting volumes in Docker when they are local paths but not on the same drive, and the problem with relative paths for network drives that you mentioned.

I think I figured part of it out. So if I input the file using a UNC path instead:

so,

from pathlib import Path

unc_path = r'//server_name/zach/data'
dir_path = Path(unc_path)

instead of

dir_path = Path(r'z:/zach')

then the paths match up. I'm currently sorting on a network drive with no problems (other than speed but who is counting).

So for network drives, we either tell Windows users via documentation that they must enter the network path as UNC rather than with the drive name, or we need a helper function to automatically fix it for them.

Do you know how to mock network paths? I don't have access to this functionality but maybe it can be done easily (never had to deal with this before)

Not a clue. I do my testing on Ubuntu for GH and locally for Windows, so I never need to mock Windows drives.

h-mayorquin commented 10 months ago

Do I understand correctly that on Windows network paths might appear as local paths and the user might not know about it? I was hoping that we could solve this automatically by using the root or anchor attributes in pathlib, but a shallow preliminary reading indicates that might not be the case.

h-mayorquin commented 10 months ago

@zm711 https://stackoverflow.com/questions/4491601/in-python-how-to-separate-local-hard-drives-from-network-and-floppy-in-windows

zm711 commented 10 months ago

Yeah, based on my reading most users "map their drives", so the network path would be something like:

//0.0.0.0/name/something

but Windows lets you map it to the drive nomenclature, so that the above appears as

z:/name/something

But if the user puts in the drive with //, it has to be forward slashes based on that Stack Overflow post. So if the user puts in the actual network path rather than the mapping, then SpikeInterface doesn't fail when sorting over a network drive. Does that make sense?

We are going to have dueling Stack Overflow posts.
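
To illustrate the mismatch with the placeholder names above: pathlib treats the mapped-drive spelling and the UNC spelling of the same share as completely different paths, so comparing or relativizing one against the other fails:

from pathlib import PureWindowsPath

mapped = PureWindowsPath('z:/name/something')      # drive-letter mapping
unc = PureWindowsPath('//0.0.0.0/name/something')  # the actual network path

print(mapped.drive)   # z:
print(unc.drive)      # \\0.0.0.0\name
print(mapped == unc)  # False, even though both point at the same folder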

zm711 commented 10 months ago

I'll read yours. I had just finished typing when you hit enter.

zm711 commented 10 months ago

When I switch to WSL, I can only see my local drives mounted on /mnt; I cannot see my network drives from there. But in the Windows file explorer my network drive is easy to find with its drive alias.

h-mayorquin commented 10 months ago

So if the user puts in the correct network drive rather than the mapping then SpikeInterface doesn't fail when sorting over a network drive. Does that make sense?

This makes sense and is already a solution. I hope that we can automate it by using subprocess to call some command available on Windows, like 'net use', so users don't even have to think about it.

That is for normal relative paths, though. I am not sure if they will work from within Docker, as it is not clear WSL will be able to mount the volumes.
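
As a rough sketch of that idea (nothing below is existing SpikeInterface API, and the parsing of the 'net use' output is simplified), a helper could translate a mapped drive letter back to its UNC root before any relative-path logic runs:

import subprocess
from pathlib import Path, PureWindowsPath

def to_unc(path):
    # hypothetical helper: replace a mapped drive letter (e.g. 'Z:')
    # by the UNC root reported by `net use`, and leave other paths alone
    p = PureWindowsPath(path)
    drive = p.drive
    if not drive or drive.startswith('\\\\'):
        return Path(path)  # already UNC, or no drive letter at all
    listing = subprocess.run(['net', 'use'], capture_output=True, text=True).stdout
    for line in listing.splitlines():
        parts = line.split()
        # a typical line: OK   Z:   \\server_name\share   Microsoft Windows Network
        if drive.upper() in (part.upper() for part in parts):
            unc_root = next((part for part in parts if part.startswith('\\\\')), None)
            if unc_root is not None:
                return Path(unc_root) / PureWindowsPath(*p.parts[1:])
    return Path(path)  # not a mapped network drive, keep as-is

A helper like this would only address the mapped-drive case; the Docker volume-mounting question discussed below is a separate problem.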

zm711 commented 10 months ago

Sorry, yeah, I don't do any container stuff, so my understanding of getting Docker/Apptainer to work is non-existent. Sorry, I was only thinking of the local case.

h-mayorquin commented 10 months ago

No worries. This is already a solution for your problem, right? That's what I was hoping we could solve in the first place :D

The rest is a bonus.

zm711 commented 10 months ago

This fixes network-to-network drives, but the OP's question was actually about going from C: -> network. I just tested my solution and it doesn't work for that.

And even local drive to local drive doesn't work for me. If you try

rec = read_xxx(r'c:/data')
save_path = Path(r'd:/save')
ss.run_sorter(.... working_folder=save_path)

It fails for me even if it is local drive to local drive.

ValueError: path is on mount 'C:', start on mount 'D:'

What did you and Sam get to work locally specifically?

My summary is:
c -> c: fine
d -> d: fine (even if running Matlab from c)
c -> d: fails
c -> network drive: fails
network -> network: fine

samuelgarcia commented 10 months ago

No, I haven't tested c -> d. My intuition was that it should fail. Alessio was more confident.

h-mayorquin commented 10 months ago

I tested c to d. I will double check.

zm711 commented 10 months ago

One thing we should double-check: I'm running my tests on Windows 10. Are you on 10 or 11? Windows changed some things with connectivity (like COM ports, etc.) between the versions, so they might actually behave differently enough that we might not be doing the same thing.

h-mayorquin commented 10 months ago

I am running on Windows 10. C -> D (when D is local (not network) and without Docker) works fine on my system.

We are discussing two issues here. Sam suggests that there might be an error in the function that transforms relative paths. My claim is that there is not, because the following does work on my system.


from pathlib import Path

from spikeinterface.core.datasets import download_dataset
from spikeinterface.extractors.neoextractors.mearec import (
    MEArecRecordingExtractor
)
from spikeinterface.core import load_extractor

local_file_path = download_dataset(remote_path="mearec/mearec_test_10s.h5")
recording = MEArecRecordingExtractor(local_file_path)

file_path = "C:\\Users\\heberto\\spikeinterface_datasets\\ephy_testing_data\\mearec\\mearec_test_10s.h5"
recording = MEArecRecordingExtractor(file_path=file_path)

MEArecRecordingExtractor: 32 channels - 32.0kHz - 1 segments - 320,000 samples - 10.00s 
                          float32 dtype - 39.06 MiB
  file_path: [C:\Users\heberto\spikeinterface_datasets\ephy_testing_data\mearec\mearec_test_10s.h5](file:///C:/Users/heberto/spikeinterface_datasets/ephy_testing_data/mearec/mearec_test_10s.h5)

folder_path = Path("D:\\folder_to_write_stuff")
folder_path.is_dir(), folder_path.anchor, folder_path.root

recording.save(folder=folder_path, overwrite=True)

write_binary_recording with n_jobs = 1 and chunk_size = 32000
write_binary_recording: 100%|##########| 10/10 [00:02<00:00,  3.67it/s]
BinaryFolderRecording: 32 channels - 32.0kHz - 1 segments - 320,000 samples - 10.00s 
                       float32 dtype - 39.06 MiB

import os

folder_path = "D:\\folder_to_write_stuff"
print(os.listdir(folder_path))

['provenance.json', 'probe.json', 'properties', 'traces_cached_seg0.raw', 'binary.json', 'si_folder.json']

The error that you are seeing is a problem of mounting volumes in Docker. But this is a problem with the Docker mechanism that we use to mount volumes, not with the machinery for resolving relative paths.

Let's open another issue with your (extended) table, as this might be too large a scope, and it would be good to discuss exactly what we want to support and why.

Writing a normal extractor (as above) to-from:
1. c -> c
2. d -> d
3. c -> d
4. c -> network drive
network -> network

Writing from docker to-from:
5. c -> c fine
6. d -> d fine (even if running Matlab from c)
7. c -> d fails
8. c -> network drive fails
network -> network fine

zm711 commented 10 months ago

@h-mayorquin ,

happy to open a new issue, but I think we are discussing slightly different things. The function I'm testing is run_sorter, not save.

When you get a moment, could you run/try:


from pathlib import Path

from spikeinterface.core.datasets import download_dataset
from spikeinterface.extractors.neoextractors.mearec import (
    MEArecRecordingExtractor
)
from spikeinterface.core import load_extractor
from spikeinterface.sorters import run_sorter

local_file_path = download_dataset(remote_path="mearec/mearec_test_10s.h5")
recording = MEArecRecordingExtractor(local_file_path)

file_path = "C:\\Users\\heberto\\spikeinterface_datasets\\ephy_testing_data\\mearec\\mearec_test_10s.h5"
recording = MEArecRecordingExtractor(file_path=file_path)

MEArecRecordingExtractor: 32 channels - 32.0kHz - 1 segments - 320,000 samples - 10.00s 
                          float32 dtype - 39.06 MiB
  file_path: [C:\Users\heberto\spikeinterface_datasets\ephy_testing_data\mearec\mearec_test_10s.h5](file:///C:/Users/heberto/spikeinterface_datasets/ephy_testing_data/mearec/mearec_test_10s.h5)

folder_path = Path("D:\\folder_to_write_stuff")
folder_path.is_dir(), folder_path.anchor, folder_path.root
run_sorter(****, recording=recording, working_folder=folder_path) # Not sure which sorter you want to test locally

alejoe91 commented 10 months ago

Thanks guys for looking into this annoying issue!

I agree that run_sorter is more critical, because inside it we dump to JSON with relative_to. I believe that is what's causing the issue.

h-mayorquin commented 10 months ago

@zm711 @alejoe91

What I am saying is that the fact that the save functionality works implies that the problem is with the volume-mounting mechanism of Docker and NOT with the machinery to resolve relative paths.

That is, I know that the script you shared with me above fails. I am saying it does not fail for the reason that Sam described above, which is the _make_paths_relative function. Otherwise, a normal save would not work.

zm711 commented 10 months ago

@h-mayorquin I see your point. I misunderstood. I thought you were either saying it did work or that you were only testing save. But you were using save to probe the function. Cool.

Could you re-explain? If I run_sorter locally, it is still failing, though. I'm not using any Docker, and c -> d fails. There is currently no Docker on my system. So something is failing with the local handling of the drives. So you think there are still two separate problems: 1) Windows local and 2) Docker mounting on Windows? I just want all the info if I'm going to open new, more specific issues.

h-mayorquin commented 10 months ago

Ah, that's good. I was meaning to test the run_sorter function without Docker as my next step. Thanks for doing that.

I will take a look.

zm711 commented 10 months ago

I have meetings the rest of the morning (over here), but I will open new issues this afternoon. I see there being 3 for us to explore (maybe 2 and 3 are the same, but I will open separate issues and link them):

1) network drive mapping in general (as we discussed above)
2) local drive to local drive in run_sorter
3) local drive to docker in run_sorter

zm711 commented 9 months ago

This error has been fixed with the new dumping check mechanism. There is a new problem with pickling that can occur when just using run_sorter, but run_sorter_by_property works fine. I'll link this issue to a new issue that needs to be worked on, but this specific error should no longer occur.