running main_Toy and get ValueError: Specified a sep and a delimiter; you can only specify one.

tianlt commented 2 years ago

Hi,

Thank you for presenting the VIA package.

I tried to run main_Toy with:

via.main_Toy(ncomps=10, knn=30, dataset='Toy4', random_seed=2, foldername="/home/tialan/Downloads/VIA-master/Datasets/")

and got the following error: ValueError: Specified a sep and a delimiter; you can only specify one.

The whole error message is:

Traceback (most recent call last):
  File "", line 1, in
  File "/home/tialan/viaenv/lib/python3.7/site-packages/pyVIA/core.py", line 3947, in main_Toy
    delimiter=",")
  File "/home/tialan/viaenv/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/tialan/viaenv/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 582, in read_csv
    defaults={"delimiter": ","},
  File "/home/tialan/viaenv/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 1303, in _refine_defaults_read
    raise ValueError("Specified a sep and a delimiter; you can only specify one.")
ValueError: Specified a sep and a delimiter; you can only specify one.

Thank you

ShobiStassen commented 2 years ago

Hi,

Thanks for your note. It seems like you are having trouble reading in the file. Since the existing version works for me, can I suggest you first try removing the argument 'rt' (which is the argument for the separator) from lines 3947 and 3948 in core.py (located in your site-packages/pyVIA/core.py file) so that they just read:

   df_counts = pd.read_csv(foldername + "toy_disconnected_M9_n1000d1000.csv", delimiter=",")
   df_ids = pd.read_csv(foldername + "toy_disconnected_M9_n1000d1000_ids.csv", delimiter=",")

I think your version of pandas didn't like that both a delimiter and a separator were specified. I tried this and it works for me, but so does the original code. Let me know if it works for you :)
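
To illustrate what pandas is objecting to, here is a minimal sketch, assuming a pandas version that enforces this check (as the one in the traceback does). The in-memory CSV and the two helper names are made up for illustration and are not part of pyVIA; `sep` is passed as a keyword here so the call shape stays valid across pandas versions.

```python
import io

import pandas as pd

# Made-up CSV content standing in for one of the Toy files (assumption).
CSV_TEXT = "cell,gene_a,gene_b\nc1,0.1,0.2\nc2,0.3,0.4\n"


def read_with_sep_and_delimiter() -> str:
    """Return the message pandas raises when both sep and delimiter are given."""
    try:
        pd.read_csv(io.StringIO(CSV_TEXT), sep="rt", delimiter=",")
    except ValueError as err:
        return str(err)
    return "no error (this pandas version does not enforce the check)"


def read_with_delimiter_only() -> pd.DataFrame:
    """Same read with only delimiter specified, as in the suggested fix."""
    return pd.read_csv(io.StringIO(CSV_TEXT), delimiter=",")


if __name__ == "__main__":
    print(read_with_sep_and_delimiter())
    print(read_with_delimiter_only().shape)
```

Dropping either one of the two arguments should be enough; keeping `delimiter=","` matches the two lines shown above.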

Another example that would help us debug this is the EB data. Please first download the Embryoid Body (EB) data: the data file is named 'EBdata.mat' and the 2D embedding file is saved as 'EB_phate_embedding.csv'. Both files are available from the readme information and Data folder on VIA's GitHub page. Depending on your machine, this could take a few minutes, as it's a much larger dataset than Toy. In this example read_csv() is called without specifying the delimiter / separator, so it should not trigger the error you got above.

import matplotlib.pyplot as plt
import pyVIA.core as via
import pandas as pd
import umap
import scanpy as sc
import numpy as np
import warnings

 #### Test Embryoid Body
via.main_EB_clean(ncomps=30, knn=20, v0_random_seed=24, foldername='/home/shobi/Trajectory/Datasets/EB_Phate/')

tianlt commented 2 years ago

Hi, thank you for helping. After I removed the 'rt' argument, the error disappeared.

SansMorel commented 2 years ago

Hi, I also had this same issue on Windows.

Input:

#when running from an IDE you need to call the function in the following way to ensure the parallel processing works:
import os
import pyVIA.core as via
f= os.path.join(r'C:\Users\Sturla\Downloads\VIA-master\Datasets'+'\\')
def main():
    via.main_Toy(ncomps=10, knn=30,dataset='Toy3', random_seed=2,foldername= f)
if __name__ =='__main__':
    main()

Output:

C:\Users\Sturla\miniconda3\envs\ViaEnv\python.exe C:/Users/Sturla/Downloads/Via_test/toy.py
dataset, ncomps, knn, seed Toy3 10 30 2
C:/Users/Sturla/Downloads/Via_test/toy.py:6: FutureWarning: In a future version of pandas all arguments of read_csv except for the argument 'filepath_or_buffer' will be keyword-only
  via.main_Toy(ncomps=10, knn=30,dataset='Toy3', random_seed=2,foldername= f)
Traceback (most recent call last):
  File "C:/Users/Sturla/Downloads/Via_test/toy.py", line 8, in <module>
    main()
  File "C:/Users/Sturla/Downloads/Via_test/toy.py", line 6, in main
    via.main_Toy(ncomps=10, knn=30,dataset='Toy3', random_seed=2,foldername= f)
  File "C:\Users\Sturla\miniconda3\envs\ViaEnv\lib\site-packages\pyVIA\core.py", line 3938, in main_Toy
    delimiter=",")
  File "C:\Users\Sturla\miniconda3\envs\ViaEnv\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Sturla\miniconda3\envs\ViaEnv\lib\site-packages\pandas\io\parsers\readers.py", line 582, in read_csv
    defaults={"delimiter": ","},
  File "C:\Users\Sturla\miniconda3\envs\ViaEnv\lib\site-packages\pandas\io\parsers\readers.py", line 1303, in _refine_defaults_read
    raise ValueError("Specified a sep and a delimiter; you can only specify one.")
ValueError: Specified a sep and a delimiter; you can only specify one.

Process finished with exit code 1

Further, I tried running

import matplotlib.pyplot as plt
import pyVIA.core as via
import pandas as pd
import umap
import scanpy as sc
import numpy as np
import warnings

 #### Test Embryoid Body
via.main_EB_clean(ncomps=30, knn=20, v0_random_seed=24, foldername=r'C:\Users\Sturla\Downloads\VIA-master\Datasets'+"\\")

as you suggested and got:

C:\Users\Sturla\miniconda3\envs\ViaEnv\python.exe C:/Users/Sturla/Downloads/Via_test/via_test.py
ncomps, knn, n_var_genes, v0big, p1big, randomseed, time 30 20 no filtering for HVG 0.3 0.05 24 Fri Nov  5 09:52:10 2021
not pp scaled
do v0
input data has shape 16825 (samples) x 30 (features)
time is Fri Nov  5 09:54:20 2021
commencing global pruning
Share of edges kept after Global Pruning 48.66 %
number of components in the original full graph 1
for downstream visualization purposes we are also constructing a low knn-graph 
size neighbor array in low-KNN in pca-space for visualization (16825, 4)
commencing community detection
time is Fri Nov  5 09:54:24 2021
368  clusters before handling small/big
There are 0 clusters that are too big
EB: global cluster graph pruning level 0.15
number of components before pruning 1
percentage links trimmed from local pruning relative to start 0.0
percentage links trimmed from global pruning relative to start 73.4
there are  1 components in the graph
root user [1]
start computing lazy-teleporting Expected Hitting Times
ncomps, knn, n_var_genes, v0big, p1big, randomseed, time 30 20 no filtering for HVG 0.3 0.05 24 Fri Nov  5 09:54:31 2021
not pp scaled
do v0
input data has shape 16825 (samples) x 30 (features)
time is Fri Nov  5 09:56:43 2021
commencing global pruning
Share of edges kept after Global Pruning 48.66 %
number of components in the original full graph 1
for downstream visualization purposes we are also constructing a low knn-graph 
size neighbor array in low-KNN in pca-space for visualization (16825, 4)
commencing community detection
time is Fri Nov  5 09:56:47 2021
368  clusters before handling small/big
There are 0 clusters that are too big
EB: global cluster graph pruning level 0.15
number of components before pruning 1
percentage links trimmed from local pruning relative to start 0.0
percentage links trimmed from global pruning relative to start 73.4
there are  1 components in the graph
root user [1]
start computing lazy-teleporting Expected Hitting Times
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\Sturla\miniconda3\envs\ViaEnv\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\Sturla\miniconda3\envs\ViaEnv\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Users\Sturla\miniconda3\envs\ViaEnv\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\Sturla\miniconda3\envs\ViaEnv\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\Sturla\miniconda3\envs\ViaEnv\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\Sturla\miniconda3\envs\ViaEnv\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\Sturla\miniconda3\envs\ViaEnv\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Sturla\Downloads\Via_test\via_test.py", line 10, in <module>
    via.main_EB_clean(ncomps=30, knn=20, v0_random_seed=24, foldername=r'C:\Users\Sturla\Downloads\VIA-master\Datasets'+"\\")
  File "C:\Users\Sturla\miniconda3\envs\ViaEnv\lib\site-packages\pyVIA\core.py", line 4481, in main_EB_clean
    v0.run_VIA()
  File "C:\Users\Sturla\miniconda3\envs\ViaEnv\lib\site-packages\pyVIA\core.py", line 3364, in run_VIA
    self.run_subPARC()
  File "C:\Users\Sturla\miniconda3\envs\ViaEnv\lib\site-packages\pyVIA\core.py", line 2751, in run_subPARC
    new_root_index)  # +adjacency_matrix.T))
  File "C:\Users\Sturla\miniconda3\envs\ViaEnv\lib\site-packages\pyVIA\core.py", line 1492, in simulate_markov
    manager = multiprocessing.Manager()
  File "C:\Users\Sturla\miniconda3\envs\ViaEnv\lib\multiprocessing\context.py", line 56, in Manager
    m.start()
  File "C:\Users\Sturla\miniconda3\envs\ViaEnv\lib\multiprocessing\managers.py", line 563, in start
    self._process.start()
  File "C:\Users\Sturla\miniconda3\envs\ViaEnv\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\Sturla\miniconda3\envs\ViaEnv\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\Sturla\miniconda3\envs\ViaEnv\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\Sturla\miniconda3\envs\ViaEnv\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\Sturla\miniconda3\envs\ViaEnv\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

I then tried rerunning Toy after removing 'rt', and Toy now works (but Embryoid Body still does not).

SansMorel commented 2 years ago

I just noticed that I didn't add the if __name__ == '__main__' guard. It works after adding it like this:

import matplotlib.pyplot as plt
import pyVIA.core as via
import pandas as pd
import umap
import scanpy as sc
import numpy as np
import warnings

#### Test Embryoid Body
def main():
    via.main_EB_clean(ncomps=30, knn=20, v0_random_seed=24,
                      foldername=r'C:\Users\Sturla\Downloads\VIA-master\Datasets' + "\\")

if __name__ == '__main__':
    main()
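
For anyone wondering why the guard matters: on Windows, multiprocessing starts child processes with the spawn method, which re-imports the main script in every child. Without the guard, the top-level call runs again inside the child, which is what triggers the RuntimeError above (and would probably explain why the log lines were printed twice). Below is a minimal stdlib-only sketch of the pattern; the `square` worker and the pool size are illustrative placeholders, not pyVIA code.

```python
import multiprocessing


def square(x: int) -> int:
    """Toy worker; stands in for the parallel work a library might do."""
    return x * x


def main() -> list:
    # Request the spawn start method explicitly (the default on Windows)
    # so the behaviour is the same on Linux and macOS.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        return pool.map(square, [1, 2, 3])


if __name__ == "__main__":
    # Only the parent process enters this branch. Spawned children
    # re-import this file under a different __name__ and skip it,
    # so they never try to start a pool of their own.
    print(main())
```

Moving the body of the `if` block to module level should reproduce the same kind of RuntimeError as in the traceback.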

ShobiStassen / VIA

running main_Toy and get ValueError: Specified a sep and a delimiter; you can only specify one. #6