GGiecold-zz / DBSCAN_multiplex

A fast and efficient implementation of DBSCAN clustering.
MIT License

DBSCAN on Windows with Anaconda Python - no permission to create temporary file #1

Closed MichaelJendryke closed 8 years ago

MichaelJendryke commented 8 years ago

I am trying to run the code below, but the HDF5 error back trace tells me that I have no permission to create the temporary file.

The script I run calls DBSCAN_multiplex.py:

import numpy as np
import DBSCAN_multiplex as DB
print("hey ;-)")
data = np.random.randn(15000, 7)
N_iterations = 50
N_sub = 9 * data.shape[0] / 10
subsamples_matrix = np.zeros((N_iterations, N_sub), dtype = int)
for i in xrange(N_iterations): subsamples_matrix[i] = np.random.choice(data.shape[0], N_sub, replace = False)
eps, labels_matrix = DB.DBSCAN(data, minPts = 3, subsamples_matrix = subsamples_matrix, verbose = True)

The error message in PyCharm is:

C:\Anaconda2\python.exe E:/Dropbox/DATA/research/GhostTowns/WB/untitled/dbscan.py
hey ;-)
INFO: DBSCAN_multiplex @ load:
starting the determination of an appropriate value of 'eps' for this data-set and for the other parameter of the DBSCAN algorithm set to 3.
This might take a while.

INFO: DBSCAN_multiplex @ load:
done with evaluating parameter 'eps' from the data-set provided. This took 1.737 seconds. Value of epsilon: 0.921.

INFO: DBSCAN_multiplex @ load:
identifying the neighbors within an hypersphere of radius 0.921 around each sample, while at the same time evaluating the number of epsilon-neighbors for each sample.
This might take a fair amount of time.
Traceback (most recent call last):
  File "E:/Dropbox/DATA/research/GhostTowns/WB/untitled/dbscan.py", line 9, in <module>
    eps, labels_matrix = DB.DBSCAN(data, minPts = 3, subsamples_matrix = subsamples_matrix, verbose = True)
  File "C:\Anaconda2\lib\site-packages\DBSCAN_multiplex.py", line 656, in DBSCAN
    eps = load(f.name, data, minPts, eps, quantile, subsamples_matrix, samples_weights, metric, p, verbose)
  File "C:\Anaconda2\lib\site-packages\DBSCAN_multiplex.py", line 354, in load
    fileh = tables.open_file(hdf5_file_name, mode = 'r+')
  File "C:\Anaconda2\lib\site-packages\tables\file.py", line 318, in open_file
    return File(filename, mode, title, root_uep, filters, **kwargs)
  File "C:\Anaconda2\lib\site-packages\tables\file.py", line 784, in __init__
    self._g_new(filename, mode, **params)
  File "tables\hdf5extension.pyx", line 488, in tables.hdf5extension.File._g_new (tables\hdf5extension.c:5458)
tables.exceptions.HDF5ExtError: HDF5 error back trace

  File "C:\Python27\conda-bld\work\hdf5-1.8.15-patch1\src\H5F.c", line 604, in H5Fopen
    unable to open file
  File "C:\Python27\conda-bld\work\hdf5-1.8.15-patch1\src\H5Fint.c", line 990, in H5F_open
    unable to open file: time = Mon Dec 07 18:12:12 2015
, name = 'F:\temp\tmpthbz3d.h5', tent_flags = 1
  File "C:\Python27\conda-bld\work\hdf5-1.8.15-patch1\src\H5FD.c", line 993, in H5FD_open
    open failed
  File "C:\Python27\conda-bld\work\hdf5-1.8.15-patch1\src\H5FDsec2.c", line 343, in H5FD_sec2_open
    unable to open file: name = 'F:\temp\tmpthbz3d.h5', errno = 13, error message = 'Permission denied', flags = 1, o_flags = 2

End of HDF5 error back trace

Unable to open/create file 'F:\temp\tmpthbz3d.h5'

Process finished with exit code 1

It boils down to the last line, which reports a permission error. However, I have granted full read and write permissions on the folder, and I also tried running Python with administrator rights. My guess is that the issue is connected to HDF5, or that HDF5 tries to create this temporary file and fails. At this point I have no idea how to solve this, and I would be grateful for any input.

This question might be connected to http://stackoverflow.com/questions/23212435/permission-denied-to-write-to-my-temporary-file but since the mode is set to 'w' I had no luck with it.

Thanks

GGiecold-zz commented 8 years ago

Thank you for reporting this issue.

It seems that on Windows the name of a temporary file cannot be used to open that file a second time while it is still open. I believe this explains the error you get in DBSCAN_multiplex: on line 652, the file is created and left open until the end of the context-management block that starts at "with". Then, within the call to the procedure 'load', line 354 attempts to open this same file again.
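One common way around this restriction, shown here as a minimal sketch and not as the code DBSCAN_multiplex actually uses, is to create the temporary file with `delete=False`, close it immediately, and only then hand its name to whatever needs to reopen it. The helper name `make_reopenable_temp_file` is hypothetical, and a plain `open()` stands in for the `tables.open_file` call from the traceback so that the example needs only the standard library.

```python
import os
import tempfile


def make_reopenable_temp_file(suffix=".h5", directory=None):
    """Create an empty temporary file, close it, and return its path.

    With delete=False the file survives the close, so a second handle
    can open it by name on Windows as well as on POSIX systems.
    """
    handle = tempfile.NamedTemporaryFile("w", suffix=suffix, dir=directory,
                                         delete=False)
    handle.close()  # release the first handle before anyone reopens the path
    return handle.name


path = make_reopenable_temp_file()
try:
    # Stand-in for the second open that failed in the traceback above.
    with open(path, "r+") as second_handle:
        second_handle.write("placeholder")
    with open(path) as check:
        content = check.read()
finally:
    os.remove(path)  # delete=False means cleanup is our responsibility
```

The trade-off is that the file is no longer removed automatically, hence the explicit `os.remove` in the `finally` clause.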

I do not have access to a machine running Windows. I therefore intend to drop support for this operating system in this module. However, it should still be possible for you to run DBSCAN_multiplex by creating a file yourself and calling 'load' and 'shoot' with the name of this file as an argument.

Please let me know if this solves the problem you've encountered.

Kind regards,

Gregory

MichaelJendryke commented 8 years ago

Thanks for your answer. Unfortunately, the machine that we want to use to cluster millions of points runs Windows 7 only. With your tips I got it to work. However, it was not necessary to change load and shoot; I just modified the code section after NamedTemporaryFile to create the temporary file under a fixed name.

I also saw that you use meminfo to get the free memory. I hard-coded a fixed value there and no longer call the memory() function.
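For readers who would rather not hard-code the value on every platform, here is a hedged sketch of a fallback. It assumes that the free-memory figure comes from the `MemFree:` line of `/proc/meminfo`, a file that exists on Linux but not on Windows. The thread does not show the body of `memory()`, so the function name `free_memory_kb`, its parameters, and the 4 GB default are all hypothetical.

```python
import os


def free_memory_kb(fallback_kb=4 * 1024 * 1024, meminfo_path="/proc/meminfo"):
    """Return free memory in kB from meminfo, or fallback_kb if unavailable."""
    if not os.path.isfile(meminfo_path):
        # No meminfo file (e.g. on Windows): use the fixed value instead.
        return fallback_kb
    with open(meminfo_path) as meminfo:
        for line in meminfo:
            fields = line.split()
            if len(fields) >= 2 and fields[0] == "MemFree:":
                return int(fields[1])
    return fallback_kb


# On a machine without /proc/meminfo this simply yields the fallback value.
available_kb = free_memory_kb()
```

The fallback should be set conservatively, since an overestimate could lead the caller to allocate more than the machine can hold.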

GGiecold-zz commented 8 years ago

Indeed, NamedTemporaryFile appears only in the 'DBSCAN' procedure. The 'memory' function is another reason why support has henceforth been removed for Windows platforms in 'DBSCAN_multiplex'.

I am glad to learn, however, that everything else seems to work as expected on Windows 7, provided the workarounds you pointed out are applied.

All the best,

Gregory

StatguyUser commented 6 years ago

@MichaelJendryke can you share exactly what you changed in the code? I am in a similar situation, except that I am using Windows 10. Thanks

GGiecold-zz commented 6 years ago

In function DBSCAN, change

with NamedTemporaryFile('w', suffix='.h5', delete=True, dir='./') as f:

to

with open(path.join(getcwd(), 'tmp.h5'), 'w') as f:

...

In addition, before the statement 'return eps, labels_matrix', insert the following:

remove(path.join(getcwd(), 'tmp.h5'))

You should also add the following line at the beginning of the modified source code:

from os import getcwd, path, remove

You will still have to install HDF5 support (check https://support.hdfgroup.org/HDF5/).
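The same idea can be wrapped in a small context manager so that the fixed-name file is removed even if an exception is raised before the return statement is reached. This is only a sketch under stated assumptions: `scratch_file` is a hypothetical helper that is not part of DBSCAN_multiplex, and the demonstration writes into a throwaway directory instead of the current working directory.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def scratch_file(name="tmp.h5", directory=None):
    """Yield the path of an empty, closed file and remove it on exit."""
    file_path = os.path.join(directory or os.getcwd(), name)
    open(file_path, "w").close()  # create (or truncate) the file, then close it
    try:
        yield file_path
    finally:
        # Runs on normal exit and on exceptions alike.
        if os.path.exists(file_path):
            os.remove(file_path)


demo_dir = tempfile.mkdtemp()  # throwaway directory for the demonstration
with scratch_file("tmp.h5", demo_dir) as file_path:
    existed_inside = os.path.exists(file_path)
existed_after = os.path.exists(file_path)
os.rmdir(demo_dir)
```

Because the file is already closed when its path is yielded, code inside the `with` block is free to reopen it by name.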

Hope this helps.

Gregory
