EvgeniDubov / hellinger-distance-criterion

Random Forest model using Hellinger Distance as split criterion
BSD 3-Clause "New" or "Revised" License

Error with Cythonizing #9

Closed (ndrmahmoudi closed 3 years ago)

ndrmahmoudi commented 3 years ago

Hi Evgeni,

I followed the instructions you provided, downloaded the library, and ran setup.py. However, I get this error:



```
Error compiling Cython file:
------------------------------------------------------------
...

        self.children_impurity(&impurity_left, &impurity_right)

        return impurity_right + impurity_left

    cdef double impurity_improvement(self, double impurity) nogil:
        ^
------------------------------------------------------------

hellinger_distance_criterion.pyx:26:9: Signature not compatible with previous declaration

Error compiling Cython file:
------------------------------------------------------------
...
    cdef int update(self, SIZE_t new_pos) nogil except -1
    cdef double node_impurity(self) nogil
    cdef void children_impurity(self, double* impurity_left,
                                double* impurity_right) nogil
    cdef void node_value(self, double* dest) nogil
    cdef double impurity_improvement(self, double impurity_parent,
                                   ^
------------------------------------------------------------

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\tree\_criterion.pxd:65:36: Previous declaration is here
Traceback (most recent call last):
  File "setup.py", line 11, in <module>
    ext_modules=cythonize('hellinger_distance_criterion.pyx', language_level = "3"),
  File "C:\ProgramData\Anaconda3\lib\site-packages\Cython\Build\Dependencies.py", line 1102, in cythonize
    cythonize_one(*args)
  File "C:\ProgramData\Anaconda3\lib\site-packages\Cython\Build\Dependencies.py", line 1225, in cythonize_one
    raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: hellinger_distance_criterion.pyx
```

Could you please let me know how to fix this issue? Also, is there any update on incorporating this split method into sklearn?

Many thanks in advance for your response.

Regards,
Nader

harish1996 commented 3 years ago

Hi,

I can see that this particular function's declaration has been changed in https://github.com/scikit-learn/scikit-learn/commit/22f232efc608e3d90b833c72c9fd3da1b4eacd97

It looks like this change is part of version 0.24.1. If that is acceptable for you, you can try downgrading sklearn to a version before 0.24.1. I used this criterion with an older version and it worked at the time.
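
To tell which side of that change an installed scikit-learn falls on, a minimal sketch like the one below may help. It assumes the 0.24.1 cutoff mentioned above is right, and the helper names (`version_tuple`, `predates_signature_change`, `installed_sklearn_version`) are hypothetical, not part of sklearn or this repository.

```python
import re
from importlib import metadata

# Assumption: the cutoff is the 0.24.1 release named in this comment.
CUTOFF = (0, 24, 1)


def version_tuple(version):
    """Turn '0.23.2' or '1.0rc1' into a tuple of ints, e.g. (0, 23, 2)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def predates_signature_change(version, cutoff=CUTOFF):
    """True if `version` is older than the cutoff release."""
    return version_tuple(version) < cutoff


def installed_sklearn_version():
    """Return the installed scikit-learn version string, or None if absent."""
    try:
        return metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_sklearn_version()
    if found is None:
        print("scikit-learn is not installed")
    elif predates_signature_change(found):
        print(found, "predates the change; the old signature should compile")
    else:
        print(found, "includes the change; downgrade or update the .pyx")
```

If the check reports a newer version, something like `pip install "scikit-learn<0.24.1"` followed by re-running setup.py would be one way to try the downgrade.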

Also, this criterion can already be used with sklearn; it is just not defined in the sklearn repository itself. As far as I know, sklearn's split criterion supports externally defined criteria.

Or, even better, you can edit the Hellinger criterion code here so that it obtains the left and right impurities directly instead of through a separate call, as this implementation currently does. The commit I linked above addresses exactly this.

EvgeniDubov commented 3 years ago

I'm working on a Hellinger criterion PR (#16478) in sklearn.

@ndrmahmoudi sorry for the delayed response.

@harish1996 thanks a lot for answering and contributing.