eXascaleInfolab / PyCABeM

Python Benchmarking Framework for the Evaluation of Clustering Algorithms: network generation and shuffling; failover execution and resource consumption tracing (peak RAM RSS, CPU, ...); evaluation of modularity, conductance, NMI, and F1 score for overlapping communities

Python 3 support? #13

Closed: Make42 closed this issue 3 years ago

Make42 commented 3 years ago

Is Python 3 supported and, if not, what are the plans to support it? After all, Python 2 is deprecated.

Also, to make your software more accessible, could you put it on PyPI so that people can install it by simply running pip?

luav commented 3 years ago

Hi @Make42, the benchmark should work with Python 2, Python 3, and PyPy.

I have not worked with this project for the last 2 years.
Thank you for the proposal. I have not packaged Clubmark because few people benchmark clustering algorithms. A pip package exists for PyExPool (utils/mpepool.py), a general-purpose scheduler that is used in this benchmarking framework.

Make42 commented 3 years ago

I have basically written my own framework by now, but it is not general enough to be published (it is actually very specific to my research). However, I need measures that can compare hard "true" clustering labels with fuzzy clustering results. Fuzzy means that each data object is assigned to each cluster with a certain degree of membership (a percentage). The number of clusters can differ from the number of distinct "true" labels. Of course, the IDs of the "true" labels are not related in any way to the IDs of the clusters in the results.
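One way to handle this setting, sketched here as an assumption rather than as a method from this thread or from xmeasures: accumulate each object's membership weights into a soft contingency table indexed by (true label, cluster ID), then compute NMI from that table treated as a joint distribution. Only co-occurrence mass is used, so label IDs and cluster IDs never need to correspond, and the number of clusters may differ from the number of labels. The functions `soft_contingency` and `fuzzy_nmi` are hypothetical, and each object's weights are assumed to sum to 1.

```python
import math
from collections import defaultdict


def soft_contingency(labels, memberships):
    """Sum the membership mass that each true label sends to each cluster.

    labels: hard ground-truth label per object (any hashable IDs).
    memberships: per-object dicts {cluster_id: weight}; the weights of one
        object are assumed (not checked) to sum to 1.
    """
    table = defaultdict(float)
    for label, weights in zip(labels, memberships):
        for cluster, w in weights.items():
            table[(label, cluster)] += w
    return table


def fuzzy_nmi(labels, memberships):
    """NMI of the soft contingency table, normalized by 2*I / (H(U) + H(V))."""
    table = soft_contingency(labels, memberships)
    total = sum(table.values())
    p_label = defaultdict(float)
    p_cluster = defaultdict(float)
    for (label, cluster), mass in table.items():
        p_label[label] += mass / total
        p_cluster[cluster] += mass / total

    def entropy(dist):
        return -sum(p * math.log(p) for p in dist.values() if p > 0)

    # Mutual information between the label and cluster marginals.
    mi = 0.0
    for (label, cluster), mass in table.items():
        p = mass / total
        if p > 0:
            mi += p * math.log(p / (p_label[label] * p_cluster[cluster]))
    h_sum = entropy(p_label) + entropy(p_cluster)
    # Single label and single cluster: treat as a perfect match.
    return 2 * mi / h_sum if h_sum > 0 else 1.0
```

For example, with `labels = ["a", "a", "b", "b"]`, the crisp result `[{0: 1.0}, {0: 1.0}, {1: 1.0}, {1: 1.0}]` scores 1.0 whatever the cluster IDs are, while giving every object `{0: 0.5, 1: 0.5}` scores 0.0. The arithmetic-mean normalization is one common choice; geometric-mean or max normalizations are also used.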

I thought I could use https://github.com/eXascaleInfolab/xmeasures, but I did not manage to get it to run. Also, I am not even sure that it covers my use case as described.

What tools can I use for the use case I described? Is one of your projects suitable?

I think that people might be interested (at least I am), but your tools do not seem very accessible. What I mean by that: I am sure they are well built and very advanced (that is my impression), but I do not understand how to use them and I do not want to spend days to find out whether and which tools are right for me.

luav commented 3 years ago

"I could use https://github.com/eXascaleInfolab/xmeasures, but I did not manage to get it to run."
@Make42, you are welcome to open an issue for xmeasures if you found problems there. However, that app has been used successfully by several people, so it can definitely be run.
Xmeasures assigns each object equally to all clusters sharing that object. If you need a weighted assignment, then you are welcome to contribute to that tool ;-)
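The equal assignment described above can be illustrated with a small sketch. It assumes one reading of "equally", namely that an object shared by k clusters gets weight 1/k in each of them; the function `equal_share_memberships` is hypothetical and is not xmeasures code. A fuzzy clustering would instead supply its own per-object weights in the same `{cluster_id: weight}` form, which is the weighted assignment that would have to be contributed.

```python
from collections import defaultdict


def equal_share_memberships(clusters):
    """Turn overlapping crisp clusters into per-object membership weights.

    clusters: dict {cluster_id: iterable of object ids}.
    Assumption: an object that belongs to k clusters receives weight 1/k in
    each of them (an equal share), so every object's weights sum to 1.
    """
    owners = defaultdict(list)  # object id -> clusters containing it
    for cluster_id, members in clusters.items():
        for obj in set(members):  # ignore duplicate listings within a cluster
            owners[obj].append(cluster_id)
    return {obj: {c: 1.0 / len(cs) for c in cs} for obj, cs in owners.items()}
```

With `{"c1": [1, 2, 3], "c2": [3, 4]}`, objects 1, 2, and 4 get weight 1.0 in their only cluster, and the shared object 3 gets `{"c1": 0.5, "c2": 0.5}`.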
You can use weighted overlapping measures, whose unweighted flavors are implemented in xmeasures (Mean F1 family, Omega Index, NMI for overlapping clusters). There exist many other measures for evaluating the accuracy of fuzzy clustering, which I did not consider because of their high computational complexity (worse than quadratic) and which you can find in research papers.
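A weighted flavor of the Mean F1 measure could look like the following sketch (the function `weighted_mean_f1` is hypothetical, not the xmeasures implementation). The overlap between a true label and a cluster is the summed membership weight of the label's objects in that cluster; each label is matched to its best-F1 cluster, each cluster to its best-F1 label, and the two directional averages are combined by an arithmetic mean. Other Mean F1 variants combine the two directions differently, for example with a harmonic mean.

```python
from collections import defaultdict


def weighted_mean_f1(labels, memberships):
    """Weighted Mean F1: best-match F1 averaged over labels and over clusters.

    labels: hard ground-truth label per object.
    memberships: per-object dicts {cluster_id: weight}.
    """
    overlap = defaultdict(float)       # (label, cluster) -> shared membership mass
    label_size = defaultdict(float)    # label -> number of objects
    cluster_size = defaultdict(float)  # cluster -> total membership mass
    for label, weights in zip(labels, memberships):
        label_size[label] += 1.0
        for cluster, w in weights.items():
            overlap[(label, cluster)] += w
            cluster_size[cluster] += w

    def f1(label, cluster):
        # F1 = 2 * |overlap| / (|label| + |cluster|), with weighted sizes.
        shared = overlap.get((label, cluster), 0.0)
        return 2.0 * shared / (label_size[label] + cluster_size[cluster])

    # Best-matching cluster for every label, and best-matching label for every cluster.
    truth_side = sum(max(f1(l, c) for c in cluster_size) for l in label_size) / len(label_size)
    found_side = sum(max(f1(l, c) for l in label_size) for c in cluster_size) / len(cluster_size)
    return (truth_side + found_side) / 2.0
```

Averaging in both directions means that clusters which match no true label well also lower the score, not only poorly recovered labels. With `labels = ["a", "a", "b", "b"]`, a crisp perfect match scores 1.0, and giving every object `{0: 0.5, 1: 0.5}` scores 0.5.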

"I do not understand how to use them and I do not want to spend days to find out whether and which tools are right for me."
I did not devote much time to making those tools highly usable because the audience is relatively small and tech-savvy. However, all the information required to run those tools is given in their README files. You are welcome to open issues when you encounter problems.

luav commented 3 years ago

The Clubmark framework seems to run fine on Python 3. If it does not, please provide the stack trace together with any other information needed to reproduce the issue.