Open naja-sirina opened 2 years ago
Hi,
maybe the cause is Python version 3.10.
Can you test 3.6 or 3.8? These are the versions in which we tested it.
Regarding python-Levenshtein: one way to get around this issue is to install the python3-devel package.
Hello,
I tested with version 3.8, but I get the same error. And if I understand correctly, the python3-devel package is only for Linux systems? I am on Windows.
I tried pip install python-dev-tools for my system, but the Levenshtein error is still there.
Hi,
this is a very good issue. In my team we don't have Windows machines, so it is really hard to check how to fix it.
Regardless, python3-devel is needed to install python-Levenshtein==0.12.2.
So, in a way, if you manage to install python-Levenshtein==0.12.2 via another route, you should be able to shortcut the problem and make it work.
I have quickly checked on google:
keep me posted on how it goes, and I would be grateful if you could share the solution.
Thanks a lot.
ok, I'll write if I can install it.
Installation completed successfully. There was a problem with the C compiler, which caused problems with the Levenshtein and igraph libraries. Now it's OK.
But I have another problem. When I try to test it at this step:
import cso_classifier as test
test.test_classifier_single_paper() # to test it with one paper
test.test_classifier_batch_mode() # to test it with multiple papers
I have another error with test: ImportError: cannot import name 'Mapping' from 'collections'
Interesting. This is very odd because I don't use any 'Mapping' in the CSO Classifier.
So it must be a third-party package which is not properly tested.
I have googled it, and it looks like the issue appears in Python 3.10; the suggestion is to go back to 3.8 (https://stackoverflow.com/questions/69381312/in-vs-code-importerror-cannot-import-name-mapping-from-collections). Which version of Python are you currently using?
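For context: the old-style import `from collections import Mapping` stopped working in Python 3.10, where the abstract base classes are only available from `collections.abc`. The following is a minimal sketch (not code from the CSO Classifier or any of its dependencies) that shows the import that works across versions and reports whether the running interpreter still offers the old alias. The helper name `has_legacy_mapping_alias` is made up for illustration.

```python
import collections
import sys
from collections.abc import Mapping  # the location that works on current Python 3


def has_legacy_mapping_alias() -> bool:
    """Return True if `from collections import Mapping` would still succeed."""
    return hasattr(collections, "Mapping")


print("Python", ".".join(map(str, sys.version_info[:3])))
print("legacy collections.Mapping available:", has_legacy_mapping_alias())
print("dict is a Mapping:", isinstance({}, Mapping))
```

If the second line reports that the alias is unavailable, any installed package that still uses the old import will fail in the way shown above until it is updated or an older interpreter is used.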
I installed Python 3.8, and now there is a new error. First I see the text: "De-anonymizing Social Networks Operators of online social networks are increasingly sharing potentially sensitive information about users and their relationships with advertisers, application developers, and data-mining researchers. Privacy is typically protected by anonymization ..." etc.
Then: "Ontology pickle file is missing. Extracting and converting ontology."
And after that, this error:
PermissionError: [Errno 13] Permission denied: 'C:\Program Files\Python38\Lib\site-packages\cso_classifier\assets/cso.p'
I am working in PyCharm.
And if I try Python 3.7, I get an installation error.
Hi, just to clarify,
after installing the CSO Classifier, did you run the setup:
from cso_classifier import CSOClassifier as cc
cc.setup()
exit() # it is important to close the current console, to make those changes effective
Yes, and I have the same error after this (for python 3.8):
====================================================== ONTOLOGY ======================================================
PermissionError: [Errno 13] Permission denied: 'C:\Program Files\Python38\Lib\site-packages\cso_classifier\assets/cso.p'
I have an idea, which needs to be tested. I am not 100% sure, but it can be an issue of slashes. Linux-based systems use forward slashes to separate directories. Windows uses backslashes.
I have amended the config.ini file so that files have the backslash.
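One way to test the slash hypothesis without touching the installation is to compare the two spellings of the path with `pathlib.PureWindowsPath`, which applies Windows path rules on any operating system. This is only a sketch: the paths are copied from the error message above, and the variable names are illustrative.

```python
from pathlib import PureWindowsPath

# Path as it appears in the error message: backslashes plus one forward slash.
mixed = PureWindowsPath(
    r"C:\Program Files\Python38\Lib\site-packages\cso_classifier\assets/cso.p"
)
# The same location written with backslashes only.
backslashes = PureWindowsPath(
    r"C:\Program Files\Python38\Lib\site-packages\cso_classifier\assets\cso.p"
)

same_path = mixed == backslashes
print("treated as the same path:", same_path)
print("normalized form:", mixed)
```

If the two compare equal, Windows path handling treats both spellings as the same file, so the separator alone is unlikely to explain a PermissionError; write access to the folder would be the next thing to check.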
Can I ask you to edit the C:\Program Files\Python38\Lib\site-packages\cso_classifier\config.ini
with the following text:
[classifier]
classifier_version = 3.0
package_name = cso-classifier
[ontology]
cso_path = assets\cso.csv
cso_pickle_path = assets\cso.p
cso_graph_path = assets\cso_graph.p
cso_remote_url = https://cso.kmi.open.ac.uk/download
cso_versions_logger_url = http://cso.kmi.open.ac.uk/versioning/versions.json
cso_version = 0.0
[model]
model_pickle_path = assets\model.p
model_pickle_remote_url = https://cso.kmi.open.ac.uk/download/model.p
cached_model = assets\token-to-cso-combined.json
cached_model_remote_url = https://cso.kmi.open.ac.uk/download/token-to-cso-combined.json
Hopefully, this will work. Keep me posted.
best
I tried this variant, but now I get:
PermissionError: [Errno 13] Permission denied: 'C:\Program Files\Python38\Lib\site-packages\cso_classifier\assets\cso.p'
There is also a problem with the JSON file; I can't download it: https://cso.kmi.open.ac.uk/download/token-to-cso-combined.json
This is very odd. I am thinking that perhaps you don't have permission to write to that folder.
My suggestions are:
The first solution is a bit tougher but better in the long run.
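A quick way to confirm the permission hypothesis is to try creating a temporary file in the package's assets folder. The sketch below is an assumption-laden diagnostic, not part of the CSO Classifier: `is_writable` is a hypothetical helper, and the `assets` folder name is taken from the path in the error message.

```python
import importlib.util
import os
import tempfile


def is_writable(directory: str) -> bool:
    """Return True if a temporary file can be created inside `directory`."""
    try:
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:  # includes PermissionError and FileNotFoundError
        return False


# Locate the installed package without importing it (None if not installed).
spec = importlib.util.find_spec("cso_classifier")
if spec is not None and spec.origin is not None:
    assets_dir = os.path.join(os.path.dirname(spec.origin), "assets")
    print(assets_dir, "writable:", is_writable(assets_dir))
else:
    print("cso_classifier is not installed in this interpreter")
```

If this prints `writable: False` for a folder under C:\Program Files, the current user cannot create cso.p there, which would match the PermissionError.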
Now it works for the function:
test.test_classifier_single_paper()
But when I try to test it with multiple papers using the function test.test_classifier_batch_mode(), I get another error:
RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
    if __name__ == '__main__':
        freeze_support()
        ...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
raise RuntimeError('''
Yes. According to https://stackoverflow.com/questions/63871662/python-multiprocessing-freeze-support-error, freeze_support() is needed on Windows machines. This is a bit hard for me to check.
Can you create a new python script with the following code and test it for me?
from multiprocessing import Process, freeze_support
import json
from cso_classifier import CSOClassifier

def my_test_classifier_batch_mode():
    """ Functionality that tests the classifier in batch mode.
    It loads two papers, it calls the classifier with certain parameters and then it prints the results over the console.
    """
    papers = dict()
    papers['paper1'] = {
        "title": "De-anonymizing Social Networks",
        "abstract": "Operators of online social networks are increasingly sharing potentially sensitive information about users and their relationships with advertisers, application developers, and data-mining researchers. Privacy is typically protected by anonymization, i.e., removing names, addresses, etc. We present a framework for analyzing privacy and anonymity in social networks and develop a new re-identification algorithm targeting anonymized social-network graphs. To demonstrate its effectiveness on real-world networks, we show that a third of the users who can be verified to have accounts on both Twitter, a popular microblogging service, and Flickr, an online photo-sharing site, can be re-identified in the anonymous Twitter graph with only a 12% error rate. Our de-anonymization algorithm is based purely on the network topology, does not require creation of a large number of dummy \"sybil\" nodes, is robust to noise and all existing defenses, and works even when the overlap between the target network and the adversary's auxiliary information is small.",
        "keywords": "data mining, data privacy, graph theory, social networking (online)"
    }
    papers['paper2'] = {
        "title": "Automatic Classification of Springer Nature Proceedings with Smart Topic Miner",
        "abstract": "The process of classifying scholarly outputs is crucial to ensure timely access to knowledge. However, this process is typically carried out manually by expert editors, leading to high costs and slow throughput. In this paper we present Smart Topic Miner (STM), a novel solution which uses semantic web technologies to classify scholarly publications on the basis of a very large automatically generated ontology of research areas. STM was developed to support the Springer Nature Computer Science editorial team in classifying proceedings in the LNCS family. It analyses in real time a set of publications provided by an editor and produces a structured set of topics and a number of Springer Nature Classification tags, which best characterise the given input. In this paper we present the architecture of the system and report on an evaluation study conducted with a team of Springer Nature editors. The results of the evaluation, which showed that STM classifies publications with a high degree of accuracy, are very encouraging and as a result we are currently discussing the required next steps to ensure large-scale deployment within the company.",
        "keywords": "Scholarly data, Ontology learning, Bibliographic data, Scholarly ontologies, Data mining, Conference proceedings Metadata"
    }

    # Print the input papers before classifying them.
    for key, paper in papers.items():
        print(key)
        print(paper["title"])
        print(paper["abstract"])
        print(paper["keywords"])

    cso_classifier = CSOClassifier()
    results = cso_classifier.batch_run(papers, workers=2)
    print(results)

    # Save the classification results next to the script.
    with open('output.json', 'w') as outfile:
        json.dump(results, outfile, indent=4)


if __name__ == '__main__':
    freeze_support()  # needed for Windows
    my_test_classifier_batch_mode()
This code is a kind of replica of the batch mode test, but it contains the freeze_support() function.
Yes, it works! Thanks
This is supercool!!! Thanks a lot. Happy classifying!
Hello!
I tried to install your classifier and got this error:
Failed building wheel for python-igraph
Stored in directory: c:\users\guest\appdata\local\pip\cache\wheels\7d\1d\2c\a4989f424c14d3f3bb5ab05a470275cf1d8f69857d81249b22
Building wheel for python-igraph (setup.py) ... error
error: subprocess-exited-with-error
And after that:
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for python-Levenshtein
Running setup.py clean for python-Levenshtein
Building wheel for spacy (pyproject.toml) ... error
error: subprocess-exited-with-error
Can you please help me with this error?
Windows 11, python 3.10.6