angelosalatino / cso-classifier

Python library that classifies content from scientific papers with the topics of the Computer Science Ontology (CSO).
https://cso.kmi.open.ac.uk
Apache License 2.0

installation error #12

Open naja-sirina opened 2 years ago

naja-sirina commented 2 years ago

Hello!

I tried to install your classifier and got this error:

Failed building wheel for python-igraph
Stored in directory: c:\users\guest\appdata\local\pip\cache\wheels\7d\1d\2c\a4989f424c14d3f3bb5ab05a470275cf1d8f69857d81249b22
Building wheel for python-igraph (setup.py) ... error
error: subprocess-exited-with-error

And after that:

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for python-Levenshtein
Running setup.py clean for python-Levenshtein
Building wheel for spacy (pyproject.toml) ... error
error: subprocess-exited-with-error

Can you please help me with this error?

Windows 11, python 3.10.6

angelosalatino commented 2 years ago

Hi,

maybe the cause is version 3.10 of Python.

Can you test 3.6 or 3.8? These are the versions in which we tested it.
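
A quick way to see whether the running interpreter is one of the versions mentioned above is sketched below. The set of versions comes only from this comment (3.6 and 3.8); the helper name `is_tested_python` is made up for illustration and is not part of the CSO Classifier.

```python
import sys

# Versions named in this thread as tested; this is an assumption taken from
# the comment above, not from the package metadata.
TESTED_VERSIONS = {(3, 6), (3, 8)}


def is_tested_python(version_info=sys.version_info):
    """Return True if the (major, minor) pair is one of the tested versions."""
    return tuple(version_info[:2]) in TESTED_VERSIONS


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    status = "tested" if is_tested_python() else "not tested"
    print(f"Python {major}.{minor} is {status} with the classifier")
```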

angelosalatino commented 2 years ago

Regarding python-Levenshtein:

One way to get around this issue is to install the python3-devel package

naja-sirina commented 2 years ago

Hello,

I tested with version 3.8, but I get the same error. And if I understand correctly, the python3-devel package is only for Linux systems? I have Windows.

I tried pip install python-dev-tools on my system, but the Levenshtein error is still there.

angelosalatino commented 2 years ago

Hi,

this is a very good issue. My team doesn't have any Windows machines, so it is really hard for us to check how to fix this.

Regardless, python3-devel is needed to install python-Levenshtein==0.12.2.

So, in a way, if you manage to install python-Levenshtein==0.12.2 via another route, you should be able to shortcut the problem and make it work.

I have quickly checked on google:

keep me posted on how it goes, and I would be grateful if you could share the solution.

Thanks a lot.

naja-sirina commented 2 years ago

ok, I'll write if I can install it.

naja-sirina commented 2 years ago

Installation completed successfully. There was a problem with the C compiler, which caused the problems with the Levenshtein and igraph libraries. Now it's OK.

But I have another problem. When I try to test it at this step:

import cso_classifier as test
test.test_classifier_single_paper() # to test it with one paper
test.test_classifier_batch_mode() # to test it with multiple papers

I get another error from the test: ImportError: cannot import name 'Mapping' from 'collections'

angelosalatino commented 2 years ago

Interesting. This is very odd because I don't use any 'Mapping' in the CSO Classifier.

So it must be a third-party package which is not properly tested.

I have googled it, and it looks like the issue appears in Python 3.10; the suggestion is to go back to 3.8 (https://stackoverflow.com/questions/69381312/in-vs-code-importerror-cannot-import-name-mapping-from-collections). Which version of Python are you currently using?
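
For background: the abstract base classes such as Mapping live in collections.abc, and the old aliases in collections were removed in Python 3.10, so any package that still does "from collections import Mapping" fails there. Which third-party package triggers it here is not known from this thread. A minimal sketch of the import that works on current Python versions (the helper `is_mapping` is only for illustration):

```python
# Works on every supported Python 3 version, including 3.10 and later.
from collections.abc import Mapping


def is_mapping(obj):
    """Return True for dict-like objects."""
    return isinstance(obj, Mapping)


if __name__ == "__main__":
    print(is_mapping({"title": "De-anonymizing Social Networks"}))
    print(is_mapping(["not", "a", "mapping"]))
```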

naja-sirina commented 2 years ago

I installed Python 3.8, and now there is a new error. First I see the text: "De-anonymizing Social Networks Operators of online social networks are increasingly sharing potentially sensitive information about users and their relationships with advertisers, application developers, and data-mining researchers. Privacy is typically protected by anonymization ..." and so on.

Then: "Ontology pickle file is missing. Extracting and converting ontology."

And after that comes the error:

PermissionError: [Errno 13] Permission denied: 'C:\Program Files\Python38\Lib\site-packages\cso_classifier\assets/cso.p'

I am working in PyCharm.

naja-sirina commented 2 years ago

And if I try Python 3.7, I get an installation error.

angelosalatino commented 2 years ago

Hi,

just to clarify: after installing the CSO Classifier, did you run the setup?

from cso_classifier import CSOClassifier as cc
cc.setup()
exit() # it is important to close the current console, to make those changes effective

naja-sirina commented 2 years ago

Yes, and I get the same error after this (for Python 3.8):

====================================================== ONTOLOGY ======================================================

PermissionError: [Errno 13] Permission denied: 'C:\Program Files\Python38\Lib\site-packages\cso_classifier\assets/cso.p'

angelosalatino commented 2 years ago

I have an idea, which needs to be tested. I am not 100% sure, but it could be an issue with slashes. Linux-based systems use forward slashes to separate directories. Windows uses backslashes.

I have amended the config.ini file so that the file paths use backslashes.

Could you replace the contents of C:\Program Files\Python38\Lib\site-packages\cso_classifier\config.ini with the following text:

[classifier]
classifier_version = 3.0
package_name = cso-classifier

[ontology]
cso_path = assets\cso.csv
cso_pickle_path = assets\cso.p
cso_graph_path = assets\cso_graph.p
cso_remote_url = https://cso.kmi.open.ac.uk/download
cso_versions_logger_url = http://cso.kmi.open.ac.uk/versioning/versions.json
cso_version = 0.0

[model]
model_pickle_path = assets\model.p
model_pickle_remote_url = https://cso.kmi.open.ac.uk/download/model.p
cached_model = assets\token-to-cso-combined.json
cached_model_remote_url = https://cso.kmi.open.ac.uk/download/token-to-cso-combined.json

Hopefully, this will work. Keep me posted.

best

naja-sirina commented 2 years ago

I tried this variant, but now I get:

PermissionError: [Errno 13] Permission denied: 'C:\Program Files\Python38\Lib\site-packages\cso_classifier\assets\cso.p'

And there is also a problem with the JSON file; I can't download it: https://cso.kmi.open.ac.uk/download/token-to-cso-combined.json
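
The same error appearing with both separators fits how Windows paths are parsed: forward and back slashes are both accepted as separators. The sketch below shows this with pathlib.PureWindowsPath, which runs on any operating system. It assumes the relative path from config.ini is joined onto the package directory, which this thread does not confirm, and `normalize_windows_path` is a made-up helper name. It only shows that the two spellings name the same file; it says nothing about permissions.

```python
from pathlib import PureWindowsPath


def normalize_windows_path(base, relative):
    """Join two path pieces using Windows rules and return the canonical string."""
    return str(PureWindowsPath(base) / relative)


# Package directory taken from the error message in this thread.
base = r"C:\Program Files\Python38\Lib\site-packages\cso_classifier"

forward = normalize_windows_path(base, "assets/cso.p")
backward = normalize_windows_path(base, r"assets\cso.p")

if __name__ == "__main__":
    print(forward)
    print("same file:", forward == backward)
```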

angelosalatino commented 2 years ago

This is very odd. I am thinking that perhaps you don't have permission to write to that folder.

My suggestions are:

The first solution is a bit tougher but better in the long run.
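
A minimal sketch for checking the permission hypothesis: it tries to create a temporary file in the folder, which is closer to what actually happens when cso.p is written than a flag check would be. The helper `can_write_to` is hypothetical, and the folder path is copied from the error message, so adjust it to your installation.

```python
import tempfile
from pathlib import Path


def can_write_to(directory):
    """Return True if a file can actually be created inside `directory`."""
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers PermissionError as well as a missing directory.
        return False


if __name__ == "__main__":
    # Path taken from the PermissionError above; change it if yours differs.
    assets = Path(r"C:\Program Files\Python38\Lib\site-packages\cso_classifier\assets")
    print(assets, "writable:", can_write_to(assets))
```

If this prints False for the assets folder, the process running Python cannot create files there, which would explain the PermissionError.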

naja-sirina commented 2 years ago

Now it works for the function:

test.test_classifier_single_paper()

But when I try to test it with multiple papers using the function test.test_classifier_batch_mode(), I get another error:

RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

angelosalatino commented 2 years ago

Yes. According to https://stackoverflow.com/questions/63871662/python-multiprocessing-freeze-support-error, the freeze_support() call under the "if __name__ == '__main__':" guard is needed on Windows machines. This is a bit hard for me to check.

Can you create a new python script with the following code and test it for me?

from multiprocessing import freeze_support
import json
from cso_classifier import CSOClassifier

def my_test_classifier_batch_mode():
    """ Functionality that tests the classifier in batch mode.
    It loads two papers, it calls the classifier with certain parameters and then it prints the results over the console.
    """

    papers = dict()
    papers['paper1'] = {
        "title": "De-anonymizing Social Networks",
        "abstract": "Operators of online social networks are increasingly sharing potentially sensitive information about users and their relationships with advertisers, application developers, and data-mining researchers. Privacy is typically protected by anonymization, i.e., removing names, addresses, etc. We present a framework for analyzing privacy and anonymity in social networks and develop a new re-identification algorithm targeting anonymized social-network graphs. To demonstrate its effectiveness on real-world networks, we show that a third of the users who can be verified to have accounts on both Twitter, a popular microblogging service, and Flickr, an online photo-sharing site, can be re-identified in the anonymous Twitter graph with only a 12% error rate. Our de-anonymization algorithm is based purely on the network topology, does not require creation of a large number of dummy \"sybil\" nodes, is robust to noise and all existing defenses, and works even when the overlap between the target network and the adversary's auxiliary information is small.",
        "keywords": "data mining, data privacy, graph theory, social networking (online)"
        }
    papers['paper2'] = {
        "title": "Automatic Classification of Springer Nature Proceedings with Smart Topic Miner",
        "abstract": "The process of classifying scholarly outputs is crucial to ensure timely access to knowledge. However, this process is typically carried out manually by expert editors, leading to high costs and slow throughput. In this paper we present Smart Topic Miner (STM), a novel solution which uses semantic web technologies to classify scholarly publications on the basis of a very large automatically generated ontology of research areas. STM was developed to support the Springer Nature Computer Science editorial team in classifying proceedings in the LNCS family. It analyses in real time a set of publications provided by an editor and produces a structured set of topics and a number of Springer Nature Classification tags, which best characterise the given input. In this paper we present the architecture of the system and report on an evaluation study conducted with a team of Springer Nature editors. The results of the evaluation, which showed that STM classifies publications with a high degree of accuracy, are very encouraging and as a result we are currently discussing the required next steps to ensure large-scale deployment within the company.",
        "keywords": "Scholarly data, Ontology learning, Bibliographic data, Scholarly ontologies, Data mining, Conference proceedings Metadata"
        }

    for key, paper in papers.items():
        print(key)
        print(paper["title"])
        print(paper["abstract"])
        print(paper["keywords"])

    cso_classifier = CSOClassifier()
    results = cso_classifier.batch_run(papers, workers = 2)

    print(results)
    with open('output.json', 'w') as outfile:
        json.dump(results, outfile, indent=4)

if __name__ == '__main__':
    freeze_support()  # needed for Windows
    my_test_classifier_batch_mode()

This code replicates the batch-mode test, but it adds the freeze_support() call inside the main guard.

naja-sirina commented 2 years ago

Yes, it works! Thanks.

angelosalatino commented 2 years ago

This is supercool!!! Thanks a lot. Happy classifying!