IBCNServices / pyRDF2Vec

🐍 Python Implementation and Extension of RDF2Vec
https://pyrdf2vec.readthedocs.io/en/latest/
MIT License

Error when upgrading the pyrdf2vec package #40

Closed: nnadine25 closed this issue 3 years ago

nnadine25 commented 3 years ago

Hi, I upgraded pyrdf2vec to version 0.2.0, and when I work with this version I get the following error:

    Traceback (most recent call last):
        import pyrdf2vec
      File "C:\Users\ILINE\PycharmProjects\test2best\venv\lib\site-packages\pyrdf2vec\__init__.py", line 1, in <module>
        import nest_asyncio
    ModuleNotFoundError: No module named 'nest_asyncio'

GillesVandewiele commented 3 years ago

Hi @nnadine25

How did you install the latest version? If you cloned the repository from GitHub, do not forget to run poetry install (or pip install . from inside the repository folder).

If you are installing from pip (pip install pyrdf2vec), dependencies should be automatically installed. If not, we will fix this!

rememberYou commented 3 years ago

From the command line, here is the right way to install and use pyRDF2Vec with Poetry as the dependency manager:

  pip install poetry
  git clone https://github.com/IBCNServices/pyRDF2Vec.git
  cd pyRDF2Vec
  poetry install
  poetry shell

However, on Google Colab notebooks, the poetry shell command (which opens a shell inside your virtual environment) does not seem to work as expected.

To work around these errors on Google Colab, I recommend installing the missing packages with pip:

  !pip install aiohttp
  !pip install nest-asyncio
  !pip install rdflib

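Whichever way the packages are installed, a quick way to confirm that the interpreter you are actually running (the Colab kernel, or the PyCharm project venv) can see them is importlib.util.find_spec, which locates a module without importing it. This is a minimal sketch, not a pyRDF2Vec tool: the module list is an assumption mirroring the three packages above, and note that the import name is nest_asyncio (underscore) while the pip name is nest-asyncio.

```python
import importlib.util
from typing import Iterable, List

# Import names (not pip names) of the three packages listed above.
# Assumption: adjust this tuple if your pyRDF2Vec version needs others.
REQUIRED_MODULES = ("aiohttp", "nest_asyncio", "rdflib")


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on sys.path."""
    missing = []
    for name in names:
        # find_spec returns None when a top-level module cannot be located;
        # it does not execute the module's code.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(REQUIRED_MODULES)
    if absent:
        print("Missing modules:", ", ".join(absent))
    else:
        print("All required modules can be found.")
```

If a module is reported missing even after pip install, pip most likely installed it into a different interpreter than the one running your script.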
nnadine25 commented 3 years ago

I use the PyCharm IDE, and I installed pyrdf2vec using only pip install.

rememberYou commented 3 years ago

Good point: nest-asyncio is declared in the dev dependencies section instead of the main dependencies, so pip does not install it: https://github.com/IBCNServices/pyRDF2Vec/blob/592014bbcf3881b03cc8b807fc8ea1b6a56caeae/pyproject.toml#L68

I will patch this.
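Only the main dependencies of a Poetry project end up as Requires-Dist entries in the published package metadata; dev dependencies do not, which is why pip never pulls them in. A minimal sketch for checking what an installed distribution actually declares uses importlib.metadata.requires (Python 3.8+). The helper names are hypothetical, and since the exact requirement-string format depends on the build tool, the parsing is deliberately loose.

```python
import re
from importlib.metadata import PackageNotFoundError, requires
from typing import Optional, Set


def normalize(name: str) -> str:
    """Normalize a project name as in PEP 503 (nest_asyncio -> nest-asyncio)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement: str) -> str:
    """Extract the project name from a Requires-Dist string such as
    'nest-asyncio (>=1.5.1,<2.0.0)' or 'rdflib>=5.0.0; extra == "x"'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    return normalize(match.group(1))


def declared_dependencies(dist_name: str) -> Optional[Set[str]]:
    """Return the normalized dependencies declared by an installed
    distribution (optional extras included), or None if not installed."""
    try:
        reqs = requires(dist_name) or []
    except PackageNotFoundError:
        return None
    return {requirement_name(r) for r in reqs}


if __name__ == "__main__":
    deps = declared_dependencies("pyrdf2vec")
    if deps is None:
        print("pyrdf2vec is not installed in this interpreter")
    else:
        print("nest-asyncio declared:", "nest-asyncio" in deps)
```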

nnadine25 commented 3 years ago

I installed the three modules (!pip install aiohttp, !pip install nest-asyncio, !pip install rdflib), but I still get this error:

    raise RuntimeError('''
    RuntimeError: An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

    Traceback (most recent call last):
      File "<string>", line 1, in <module>

rememberYou commented 3 years ago

@nnadine25 I created the 0.2.1 release to fix this issue with nest-asyncio: https://pypi.org/project/pyrdf2vec/

Please try again with pyRDF2Vec 0.2.1:

  pip install pyrdf2vec

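One thing to watch for: if an older release is already installed, a plain pip install pyrdf2vec only reports that the requirement is already satisfied; pip install --upgrade pyrdf2vec (or pip install pyrdf2vec==0.2.1) is what actually replaces it. A minimal sketch for confirming which version the current interpreter sees, using importlib.metadata.version; the helper names are hypothetical, and the naive parser assumes plain X.Y.Z versions (for pre-releases, the packaging library's Version class is the more robust choice).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def version_tuple(text: str) -> Tuple[int, ...]:
    """Turn a plain 'X.Y.Z' version string into a comparable tuple.

    Assumption: no pre-release or local suffixes such as 'rc1' or '+local'.
    """
    return tuple(int(part) for part in text.split("."))


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string, or None if not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("pyrdf2vec")
    if found is None:
        print("pyrdf2vec is not installed in this interpreter")
    elif version_tuple(found) < (0, 2, 1):
        print(f"pyrdf2vec {found} is older than 0.2.1; upgrade it")
    else:
        print(f"pyrdf2vec {found} is 0.2.1 or newer")
```

Comparing tuples rather than strings matters: as strings, "0.10.0" would sort before "0.2.1".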
nnadine25 commented 3 years ago

I tried version 0.2.1, and I still always get this message in the console:

    raise RuntimeError('''
    RuntimeError: An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

    Traceback (most recent call last):
      File "<string>", line 1, in <module>

GillesVandewiele commented 3 years ago

That error seems to be related to using multiprocessing on Windows. Have you tried what the error message suggests, i.e. putting your main script inside this specific block:

  if __name__ == '__main__':
      <PUT YOUR CODE HERE>

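Some background on why the guard helps: on Windows, multiprocessing starts child processes with the spawn method, which launches a fresh interpreter that re-imports the main module. Any module-level code that starts processes then runs again inside every child, which is what the RuntimeError above complains about. The generic sketch below uses plain multiprocessing rather than pyRDF2Vec (square and main are hypothetical names): worker functions stay at module level so children can find them, and everything that starts processes sits behind the guard.

```python
import multiprocessing
from typing import List


def square(x: int) -> int:
    # Worker functions must live at module level: under the spawn start
    # method, the child re-imports this module to look them up.
    return x * x


def main() -> List[int]:
    # Everything that starts processes goes here, so it runs only in the
    # parent and not when a child re-imports this module.
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(square, range(5))


if __name__ == "__main__":
    # A frozen executable would also call multiprocessing.freeze_support()
    # here; as the error message says, plain scripts can omit it.
    print(main())  # [0, 1, 4, 9, 16]
```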
nnadine25 commented 3 years ago

I did what the message says, and the new pyRDF2Vec version 0.2.1 works well. Thank you!

But I would like to know whether the randomness problem with Word2Vec is fixed. I am trying to compute the embedding of the resource in the example below, and the array changes on each execution: I get different values every time.

    import multiprocessing
    import random

    import numpy as np

    from pyrdf2vec import RDF2VecTransformer
    from pyrdf2vec.embedders import Word2Vec
    from pyrdf2vec.graphs import KG
    from pyrdf2vec.samplers import UniformSampler
    from pyrdf2vec.walkers import RandomWalker

    # Ensure the determinism of this script by initializing a pseudo-random number
    # generator.
    np.random.seed(42)
    random.seed(42)

    if __name__ == '__main__':
        transformer = RDF2VecTransformer(
            Word2Vec(workers=1, size=200),
            [RandomWalker(1, 200)],
        )
        embeddings = transformer.fit_transform(
            KG(location="http://dbpedia.org/sparql", is_remote=True),
            ["http://dbpedia.org/resource/Brussels"],
        )
        print(embeddings)

GillesVandewiele commented 3 years ago

Ok great.

Yes, the problem of randomness has been completely fixed. You can ensure determinism by setting the seeds in the code as you have done:

  np.random.seed(42)
  random.seed(42)

But you should also set the PYTHONHASHSEED environment variable. I typically run my scripts from the command line like this:

  PYTHONHASHSEED=42 python3 my_awesome_script.py

How to set this in your own environment is something you will have to Google yourself.
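Why PYTHONHASHSEED matters here: gensim's Word2Vec documentation notes that a fully reproducible run needs a single worker thread (workers=1) and, in Python 3, PYTHONHASHSEED for reproducibility across interpreter launches, because str hashes (used when initializing word vectors) are randomized per process by default. The variable is read only at interpreter start-up, so setting os.environ inside an already running script does not change that script's own hashes. The sketch below (hash_in_fresh_interpreter is a hypothetical helper) demonstrates this by hashing the same string in fresh interpreters with and without a fixed seed.

```python
import os
import subprocess
import sys
from typing import Optional


def hash_in_fresh_interpreter(text: str, hash_seed: Optional[str]) -> int:
    """Compute hash(text) in a new Python process.

    PYTHONHASHSEED only takes effect at interpreter start-up, so it is
    passed through the child's environment rather than set in this process.
    """
    env = dict(os.environ)
    if hash_seed is None:
        env.pop("PYTHONHASHSEED", None)  # default: randomized per process
    else:
        env["PYTHONHASHSEED"] = hash_seed
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    uri = "http://dbpedia.org/resource/Brussels"
    seeded = {hash_in_fresh_interpreter(uri, "42") for _ in range(3)}
    unseeded = {hash_in_fresh_interpreter(uri, None) for _ in range(3)}
    print("distinct hashes with PYTHONHASHSEED=42:", len(seeded))
    print("distinct hashes without it:", len(unseeded))
```

In an IDE, the simpler route is usually to add the variable to the run configuration (PyCharm, for instance, has an environment variables field in its Run/Debug configurations).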

nnadine25 commented 3 years ago

Thank you! I also added random_state=42 to the walker, RandomWalker(2, None, random_state=42), and now I get the same embeddings on each execution. Thanks!
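To illustrate why passing random_state to the walker helps: a random walk extractor picks each next hop with a pseudo-random generator, so giving it its own seeded generator makes the extracted walks, and therefore the Word2Vec training corpus, identical across runs. The sketch below is a toy on a hypothetical adjacency dict, not pyRDF2Vec's RandomWalker implementation; random_walks and its parameters are illustrative names only.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical toy graph: node -> list of (predicate, object) edges.
Graph = Dict[str, List[Tuple[str, str]]]


def random_walks(
    graph: Graph,
    start: str,
    depth: int,
    n_walks: int,
    random_state: Optional[int] = None,
) -> List[List[str]]:
    """Extract n_walks walks of at most `depth` hops starting at `start`.

    A private random.Random instance is used, so the result depends only on
    random_state and not on other code that touches the global generator.
    """
    rng = random.Random(random_state)
    walks = []
    for _ in range(n_walks):
        walk = [start]
        node = start
        for _ in range(depth):
            edges = graph.get(node, [])
            if not edges:
                break  # dead end: stop this walk early
            predicate, obj = rng.choice(edges)
            walk.extend([predicate, obj])
            node = obj
        walks.append(walk)
    return walks


if __name__ == "__main__":
    toy_graph: Graph = {
        "Brussels": [("country", "Belgium"), ("type", "City")],
        "Belgium": [("capital", "Brussels"), ("type", "Country")],
    }
    first = random_walks(toy_graph, "Brussels", depth=2, n_walks=4, random_state=42)
    second = random_walks(toy_graph, "Brussels", depth=2, n_walks=4, random_state=42)
    print(first == second)  # True: same seed and graph give the same walks
```

Seeded walks are necessary but not sufficient on their own: the Word2Vec training step must also be reproducible, which is where workers=1 and PYTHONHASHSEED come in.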