Closed fils closed 2 years ago
From the error message I can see you are running on GPUs. Do you get the same error running on plain CPUs? It looks like it is complaining about the shape of the dataframe; I will look into it.
I have collected all your code from issue #200 and this one into this Colab notebook or in this GitHub gist
If you have a Google account you can open and run it remotely (on the free tier). The code works there, and you can check whether the results returned are the ones you expect by copying and modifying the notebook. Let me know how it goes!
Please provide your local system's specs (OS, whether you are using a virtual environment, how you installed your packages) so I can understand the problem you have running on your machine. The suggested way to keep control of the installed packages is to use a Python virtual environment. I also find the Jupyter Lab desktop application really useful for managing package installation.
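For anyone gathering the specs requested above, here is a minimal sketch that uses only the standard library. The function name `collect_env_report` and the report keys are made up for illustration; the conda check assumes conda sets the `CONDA_DEFAULT_ENV` variable in an activated environment.

```python
import os
import platform
import sys


def collect_env_report() -> dict:
    """Gather basic facts about the OS and the running Python interpreter."""
    # In a venv/virtualenv, sys.prefix differs from sys.base_prefix.
    in_virtualenv = sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "executable": sys.executable,
        "in_virtualenv": in_virtualenv,
        # None when no conda environment is active (assumption: conda sets this).
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


report = collect_env_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

Pasting the printed lines into the issue, together with how the packages were installed (pip, conda, or other), should cover most of what is asked for.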
@Mec-iS Thanks for your help on these two items.
The SPARQL does work in Colab so I suspect this is an issue with my GPU usage as you point out.
Is there a simple way to turn off GPU leveraging in a notebook? I honestly don't know how I would disable that for a specific case.
sure. no problem.
The instantiation of the graph has an option to disable GPUs:
kg = kglab.KnowledgeGraph(
name = "hydroshare",
namespaces = ns,
use_gpus=False
)
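If you would rather not hard-code the flag, one option is to compute it before building the graph. This is only a sketch: `gpu_stack_available` and `resolve_use_gpus` are hypothetical helpers, not part of kglab, and they only check whether `cudf` can be found, not whether it works with the installed kglab version. In a case like the one in this thread, where cudf is installed but misbehaves, you would still pass `force_cpu=True`.

```python
import importlib.util


def gpu_stack_available(module_name: str = "cudf") -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def resolve_use_gpus(force_cpu: bool = False) -> bool:
    """Decide the value to pass as use_gpus; force_cpu always wins."""
    if force_cpu:
        return False
    return gpu_stack_available("cudf")


use_gpus = resolve_use_gpus(force_cpu=True)
print(f"use_gpus={use_gpus}")

# The result would then be passed to the constructor shown above, e.g.:
# kg = kglab.KnowledgeGraph(name="hydroshare", namespaces=ns, use_gpus=use_gpus)
```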
@Mec-iS Thanks!
Turning off the GPU removed the error. Thanks much (now I need to resolve my GPU issue, but that can wait for another time).
I'm not well disciplined in my Python usage. I am using conda to set up my environment, and you can find it here if you want to look further into this issue (https://github.com/gleanerio/notebooks/blob/master/environment.yml), but it's a bit large (see reference about lack of discipline).
You've resolved this problem though so feel free to close. I'll post new problems in new issues.. ;)
Hi @fils ,
Which dataset are you using? Is it not the one in https://github.com/DerwenAI/kglab/issues/24 but another?
I can try to recreate the issue on my Linux laptop which has an NVIDIA GPU.
It may be that some underlying dependencies for RAPIDS have changed. They have a somewhat non-standard "release selector" which we haven't updated in several months https://rapids.ai/start.html
The dataset is the one in #200; it can be downloaded from S3.
@ceteri @Mec-iS
I pushed up some of what I am working on (including the graphs) to https://github.com/gleanerio/notebooks/tree/master/Hydroshare
As noted, this should be the same graph as at the S3 (updated with prefix for schema.org).
I think the issue just may be the graph and the way I am approaching it not being the best. So I think things are working fine (sans the GPU issue I have.. which could be my install.. driver 495.29.05 by the way on a GTX 1050 Ti, nothing too special).
I'm trying to work up some ways people can inspect the schema.org based graphs around their datasets that come from implementing the https://github.com/ESIPFed/science-on-schema.org/ guidance. So any course corrections or guidance would be more than welcome!
Thanks for your engagement with this..
So @ceteri, I think that you are correct on the RAPIDS release selector. We have RAPIDS installed on a development node of our GPU cluster using the following selector:
conda create -n rapids-21.12 -c rapidsai -c nvidia -c conda-forge \
    cudf=21.12 cuml=21.12 cugraph=21.12 python=3.8 cudatoolkit=11.2
Running the example from the tutorial:
import kglab
namespaces = {
"wtm": "http://purl.org/heals/food/",
"ind": "http://purl.org/heals/ingredient/",
"skos": "http://www.w3.org/2004/02/skos/core#",
}
kg = kglab.KnowledgeGraph(
name = "A recipe KG example based on Food.com",
base_uri = "https://www.food.com/recipe/",
namespaces = namespaces,
)
and then calling kg.describe_ns() produces an error message similar to what @fils was seeing.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/tmp/ipykernel_2396895/1517367763.py in <module>
----> 1 kg.describe_ns()
/opt/anaconda3/envs/rapids-21.12/lib/python3.8/site-packages/kglab/kglab.py in describe_ns(self)
254
255 if self.use_gpus:
--> 256 df = cudf.DataFrame(rows_list, columns=col_names)
257 else:
258 df = pd.DataFrame(rows_list, columns=col_names)
/opt/anaconda3/envs/rapids-21.12/lib/python3.8/contextlib.py in inner(*args, **kwds)
73 def inner(*args, **kwds):
74 with self._recreate_cm():
---> 75 return func(*args, **kwds)
76 return inner
77
/opt/anaconda3/envs/rapids-21.12/lib/python3.8/site-packages/cudf/core/dataframe.py in __init__(self, data, index, columns, dtype)
610 )
611 else:
--> 612 self._init_from_list_like(
613 data, index=index, columns=columns
614 )
/opt/anaconda3/envs/rapids-21.12/lib/python3.8/site-packages/cudf/core/dataframe.py in _init_from_list_like(self, data, index, columns)
750 if columns is not None:
751 if len(columns) != len(data):
--> 752 raise ValueError(
753 f"Shape of passed values is ({len(index)}, {len(data)}), "
754 f"indices imply ({len(index)}, {len(columns)})."
ValueError: Shape of passed values is (31, 31), indices imply (31, 2).
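The message says the 31 rows were counted as 31 columns, while only two column names were supplied. One possible reading, which is an assumption and not something confirmed in this thread, is that this cudf release interprets a list of row tuples differently from pandas. A sketch of a way to sidestep the orientation question is to build a column-oriented dict first, since both `pd.DataFrame` and `cudf.DataFrame` accept a dict of equal-length lists. The helper `rows_to_columns` and the column names `prefix` and `namespace` are illustrative, not taken from kglab.

```python
from typing import Dict, List, Sequence


def rows_to_columns(rows: Sequence[Sequence], col_names: List[str]) -> Dict[str, list]:
    """Transpose row tuples into a dict mapping column name -> list of values."""
    for position, row in enumerate(rows):
        if len(row) != len(col_names):
            raise ValueError(
                f"row {position} has {len(row)} values, expected {len(col_names)}"
            )
    return {
        name: [row[index] for row in rows]
        for index, name in enumerate(col_names)
    }


# Sample rows shaped like namespace (prefix, URI) pairs.
rows_list = [
    ("wtm", "http://purl.org/heals/food/"),
    ("ind", "http://purl.org/heals/ingredient/"),
    ("skos", "http://www.w3.org/2004/02/skos/core#"),
]

columns = rows_to_columns(rows_list, ["prefix", "namespace"])
print(columns["prefix"])
```

The resulting dict could then be handed to either dataframe constructor, so the same code path would serve the GPU and CPU branches.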
The machine details are:
Wed Feb 16 14:53:35 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.94 Driver Version: 470.94 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Quadro RTX 6000 Off | 00000000:00:09.0 Off | 0 |
| N/A 18C P8 13W / 250W | 3MiB / 22698MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Quadro RTX 6000 Off | 00000000:00:0A.0 Off | 0 |
| N/A 21C P8 13W / 250W | 3MiB / 22698MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Quadro RTX 6000 Off | 00000000:00:0B.0 Off | 0 |
| N/A 21C P8 13W / 250W | 3MiB / 22698MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Quadro RTX 6000 Off | 00000000:00:0C.0 Off | 0 |
| N/A 20C P8 12W / 250W | 3MiB / 22698MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
The node is running Red Hat Enterprise Linux release 8.5 (Ootpa), Python 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0]
@charlesvardeman please move your comment to the RAPIDS-related discussion or open a new issue.
I am closing this as resolved.
@charlesvardeman @Mec-iS @fils:
I've opened another issue #229 specifically to track the updates we need to do for supporting RAPIDS
@ceteri Paco, I have some time to spend working with some schema.org based data from Hydroshare and using kglab to explore it. I'm having issues applying it, and I hope that in resolving them I might be able to help somehow with the docs and such.
Hopefully this isn't just me being stupid in graph space, but is of some help back to the project. Happy to share.
Working with the same data from Issue 24 and now trying the NetworkX area I got a specific error.
So this code:
results in this error
The results of that SPARQL on the graph should be like: