Closed. Mec-iS closed this issue 2 years ago.
That's a helpful feature. Two concerns:

- It's specific to scikit-network and should be denoted as that in the method name (scikit-network datasets?).
- The path argument should be `PathLike`, to allow for working consistently with non-Posix systems, such as cloud storage buckets.

Instead I would use a pattern such as:
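```python
import pathlib

from kglab import KnowledgeGraph

kg = KnowledgeGraph()
path = pathlib.Path("wikings-families")
kg.load_scikit_dataset(path)  # method name proposed in this comment
```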
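To illustrate the `PathLike` concern: a loader that accepts `str` or `os.PathLike` and hands the value straight to `open()` works with `pathlib.Path`, plain strings, and any other object implementing `__fspath__`, with no string concatenation. This is a minimal sketch; `read_dataset_text` and `TaggedPath` are hypothetical names, not kglab API, and actual cloud-bucket support would still depend on the path library in use.

```python
import os
import pathlib
import tempfile
from typing import Union

# Accepted argument types: a plain string or anything implementing os.PathLike
PathArg = Union[str, os.PathLike]


def read_dataset_text(path: PathArg) -> str:
    """Hypothetical loader body: accept str or any os.PathLike object."""
    # open() accepts any path-like object, so Path objects and str both work
    with open(path, encoding="utf-8") as f:
        return f.read()


class TaggedPath:
    """Hypothetical custom path type: defining __fspath__ makes it PathLike."""

    def __init__(self, local_path: str) -> None:
        self.local_path = local_path

    def __fspath__(self) -> str:
        return self.local_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ttl = pathlib.Path(tmp) / "gorm.ttl"
        ttl.write_text("@prefix gorm: <http://example.org/sagas#> .\n", encoding="utf-8")
        # The same function accepts all three argument styles
        print(
            read_dataset_text(ttl)
            == read_dataset_text(str(ttl))
            == read_dataset_text(TaggedPath(str(ttl)))
        )  # True
```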
BTW, this reminded me that the cloudpathlib library, which our team uses elsewhere, has become more general than the urlpath library which we used here in kglab, and we'll need to make that update throughout the serialization methods.
> It's specific to scikit-network and should be denoted as that in the method name.

No, it is a common pattern used by all the popular libraries; PyTorch and TensorFlow, for example, both provide it.
The idea is just to encapsulate all this logic:
```python
import os
from os.path import dirname

import kglab

namespaces = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "gorm": "http://example.org/sagas#",
    "rel": "http://purl.org/vocab/relationship/",
}

kg = kglab.KnowledgeGraph(
    name="Happy Vikings KG example for SKOS/OWL inference",
    namespaces=namespaces,
)

# The data file is located relative to the current working directory,
# two levels up from where the notebook runs
kg.load_rdf(dirname(dirname(os.getcwd())) + "/dat/gorm.ttl")
```
into a method, so that the user can avoid knowing all these details.
Taking your notes into account, that could be:
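```python
from kglab import KnowledgeGraph

kg = KnowledgeGraph()
# Proposed convenience function; optional parameters override the dataset defaults
load_dataset("wikings-families", kg=kg, path=None, title=None, namespaces=None)
```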
This way, parameters can still be passed if needed.

Users will still be able to use `kg.load_*` explicitly if they need to. The new one is just a convenience method for newcomers to quickly load one of the default datasets for experimentation.
Thank you @Mec-iS, that helps me understand much better.
I see the point about the convenience method, although arguably this practice creates extra cognitive load, PyTorch (which you cited) being one example.
For files used in our tutorials, we want to emphasize examples of how to load or save files in storage, ideally as Posix files. The thinking is that this way there are fewer differences to overcome when people try to apply code from our examples to their own projects.
One problem we've encountered during Q&A is that some namespaces are difficult to understand, such as RDF prefix namespaces. Moving between different libraries (e.g., RDF vs. NetworkX) also introduces API namespaces to navigate. I'm apprehensive about adding a dataset namespace, since these datasets are only tutorial examples and not part of library usage in production.
FWIW, I found this exchange between the fsspec and cloudpathlib communities entertaining :) https://github.com/drivendataorg/cloudpathlib/issues/96
I'm submitting a
Current Behaviour:
It is hard to load any of the default datasets.
Expected Behaviour:
There should be a straightforward way of loading existing datasets, for example:
every dataset should have a name that, if passed to `load_dataset`, automatically imports the dataset into a given graph, as provided for example by scikit-network's load collection.
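A name-based loader only helps if users can discover the valid names. One way to keep that discoverable, sketched here under assumption, is a registry that lists its entries and suggests the closest match on a typo, using the standard library's `difflib.get_close_matches`. `DATASET_FILES`, `available_datasets`, and `dataset_file` are hypothetical, not existing kglab or scikit-network functions.

```python
import difflib
from typing import Dict, List

# Hypothetical mapping from dataset names to bundled files
DATASET_FILES: Dict[str, str] = {
    "wikings-families": "gorm.ttl",
}


def available_datasets() -> List[str]:
    """List the names a user can pass to a name-based loader."""
    return sorted(DATASET_FILES)


def dataset_file(name: str) -> str:
    """Map a dataset name to its file, suggesting the closest match for typos."""
    if name in DATASET_FILES:
        return DATASET_FILES[name]
    hints = difflib.get_close_matches(name, list(DATASET_FILES), n=1)
    message = f"unknown dataset {name!r}; available: {available_datasets()}"
    if hints:
        message += f"; did you mean {hints[0]!r}?"
    raise ValueError(message)
```

Raising `ValueError` with the list of valid names keeps the error self-explanatory for newcomers, which is the audience this convenience method targets.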