Closed: mih closed this 2 years ago.
I tried the code above as a freshly registered user, after exporting the token I got from `datalad ebrains-authenticate`. It took a while and emitted a number of warnings, and eventually ended in an error, but I suspect this error is on the server side.
```
In [4]: list(fq.bootstrap('a8932c7e-063c-4131-ab96-996d843998e9', Path('/tmp/kgdstryout')))
/home/adina/env/ebrains/lib/python3.10/site-packages/fairgraph/fields.py:256: UserWarning: 'str' object has no attribute 'get'
warnings.warn(str(err))
/home/adina/env/ebrains/lib/python3.10/site-packages/fairgraph/fields.py:96: UserWarning: Field 'full_documentation' is required but was not provided.
warnings.warn(errmsg)
/home/adina/env/ebrains/lib/python3.10/site-packages/fairgraph/fields.py:256: UserWarning: data must be a list or dict
warnings.warn(str(err))
/home/adina/env/ebrains/lib/python3.10/site-packages/fairgraph/fields.py:96: UserWarning: Field 'value' should be of type (<class 'float'>,), not <class 'int'>
warnings.warn(errmsg)
/home/adina/env/ebrains/lib/python3.10/site-packages/fairgraph/fields.py:256: UserWarning: data must be a list or dict
warnings.warn(str(err))
/home/adina/env/ebrains/lib/python3.10/site-packages/fairgraph/fields.py:96: UserWarning: Field 'full_documentation' is required but was not provided.
warnings.warn(errmsg)
/home/adina/env/ebrains/lib/python3.10/site-packages/fairgraph/fields.py:96: UserWarning: Field 'value' should be of type (<class 'float'>,), not <class 'int'>
warnings.warn(errmsg)
/home/adina/env/ebrains/lib/python3.10/site-packages/fairgraph/fields.py:256: UserWarning: data must be a list or dict
warnings.warn(str(err))
/home/adina/env/ebrains/lib/python3.10/site-packages/fairgraph/fields.py:96: UserWarning: Field 'full_documentation' is required but was not provided.
warnings.warn(errmsg)
/home/adina/env/ebrains/lib/python3.10/site-packages/fairgraph/fields.py:96: UserWarning: Field 'value' should be of type (<class 'float'>,), not <class 'int'>
warnings.warn(errmsg)
/home/adina/env/ebrains/lib/python3.10/site-packages/fairgraph/fields.py:256: UserWarning: data must be a list or dict
warnings.warn(str(err))
/home/adina/env/ebrains/lib/python3.10/site-packages/fairgraph/fields.py:96: UserWarning: Field 'full_documentation' is required but was not provided.
warnings.warn(errmsg)
/home/adina/env/ebrains/lib/python3.10/site-packages/fairgraph/fields.py:96: UserWarning: Field 'value' should be of type (<class 'float'>,), not <class 'int'>
warnings.warn(errmsg)
/home/adina/env/ebrains/lib/python3.10/site-packages/fairgraph/fields.py:256: UserWarning: data must be a list or dict
warnings.warn(str(err))
/home/adina/env/ebrains/lib/python3.10/site-packages/fairgraph/fields.py:96: UserWarning: Field 'full_documentation' is required but was not provided.
warnings.warn(errmsg)
/home/adina/env/ebrains/lib/python3.10/site-packages/fairgraph/fields.py:96: UserWarning: Field 'value' should be of type (<class 'float'>,), not <class 'int'>
warnings.warn(errmsg)
/home/adina/env/ebrains/lib/python3.10/site-packages/fairgraph/fields.py:256: UserWarning: data must be a list or dict
warnings.warn(str(err))
/home/adina/env/ebrains/lib/python3.10/site-packages/fairgraph/fields.py:96: UserWarning: Field 'full_documentation' is required but was not provided.
warnings.warn(errmsg)
/home/adina/env/ebrains/lib/python3.10/site-packages/fairgraph/fields.py:96: UserWarning: Field 'value' should be of type (<class 'float'>,), not <class 'int'>
warnings.warn(errmsg)
/home/adina/env/ebrains/lib/python3.10/site-packages/fairgraph/fields.py:256: UserWarning: 'str' object has no attribute 'get'
warnings.warn(str(err))
/home/adina/env/ebrains/lib/python3.10/site-packages/fairgraph/fields.py:96: UserWarning: Field 'full_documentation' is required but was not provided.
warnings.warn(errmsg)
/home/adina/env/ebrains/lib/python3.10/site-packages/fairgraph/fields.py:96: UserWarning: Field 'value' should be of type (<class 'float'>,), not <class 'int'>
warnings.warn(errmsg)
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
Cell In[4], line 1
----> 1 list(fq.bootstrap('a8932c7e-063c-4131-ab96-996d843998e9', Path('/tmp/kgdstryout')))
File ~/repos/datalad-ebrains/datalad_ebrains/fairgraph_query.py:35, in FairGraphQuery.bootstrap(self, from_id, path)
32 # TODO support a starting version for the import
33 # TODO maybe derive automatically from a tag?
34 for kg_dsver in kg_dsversions:
---> 35 yield from self.import_datasetversion(
36 ds, kg_dsver.resolve(self.client))
File ~/repos/datalad-ebrains/datalad_ebrains/fairgraph_query.py:79, in FairGraphQuery.import_datasetversion(self, ds, kg_dsver)
77 def import_datasetversion(self, ds, kg_dsver):
78 self.clean_ds_worktree(ds)
---> 79 yield from self.import_files(ds, kg_dsver)
80 self.import_metadata(ds, kg_dsver)
81 yield from self.save_ds_version(ds, kg_dsver)
File ~/repos/datalad-ebrains/datalad_ebrains/fairgraph_query.py:103, in FairGraphQuery.import_files(self, ds, kg_dsver)
102 def import_files(self, ds, kg_dsver):
--> 103 yield from ds.addurls(
104 # Turn query into an iterable of dicts for addurls
105 urlfile=self.get_file_records(ds, kg_dsver),
106 urlformat='{url}',
107 filenameformat='{name}',
108 # construct annex key from EBRAINS supplied info
109 #key='et:MD5-s{size}--{md5sum}',
110 # we will have a better idea than "auto"
111 exclude_autometa='*',
112 # and here it would be
113 #meta=(
114 # 'ebrains_last_modified={last_modified}',
115 # 'ebrain_last_modification_userid={last_modifier}',
116 #),
117 fast=True,
118 save=False,
119 result_renderer='disabled',
120 return_type='generator',
121 )
File ~/repos/datalad/datalad/interface/base.py:873, in _execute_command_(interface, cmd, cmd_args, cmd_kwargs, exec_kwargs)
867 pass_summary = do_custom_result_summary \
868 and getattr(interface,
869 'custom_result_summary_renderer_pass_summary',
870 None)
872 # process main results
--> 873 for r in _process_results(
874 # execution
875 cmd(*cmd_args, **cmd_kwargs),
876 interface,
877 allkwargs['on_failure'],
878 # bookkeeping
879 action_summary,
880 incomplete_results,
881 # communication
882 result_renderer,
883 result_log_level,
884 # let renderers get to see how a command was called
885 allkwargs):
886 for hook, spec in hooks.items():
887 # run the hooks before we yield the result
888 # this ensures that they are executed before
889 # a potentially wrapper command gets to act
890 # on them
891 if match_jsonhook2result(hook, r, spec['match']):
File ~/repos/datalad/datalad/interface/utils.py:319, in _process_results(results, cmd_class, on_failure, action_summary, incomplete_results, result_renderer, result_log_level, allkwargs)
312 # how many repetitions to show, before suppression kicks in
313 render_n_repetitions = \
314 dlcfg.obtain('datalad.ui.suppress-similar-results-threshold') \
315 if sys.stdout.isatty() \
316 and dlcfg.obtain('datalad.ui.suppress-similar-results') \
317 else float("inf")
--> 319 for res in results:
320 if not res or 'action' not in res:
321 # XXX Yarik has to no clue on how to track the origin of the
322 # record to figure out WTF, so he just skips it
323 # but MIH thinks leaving a trace of that would be good
324 lgr.debug('Drop result record without "action": %s', res)
File ~/repos/datalad/datalad/local/addurls.py:1395, in Addurls.__call__(urlfile, urlformat, filenameformat, dataset, input_type, exclude_autometa, meta, key, message, dry_run, fast, ifexists, missing_value, save, version_urls, cfg_proc, jobs, drop_after, on_collision)
1393 else:
1394 displayed_source = "<records>"
-> 1395 records = ensure_list(url_file)
1396 colidx_to_name = {}
1398 rows = None
File ~/repos/datalad/datalad/utils.py:736, in ensure_list(s, copy, iterate)
724 def ensure_list(s, copy=False, iterate=True):
725 """Given not a list, would place it into a list. If None - empty list is returned
726
727 Parameters
(...)
734 iterate over it.
735 """
--> 736 return ensure_iter(s, list, copy=copy, iterate=iterate)
File ~/repos/datalad/datalad/utils.py:717, in ensure_iter(s, cls, copy, iterate)
715 return cls((s,))
716 elif iterate and hasattr(s, '__iter__'):
--> 717 return cls(s)
718 elif s is None:
719 return cls()
File ~/repos/datalad-ebrains/datalad_ebrains/fairgraph_query.py:132, in FairGraphQuery.get_file_records(self, ds, kg_dsver)
127 # get the repos base url by removing the query string
128 # input is like: https://example.com/<basepath>?prefix=MPM-collections/13/
129 # output is: https://example.com/<basepath>
130 # the prefix is part of the file IRIs again
131 dvr_baseurl = urlparse(dvr.iri.value)._replace(query='').geturl()
--> 132 for f in self.iter_files(dvr):
133 f_url = f.iri.value
134 # the IRI is not a valid URL(?!), we must quote the path
135 # to make it such
File ~/repos/datalad-ebrains/datalad_ebrains/fairgraph_query.py:154, in FairGraphQuery.iter_files(self, dvr, chunk_size)
152 cur_index = 0
153 while True:
--> 154 batch = omcore.File.list(
155 self.client,
156 file_repository=dvr,
157 limit=chunk_size,
158 from_index=cur_index)
159 for f in batch:
160 yield f
File ~/env/ebrains/lib/python3.10/site-packages/fairgraph/base_v3.py:520, in KGObject.list(cls, client, size, from_index, api, scope, resolved, space, **filters)
518 normalized_filters = normalize_filter(cls, filters) or None
519 query = cls._get_query_definition(client, normalized_filters, space, resolved)
--> 520 instances = client.query(
521 normalized_filters, query["@id"],
522 space=space,
523 from_index=from_index, size=size,
524 scope=scope
525 ).data
526 for instance in instances:
527 instance["@context"] = cls.context
File ~/env/ebrains/lib/python3.10/site-packages/fairgraph/client_v3.py:150, in KGv3Client.query(self, filter, query_id, space, instance_id, from_index, size, scope, id_key)
148 return response
149 else:
--> 150 return _query(scope, from_index, size)
File ~/env/ebrains/lib/python3.10/site-packages/fairgraph/client_v3.py:127, in KGv3Client.query.<locals>._query(scope, from_index, size)
118 def _query(scope, from_index, size):
119 response = self._kg_client.queries.execute_query_by_id(
120 query_id=self.uuid_from_uri(query_id),
121 additional_request_params=filter or {},
(...)
125 #restrict_to_spaces=[space] if space else None,
126 )
--> 127 return self._check_response(response)
File ~/env/ebrains/lib/python3.10/site-packages/fairgraph/client_v3.py:112, in KGv3Client._check_response(self, response, ignore_not_found, error_context)
110 return response
111 else:
--> 112 raise Exception(f"Error: {response.error} {error_context}")
113 else:
114 return response
Exception: Error: code=500 message='Internal Server Error' uuid=None
```
OK, this is unexpected, and very valuable information. The error is strange, but it may be related to a particular permission setup that my account has and yours doesn't. I will investigate. Thx!
A second attempt worked. Maybe there should be some form of automatic retry?
OK, thanks for the update. I'll keep this open and look into some form of mitigation.
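One possible mitigation, sketched under several assumptions: instead of re-running the whole `bootstrap()`, retry only the page request that failed, with exponential backoff. `fetch_page(from_index, size)` is a hypothetical stand-in for the `omcore.File.list(..., limit=..., from_index=...)` call in `iter_files()`. Recognizing a transient error by the `code=5xx` text in the exception message is an assumption based on the `Exception: Error: code=500 ...` in the traceback, and stopping at the first short page is also assumed, since the traceback does not show how `iter_files()` terminates.

```python
import time
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def is_transient_server_error(exc: Exception) -> bool:
    # Assumption: fairgraph's _check_response() raises a plain Exception whose
    # message contains the status, e.g. "Error: code=500 message='Internal Server Error' ..."
    return any(f"code={code}" in str(exc) for code in (500, 502, 503, 504))


def fetch_with_retry(fetch: Callable[[], T], attempts: int = 4,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(); on a transient server error, wait and try again."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts - 1):
        try:
            return fetch()
        except Exception as exc:
            if not is_transient_server_error(exc):
                raise  # e.g. a permission problem: retrying will not help
            # exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    # last attempt: any error propagates to the caller
    return fetch()


def iter_pages(fetch_page: Callable[[int, int], List[T]],
               chunk_size: int = 100, **retry_kwargs) -> Iterator[T]:
    """Page through results (hypothetical iter_files() analogue), retrying each page."""
    cur_index = 0
    while True:
        batch = fetch_with_retry(
            lambda: fetch_page(cur_index, chunk_size), **retry_kwargs)
        yield from batch
        if len(batch) < chunk_size:  # assumed end-of-results signal
            break
        cur_index += chunk_size
```

Retrying per page keeps the files already handed to `addurls` and avoids restarting a long import from scratch; errors that do not look transient are re-raised immediately.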
fairgraph is a Python API for EBRAINS Knowledge Graph queries, developed by the EBRAINS community: https://fairgraph.readthedocs.io
This is intended to replace all prior query implementations.
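Right now it is only usable in a Python session, e.g. `list(fq.bootstrap('a8932c7e-063c-4131-ab96-996d843998e9', Path('/tmp/kgdstryout')))`, where `fq` is a `FairGraphQuery` instance and `a8932c7e-063c-4131-ab96-996d843998e9` is the ID of a knowledge graph `Dataset` or `DatasetVersion` in openMINDS terminology (https://search.kg.ebrains.eu/instances/a8932c7e-063c-4131-ab96-996d843998e9).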
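The ID is a plain UUID, and the KG Search link given for it is the search base URL with that ID appended. A small sketch for normalizing user input (a bare ID or a pasted KG Search URL) before handing it to `bootstrap()`; the helper names are hypothetical, and it assumes KG Search instance pages always follow the `https://search.kg.ebrains.eu/instances/<uuid>` pattern.

```python
import uuid

# Assumed URL pattern, taken from the example link for this dataset
KG_SEARCH_BASE = "https://search.kg.ebrains.eu/instances/"


def kg_id_from(value: str) -> str:
    """Accept a bare KG ID or a KG Search URL ending in it; return the normalized ID."""
    # for a bare UUID there is no "/", so rsplit() returns the value unchanged
    candidate = value.rstrip("/").rsplit("/", 1)[-1]
    # uuid.UUID raises ValueError for malformed IDs; str() yields the
    # lowercase, hyphenated canonical form
    return str(uuid.UUID(candidate))


def kg_search_url(value: str) -> str:
    """KG Search page for a given ID (hypothetical helper)."""
    return KG_SEARCH_BASE + kg_id_from(value)
```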
This will identify the underlying `Dataset`, and subsequently traverse all of its known versions. It will then generate a DataLad dataset with one commit per `DatasetVersion`. Given unchanged information on the side of the EBRAINS Knowledge Graph, this dataset generation is reproducible, meaning: running this code twice will generate the exact same dataset (down to the git SHA values).
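Reproducibility down to the SHA is possible because a git object ID is just the SHA-1 of the object's serialized content. For a commit, that content is the tree, the parents, author and committer (including their timestamps), and the message, so identical inputs give identical IDs, while, for example, a different commit time changes the ID. The sketch below computes IDs with `hashlib` only; it illustrates the principle and is not how datalad-ebrains creates its commits (how it pins timestamps and identities is not described here). The identity, timestamp, and message values are made up, and using one identity for author and committer is a simplification.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Object ID as git computes it: SHA-1 over '<type> <size>', a NUL byte, then the body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


# git's well-known ID of the empty tree
EMPTY_TREE = git_object_id("tree", b"")


def commit_id(tree: str, parents: list, ident: str, timestamp: int,
              tz: str, message: str) -> str:
    """ID of a commit whose content is fully determined by the arguments."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    # simplification: same identity and time for author and committer
    lines.append(f"author {ident} {timestamp} {tz}")
    lines.append(f"committer {ident} {timestamp} {tz}")
    body = "\n".join(lines) + "\n\n" + message
    return git_object_id("commit", body.encode("utf-8"))


# made-up values: the same inputs always yield the same commit ID
ident = "Example Importer <importer@example.org>"
first = commit_id(EMPTY_TREE, [], ident, 1600000000, "+0000", "Initial release\n")
again = commit_id(EMPTY_TREE, [], ident, 1600000000, "+0000", "Initial release\n")
assert first == again
```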
Each commit will contain file pointers to the respective EBRAINS file repository, referring to all files, in their particular version, that are part of a particular `DatasetVersion`. The `version_innovation` is used as the commit message, and the `version_identifier` is assigned as a tag to each release commit.

The screenshot shows the resulting dataset visualized with DataLad Gooey.
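The file pointers can be pictured as `{'url': ..., 'name': ...}` records, matching the `urlformat='{url}'` and `filenameformat='{name}'` that `import_files()` passes to `ds.addurls()` in the traceback. The sketch mirrors two steps visible in the `get_file_records()` frame there: dropping the query string (e.g. `?prefix=MPM-collections/13/`) from the repository IRI, and percent-quoting the file IRI's path because the raw IRI is not a valid URL. Deriving `name` as the path relative to the repository base URL is a hypothetical choice, as are the example URLs.

```python
from urllib.parse import quote, urlparse


def repo_base_url(repo_iri: str) -> str:
    """Strip the query string, e.g. '...?prefix=MPM-collections/13/'."""
    return urlparse(repo_iri)._replace(query="").geturl()


def file_record(repo_iri: str, file_iri: str) -> dict:
    """Build one {'url', 'name'} record of the kind addurls can consume."""
    parsed = urlparse(file_iri)
    # percent-encode the path (spaces etc.); assumes the IRI path is not
    # already percent-encoded, otherwise '%' would be encoded twice
    url = parsed._replace(path=quote(parsed.path)).geturl()
    prefix = repo_base_url(repo_iri).rstrip("/") + "/"
    if file_iri.startswith(prefix):
        # hypothetical: dataset-relative name = path below the repository base URL
        name = file_iri[len(prefix):]
    else:
        # fall back to the last path component
        name = parsed.path.rsplit("/", 1)[-1]
    return {"url": url, "name": name}
```

A generator of such dicts could then be passed as `urlfile=` in the same way `get_file_records()` is in the traceback.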