Closed: ylyangtw closed this 1 year ago.
(Ignore; this gets rid of the entire namespace.)
In the manage blazegraph code there are delete and create namespace functions; they were set up for integration testing, but I think they should work here.
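For reference, a minimal sketch of what delete and create namespace amount to against Blazegraph's REST API. The base URL is an assumption; the real helpers live in the manage blazegraph / ec_utils code, not here.

```python
import requests

# Assumed Blazegraph base URL; adjust to the actual deployment
# (commonly http://host:9999/blazegraph or http://host:8080/bigdata).
BLAZEGRAPH = "http://localhost:9999/blazegraph"

def create_namespace(name: str) -> None:
    """Create a namespace via the NanoSparqlServer REST API."""
    props = f"com.bigdata.rdf.sail.namespace={name}\n"
    resp = requests.post(
        f"{BLAZEGRAPH}/namespace",
        data=props,
        headers={"Content-Type": "text/plain"},
    )
    resp.raise_for_status()

def delete_namespace(name: str) -> None:
    """Drop the whole namespace and everything in it (what the note above warns about)."""
    resp = requests.delete(f"{BLAZEGRAPH}/namespace/{name}")
    resp.raise_for_status()
```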
If there is an issue with files loading to the graph, then we should probably look at saving the file to S3 first... then loading it to the graph.
That would let us test the loading issues better, i.e., how can we test this and catch this, and where should it be done (dagster or ec_utils)?
Then maybe the saved file becomes an actual asset at some point.
summarize would fail if the source exists in the summary namespace. For testing, I clear the summary namespace first using this SPARQL update: DELETE { ?s ?p ?o } WHERE { ?s ?p ?o . FILTER regex(str(?s), "iris") }
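For anyone repeating this, here is a small sketch of running that clearing update over the SPARQL 1.1 protocol; the endpoint URL and namespace name are assumptions. Note that SPARQL 1.1 Update wants the delete template in braces, i.e. DELETE { ?s ?p ?o } WHERE { ... }.

```python
import requests

# Assumed SPARQL endpoint of the summary namespace; adjust to the deployment.
SUMMARY_ENDPOINT = "http://localhost:9999/blazegraph/namespace/summary/sparql"

# Remove every triple whose subject IRI contains "iris".
CLEAR_IRIS = """
DELETE { ?s ?p ?o }
WHERE {
  ?s ?p ?o .
  FILTER regex(str(?s), "iris")
}
"""

# SPARQL 1.1 Update over HTTP: the update text goes in the 'update' form field.
resp = requests.post(SUMMARY_ENDPOINT, data={"update": CLEAR_IRIS})
resp.raise_for_status()
```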
This issue was about how to push a file in S3 to a graph.
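A rough sketch of that flow, pulling the object from S3/Minio and POSTing it to the namespace's SPARQL endpoint. Bucket, key, endpoint, and content type here are all assumptions, not the actual scheduler code.

```python
import requests
from minio import Minio

# Assumed client and endpoint; the real values come from the scheduler config.
s3 = Minio("minio.example.org", access_key="...", secret_key="...", secure=True)
GRAPH_ENDPOINT = "http://localhost:9999/blazegraph/namespace/summary/sparql"

def load_s3_file_to_graph(bucket: str, key: str) -> None:
    """Read an RDF file out of S3/Minio and POST it into the triplestore."""
    obj = s3.get_object(bucket, key)
    try:
        data = obj.read()
    finally:
        obj.close()
        obj.release_conn()
    # Blazegraph accepts RDF posted straight to the namespace's sparql endpoint;
    # the Content-Type must match the file's serialization (turtle, n-quads, ...).
    resp = requests.post(
        GRAPH_ENDPOINT,
        data=data,
        headers={"Content-Type": "text/turtle"},
    )
    resp.raise_for_status()
```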
Agree!
If loading to the graph fails, it will upload the result to S3.
Pull and check this; I reworked the code.
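Roughly the pattern described above, as a sketch only; the helper and bucket names are made up, the real code is in the rework.

```python
import io
import requests
from minio import Minio

# Assumed client/endpoint; the real ones come from the scheduler config.
s3 = Minio("minio.example.org", access_key="...", secret_key="...", secure=True)
GRAPH_ENDPOINT = "http://localhost:9999/blazegraph/namespace/summary/sparql"

def post_to_graph(data: bytes) -> None:
    """POST the RDF payload to the triplestore; raises on HTTP errors."""
    resp = requests.post(GRAPH_ENDPOINT, data=data, headers={"Content-Type": "text/turtle"})
    resp.raise_for_status()

def load_or_stash(data: bytes, key: str) -> None:
    """Try to load into the graph; on failure, upload the file to S3 so it can be inspected and replayed."""
    try:
        post_to_graph(data)
    except Exception:
        s3.put_object("graph-load-failures", key, io.BytesIO(data), length=len(data))
        raise
```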
Thanks @valentinedwv!
Under deployment it also works on my local, but I get this traceback from iris_missingreport_graph:

  File "/usr/src/app/project/eco/ops/implnet_ops_iris.py", line 676, in iris_missingreport_graph
    returned_value = missingReport(source_url, bucket, source_name, s3Minio, graphendpoint, milled=milled, summon=summon)
  File "/usr/local/lib/python3.11/site-packages/ec/reporting/report.py", line 99, in missingReport
    graph_urns = ec.graph.sparql_query.queryWithSparql("repo_select_graphs", graphendpoint, {"repo": repo})
  File "/usr/local/lib/python3.11/site-packages/ec/graph/sparql_query.py", line 39, in queryWithSparql
    q_df = sparqldataframe.query(endpoint, thsGraphQuery)
  File "/usr/local/lib/python3.11/site-packages/sparqldataframe/__init__.py", line 24, in query
    raise Exception('Invalid query')

Do you maybe know what this could be?
Might need to update a dependency to: https://github.com/earthcube/scheduler/blob/51652704c1a7022c1b61c6a09fe8240c15380b65/dagster/implnets/requirements_code.txt#L18
earthcube-utilities @ git+https://github.com/earthcube/earthcube_utilities@b671efb#subdirectory=earthcube_utilities
Should have changed it in the standard requirements.txt, too.
Try again. Got the triplestore swapped, and fixed an upload issue.
Nice! They all work.