joewheaton closed this issue 3 years ago.
@philipbaileynar and I discussed this at length. @philipbaileynar is worried this is a much bigger request with fragility and tech debt. I think it is much simpler. The truth is somewhere in between. @philipbaileynar suggested I show what I mean by the XML project file.
I opened up a BRAT project (10190002) and focused on this concept for VBET:
Scroll down to the `<Realizations>` element:
```xml
<Realizations>
  <BRAT id="BRAT1" dateCreated="2021-01-31T19:13:03.747399" guid="5796d944-63f8-11eb-bc14-0a58a9feac02" productVersion="4.2.1">
    <Name>BRAT for HUC 10190002</Name>
    <Inputs>
      ....
      <Geopackage guid="503fe52e-a7f7-41a0-9114-15e6fba096cf" id="INPUTS">
        <Name>Confinement</Name>
        <Path>inputs/inputs.gpkg</Path>
        <Layers>
```
Then scroll down to the VBET layer in that `inputs.gpkg`. This is what it looks like now: a vector with two tags (`<Name>` and `<Path>`) and two attributes (a `guid` and an `id`).
```xml
<Vector guid="50495cab-a127-401a-9161-2d34b331741d" id="VALLEY_BOTTOM">
  <Name>Valley Bottom</Name>
  <Path>valley_bottom</Path>
</Vector>
```
Now I am always a little shaky on whether to do things as new tags or attributes. What I want to know is:
But what if we added project references, something like the following:

I just want to know which things came from a Riverscapes project that we also have in the warehouse. I want to be able to follow the breadcrumbs back. There are three potential ways I imagine a user doing this (in the order I care about):

You could make it prettier or fancier. All I want is the ability to click on a URL and go to the warehouse. Here's a video.
OK, here's an idea we can build on. Even as I type this I'm not sure how I feel about it, but it's a start.

Let's take a real example: a typical VBET project with a slope raster that needs referencing:
```xml
<VBET id="VBET" dateCreated="2021-01-30T01:18:15.127300" guid="06f696ba-6299-11eb-9b59-0a58a9feac02" productVersion="0.3.2">
  <Name>VBET for HUC 16050302</Name>
  <Inputs>
    <Raster guid="4c94ec16-f607-4de3-b793-26dc4ccc9fd9" id="SLOPE_RASTER">
      <Name>Slope Raster</Name>
      <Path>inputs/slope.tif</Path>
      <MetaData>
        <Meta name="srcType">RSContext</Meta>
        <Meta name="srcGUID">fea64c39-f172-42fa-8dc6-12e22409f473</Meta>
        <Meta name="srcWarehouseGUID">953c4550-585b-493c-ae5b-5108f263ed67</Meta>
      </MetaData>
    </Raster>
```
You can see we're introducing some `<Meta>` keys like `srcType` that have specific meanings. If Rave finds these, it will know what to do with them. If they aren't there, nothing happens.
Let's look at them in detail:
- `srcType`: the machine name of the project type this was derived from: `RSContext`, `VBET`, `BRAT`, etc.
- `srcWarehouseGUID`: this is analogous to the URL you were asking for, @joe. It's the GUID at the end of the URL on the data warehouse website.
- `srcGUID`: the GUID from the top of the actual RSContext `project.rs.xml` file. We include this mainly for debugging purposes. The warehouse GUID does not uniquely identify a project in time.

We can add other tags too, like the version of RSContext that was used, the date it was run, etc.
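Here is what "if Rave finds these it will know what to do with them" could look like in practice. This is a minimal Python sketch, not Rave's actual code. It pulls the three `src*` keys out of a dataset node using the standard library's `xml.etree.ElementTree`. The name `read_source_meta` is hypothetical. Any other key is ignored, and a node without the tags yields an empty dict.

```python
import xml.etree.ElementTree as ET

# The Meta keys proposed in this thread; any other key is ignored.
KNOWN_SRC_KEYS = ("srcType", "srcGUID", "srcWarehouseGUID")


def read_source_meta(dataset: ET.Element) -> dict:
    """Return the recognised src* Meta values found under one dataset node.

    Missing keys are simply absent from the result, so a dataset without
    provenance tags yields {} and nothing else happens.
    """
    found = {}
    for meta in dataset.findall("./MetaData/Meta"):
        name = meta.get("name")
        if name in KNOWN_SRC_KEYS and meta.text:
            found[name] = meta.text.strip()
    return found


# The slope raster node from the VBET example above.
RASTER_XML = """
<Raster guid="4c94ec16-f607-4de3-b793-26dc4ccc9fd9" id="SLOPE_RASTER">
  <Name>Slope Raster</Name>
  <Path>inputs/slope.tif</Path>
  <MetaData>
    <Meta name="srcType">RSContext</Meta>
    <Meta name="srcGUID">fea64c39-f172-42fa-8dc6-12e22409f473</Meta>
    <Meta name="srcWarehouseGUID">953c4550-585b-493c-ae5b-5108f263ed67</Meta>
  </MetaData>
</Raster>
"""

if __name__ == "__main__":
    raster = ET.fromstring(RASTER_XML.strip())
    for key, value in read_source_meta(raster).items():
        print(key, "=", value)
```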
My preference would be to avoid storing the whole URL in the project file and instead build it when we load the project in Rave, by concatenating `https://data.riverscapes.xyz/#/Anabranch/` + `<srcWarehouseGUID>`.
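Building the link at load time, rather than storing it, takes only a few lines. This sketch assumes the `https://data.riverscapes.xyz/#/Anabranch/` prefix quoted above is still the warehouse's URL scheme. `warehouse_url` is a hypothetical helper. Validating the value with `uuid.UUID` is an extra choice made here, so that a malformed tag raises an error instead of yielding a dead link.

```python
import uuid

# Prefix quoted in this thread; assumed to still be the warehouse's URL scheme.
WAREHOUSE_BASE_URL = "https://data.riverscapes.xyz/#/Anabranch/"


def warehouse_url(src_warehouse_guid: str) -> str:
    """Build the warehouse link when the project is loaded, not when it is written.

    uuid.UUID raises ValueError for anything that is not a GUID, so a malformed
    tag fails loudly instead of producing a dead link.
    """
    guid = uuid.UUID(src_warehouse_guid.strip())
    # str() gives the canonical lower-case, hyphenated form.
    return WAREHOUSE_BASE_URL + str(guid)


if __name__ == "__main__":
    print(warehouse_url("953c4550-585b-493c-ae5b-5108f263ed67"))
```

Keeping only the GUID in the file means that if the warehouse URL scheme ever changes, only this one prefix needs updating.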
riverscapes-tools: This isn't too bad, and it relates to a ticket I had started. We don't want any of this meta lookup stuff in VBET. We want a meta tool for meta data: something that can understand the progression of these tools and fill in metadata intelligently when Cybercastor runs.

Nothing to do here. Cybercastor is stupid and it will remain stupid, running whatever we tell it to and not caring a bit.

All Rave needs is the ability to recognize specific meta keys and render things like URLs intelligently.
`type` attribute onto the `<Meta>` XML tag to help Rave figure out what to do with it.

I really like this, @MattReimer. The example makes enough sense that even I could do it. All the implementation stuff makes sense to me too.
Re the caveats: in the short and long term, I don't care if we reference some broken links here and there, especially if they are from user-contributed projects. In the long term, the Cybercastor derivatives we sell should reference stable projects. Also in the long term, once the suite of projects built from production-grade tools settles into what we could consider data version >=1.x instead of 0.x beta, we should figure out a way to avoid replacing them.
Thanks for the great suggestions. @philipbaileynar, what worries you?
I really like @MattReimer metadata tags, their names and how they are used. The challenge will be how on earth we get hold of the GUIDs in question. The VBET tool knows nothing about riverscapes context. It doesn't know the DEM it was passed as an argument is part of a riverscapes project. It doesn't know the XML node or GUID or anything. We have decoupled all our tools from upstream project XML.
@MattReimer you said:
We want a meta tool for meta data.
Are you suggesting a completely separate piece of code that runs after a riverscapes tool has finished? Its sole purpose would be to enrich the output project XML with the metadata tags you propose, taking the information from the input XML(s). My term for this is "project enricher". Here's the command line for a "BRAT project enricher".
```
python enricher <rs_context_project_xml_path> <vbet_project_xml_path> <brat_project_xml_path>
```
The enricher would know which datasets are used from the first two projects, take their layer metadata, and inject it into the latter project XML.
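One possible shape for the "project enricher" is this minimal Python sketch. It stamps the three `src*` keys onto the output datasets whose `id` is listed. `enrich`, `set_meta` and the id list are hypothetical. Passing the upstream project's identity in as a dict is also an assumption of the sketch. A real enricher would read those values from the upstream project XML, whose exact layout isn't spelled out in this thread.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable


def set_meta(node: ET.Element, name: str, value: str) -> None:
    """Add or overwrite <MetaData>/<Meta name="..."> under one dataset node."""
    metadata = node.find("MetaData")
    if metadata is None:
        metadata = ET.SubElement(node, "MetaData")
    for meta in metadata.findall("Meta"):
        if meta.get("name") == name:
            meta.text = value
            return
    ET.SubElement(metadata, "Meta", name=name).text = value


def enrich(project_root: ET.Element, source: Dict[str, str],
           dataset_ids: Iterable[str]) -> int:
    """Stamp the upstream project's src* tags onto selected output datasets.

    `source` holds srcType, srcGUID and srcWarehouseGUID (hypothetically
    passed in rather than read from the upstream XML). Only nodes with a
    <Path> child count as datasets. Returns how many nodes were enriched.
    """
    wanted = set(dataset_ids)
    # Collect targets first so the tree is never mutated while iterating it.
    targets = [node for node in project_root.iter()
               if node.get("id") in wanted and node.find("Path") is not None]
    for node in targets:
        for key, value in source.items():
            set_meta(node, key, value)
    return len(targets)


if __name__ == "__main__":
    brat_inputs = ET.fromstring(
        '<Inputs><Vector id="VALLEY_BOTTOM"><Name>Valley Bottom</Name>'
        '<Path>valley_bottom</Path></Vector></Inputs>')
    enrich(brat_inputs, {"srcType": "VBET"}, ["VALLEY_BOTTOM"])
    print(ET.tostring(brat_inputs, encoding="unicode"))
```

Because `set_meta` overwrites existing keys, running the enricher twice on the same output does not duplicate tags.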
We restructure all Riverscapes tools to take upstream project XML paths as their arguments instead of individual dataset paths, i.e., BRAT would take a Riverscapes Context project path and a VBET project path instead of a DEM raster, flow line network, valley bottom, etc.
This is weeks and weeks of unfunded work that will destabilize every tool. It also means that none of us can run the tools without first creating whatever upstream projects are needed as inputs. By contrast, today you can run VBET with any DEM; you don't need an entire Riverscapes Context project that contains a DEM.
Blending options 1 and 2, we could extend the command-line arguments for all tools with optional arguments for each upstream project XML file. If the tool is supplied with these paths, then it knows how to take each command-line dataset argument, find its node in the upstream project XML (reverse path lookup), take the metadata for this node, and inject it into the output project XML file.
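The "reverse path lookup" in this blended option could look roughly like the following Python sketch. It assumes each dataset's `<Path>` is relative to the folder holding the upstream project XML. `find_dataset_node` is a hypothetical name. Layers nested inside a GeoPackage (a layer name under a `.gpkg` `<Path>`) are deliberately not handled.

```python
import os
import xml.etree.ElementTree as ET
from typing import Optional


def find_dataset_node(project_root: ET.Element, project_dir: str,
                      dataset_path: str) -> Optional[ET.Element]:
    """Return the upstream project node whose <Path> points at dataset_path.

    Assumes <Path> values are relative to project_dir (the folder holding the
    upstream project XML). Returns None when no node matches.
    """
    # Normalise both sides so "a/../b" and case differences on Windows match.
    target = os.path.normcase(os.path.abspath(dataset_path))
    for node in project_root.iter():
        path_el = node.find("Path")
        if path_el is None or not path_el.text:
            continue
        candidate = os.path.normcase(os.path.abspath(
            os.path.join(project_dir, path_el.text.strip())))
        if candidate == target:
            return node
    return None
```

Once the node is found, its `guid` and existing metadata are what the tool would copy into its own output project.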
What if there's some clever daemon running in the warehouse that is constantly checking MD5 hashes that uniquely identify each dataset? With some fancy logic it could relate two projects that share a dataset with the same MD5 hash. This won't work, of course, now that we use GeoPackages and bury multiple datasets inside a single file.
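The core of that hypothetical daemon can be sketched in Python with the standard library's `hashlib`. It hashes each dataset file in chunks and reports any hash that appears in more than one project. `md5_of_file`, `shared_datasets` and the project-to-paths mapping are illustrative names only. Nothing like this exists in the warehouse.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large rasters never need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def shared_datasets(paths_by_project: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each MD5 hash to the sorted list of projects that contain it,
    keeping only hashes shared by two or more projects."""
    projects_by_hash = defaultdict(set)
    for project, paths in paths_by_project.items():
        for path in paths:
            projects_by_hash[md5_of_file(path)].add(project)
    return {h: sorted(p) for h, p in projects_by_hash.items() if len(p) > 1}
```

This also shows why GeoPackages break the idea. The hash covers the whole `.gpkg` file, so two projects that share one layer but differ in any other layer get different hashes.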
`srcWarehouseGUID` will still be relevant, but essentially 100% of the `srcGUID`s will be invalid and point to nothing.

I think the Cybercastor Runner Script (in riverscapes-tools) ticket actually handles all of this well.
The goal of that tool is to handle all the external context, and we leave the tools alone to just do what they do.
Can the Cybercastor runner inject sub-tags inside individual layers within a project?
I get that it can take some command-line metadata and stuff it into the project node of the output project file.
I just need to be educated on how this runner script is going to get the GUIDs of the `vbet50` layer from a VBET project and inject them into an output BRAT project.
@joewheaton has approved this work for the Feb release.
Here are the RAVE features that are funded:
Note this is related to #27
Version 2.1.0 now allows:
The Problem
There are many examples of projects that use inputs derived from the outputs or intermediates of other projects. For example, a GUT project and a GCD project both rely on DEMs as their inputs, but many times those DEMs come from a Champ Topo Survey project. Or, we drive Confinement, River Styles, RCAT, and BRAT projects all off a valley bottom polygon that can (but does not have to) come from a VBET output. I don't want to duplicate the entire project. I just want the layer(s) necessary for the project of interest (i.e., the one making the reference) to have the file it needs to operate, but with the added context of which other Riverscapes project it came from.

The Feature Request
Change