ge-high-assurance / RACK

DARPA's Automated Rapid Certification of Software (ARCOS) project called Rapid Assurance Curation Kit (RACK)

Revisit instructions on creating data for ingestion -- Back to basics #635

Closed · kityansiu closed this issue 2 years ago

kityansiu commented 2 years ago

Some existing wiki pages that need to be revisited:

baoluomeng commented 2 years ago

I was using the RACK CLI tool for this activity.

Sequence of Steps to Create and Ingest Overlay and Instance Data Using the RACK CLI Tool

  1. Create a SADL overlay project. The SADL IDE automatically generates the OWL files needed in step 2 from the SADL files. The recommended layout of an overlay project is described at https://github.com/ge-high-assurance/RACK/wiki/Ingestion-packages#introduction
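    For orientation, here is a minimal sketch of such a project (all file and project names are hypothetical; the wiki page above is authoritative):

    $ tree MyOverlay
    MyOverlay
    |-- MyOverlay.sadl      # overlay classes and properties, written in SADL
    `-- OwlModels
        |-- MyOverlay.owl   # OWL generated from the SADL file by the SADL IDE
        `-- import.yaml     # model import manifest, created in step 2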

  2. Create a YAML file named "import.yaml" in the /OwlModels folder of the overlay project, listing the names of the OWL files as follows:

    files:
    - owlfilename1.owl
    - owlfilename2.owl
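    This manifest is what the rack CLI reads when loading the overlay model into RACK; for example (hypothetical project path):

    $ rack model import MyOverlay/OwlModels/import.yaml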

  3. Create CDRs and ingestion nodegroups using RACK/nodegroups/generate-cdrs.sh. (Dan Prince can skip this step.) This requires a Java JAR file to create the CDRs and ingestion nodegroups, but I could not find the JAR file anywhere in the repo, so I reached out to Paul C. and he sent it to me. The shell script is pre-configured to generate CDRs and nodegroups for a fixed list of overlay projects. The generated CDR and nodegroup files land in /nodegroups/CDR/ and /nodegroups/ingestion/arcos.YourProjectName respectively, alongside the script. Before generating the CDRs and nodegroups, the script also ingests the RACK core ontology and the project's overlays into RACK. I think we need to revise the script to be more generic; maybe this is functionality for the Scraping Tool.

    $ ./generate-cdrs.sh standaloneExecutables-jar-with-dependencies.jar

  4. Import the ingestion nodegroups:

    $ rack nodegroups import RACK/nodegroups/ingestion/arcos.turnstile
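    For your own overlay, point the same command at the directory generated in step 3 (hypothetical project name):

    $ rack nodegroups import RACK/nodegroups/ingestion/arcos.MyProject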

  5. Create instance data for my overlay. I only tried CSV files, so the first thing I wanted to know was what my CSV headers should look like. I found https://github.com/ge-high-assurance/RACK/wiki/Ingestion-packages#instance-data, but it did not resolve my questions. I then asked Kit, who told me that the headers should be in the CDRs; I looked there and confirmed that each CDR lists all the headers. Another question I had was which headers are required and which are optional, so I started to experiment. I simply put whatever data I needed into the CSV files, and by following the tutorial at https://github.com/ge-high-assurance/RACK/wiki/Ingestion-packages#instance-data (which I find a bit cumbersome) I was able to create a set of my own instance data.

    Still, I have unanswered questions. Which headers are optional and which are required? How do I know this? What are the options for creating instance data? (A CSV sketch follows below.)
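    As an illustration, a CSV whose header row is copied from the corresponding CDR might look like this (the nodegroup and its columns are hypothetical here; check the CDR for your class):

    $ cat SYSTEM1.csv
    identifier,title,description
    sys-001,Turnstile controller,Example SYSTEM instance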

  6. Create and execute the ingestion script. The script template can be found at https://github.com/ge-high-assurance/RACK/wiki/Ingestion-packages#ingestion-script

    $ ./Load-InstanceData.sh
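    A minimal sketch of what such a script might contain, assuming the rack CLI's data import command and a hypothetical manifest file named instance-data.yaml next to the CSV files (the --clear flag is assumed to empty the target data graph before loading):

    $ cat instance-data.yaml
    data-graph: "http://rack001/data"
    ingestion-steps:
    - {nodegroup: "ingest_SYSTEM", csv: "SYSTEM1.csv"}

    $ cat Load-InstanceData.sh
    #!/bin/bash
    set -e
    # Load the instance data described by the manifest above.
    rack data import --clear instance-data.yaml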

cuddihyge commented 2 years ago

Be sure to check out issue #671. I think that section of docs will be very helpful.