ge-high-assurance / RACK

DARPA's Automated Rapid Certification of Software (ARCOS) project called Rapid Assurance Curation Kit (RACK)

Revisit instructions on creating data for ingestion -- Back to basics #635

Closed · kityansiu closed this issue 2 years ago

kityansiu commented 2 years ago

Some existing wiki pages that need to be revisited:

baoluomeng commented 2 years ago

I was using the RACK CLI tool for this activity.

Sequence of Steps to Create and Ingest Overlay and Instance Data Using the RACK CLI Tool

  1. Create a SADL overlay project. The SADL IDE automatically generates the OWL files needed in step 2 from the SADL files. The recommended layout of an overlay project is described at https://github.com/ge-high-assurance/RACK/wiki/Ingestion-packages#introduction
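    For orientation, here is a minimal sketch of such a project (all file and project names are hypothetical; the wiki page above is authoritative):

    $ tree MyOverlay
    MyOverlay
    |-- MyOverlay.sadl      # overlay classes and properties, written in SADL
    `-- OwlModels
        |-- MyOverlay.owl   # OWL generated from the SADL file by the SADL IDE
        `-- import.yaml     # model import manifest, created in step 2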

  2. Create a YAML file named "import.yaml" in the /OwlModels folder of the overlay project, listing the names of the OWL files as follows:

    files:
    - owlfilename1.owl
    - owlfilename2.owl
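    This manifest is what the rack CLI reads when loading the overlay model into RACK; for example (hypothetical project path):

    $ rack model import MyOverlay/OwlModels/import.yaml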

  3. Create CDRs and ingestion nodegroups using RACK/nodegroups/generate-cdrs.sh. (Dan Prince can skip this step.) This requires a Java JAR file to create the CDRs and ingestion nodegroups, but I could not find the JAR file anywhere in the repo, so I reached out to Paul C. and he sent it to me. The shell script is pre-configured to generate CDRs and nodegroups for a fixed list of overlay projects. The generated CDR and nodegroup files land in /nodegroups/CDR/ and /nodegroups/ingestion/arcos.YourProjectName respectively, alongside the script. Before generating the CDRs and nodegroups, the script also ingests the RACK core ontology and the project's overlays into RACK. I think we need to revise the script to be more generic; maybe this is functionality for the Scraping Tool.

    $ ./generate-cdrs.sh standaloneExecutables-jar-with-dependencies.jar

  4. Import the ingestion nodegroups:

    $ rack nodegroups import RACK/nodegroups/ingestion/arcos.turnstile
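    For your own overlay, point the same command at the directory generated in step 3 (hypothetical project name):

    $ rack nodegroups import RACK/nodegroups/ingestion/arcos.MyProject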

  5. Create instance data for my overlay. I only tried CSV files, so the first thing I wanted to know was what my CSV headers should look like. I found https://github.com/ge-high-assurance/RACK/wiki/Ingestion-packages#instance-data, but it did not resolve my questions. I then asked Kit, who told me that the headers should be in the CDRs; I looked there and confirmed that each CDR lists all the headers. Another question I had was which headers are required and which are optional, so I started to experiment. I simply put whatever data I needed into the CSV files, and by following the tutorial at https://github.com/ge-high-assurance/RACK/wiki/Ingestion-packages#instance-data (which I find a bit cumbersome) I was able to create a set of my own instance data.

    Still, I have unanswered questions. Which headers are optional and which are required? How do I know this? What are the options for creating instance data? (A CSV sketch follows below.)
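    As an illustration, a CSV whose header row is copied from the corresponding CDR might look like this (the nodegroup and its columns are hypothetical here; check the CDR for your class):

    $ cat SYSTEM1.csv
    identifier,title,description
    sys-001,Turnstile controller,Example SYSTEM instance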

  6. Create and execute the ingestion script. The script template can be found at https://github.com/ge-high-assurance/RACK/wiki/Ingestion-packages#ingestion-script

    $ ./Load-InstanceData.sh
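    A minimal sketch of what such a script might contain, assuming the rack CLI's data import command and a hypothetical manifest file named instance-data.yaml next to the CSV files (the --clear flag is assumed to empty the target data graph before loading):

    $ cat instance-data.yaml
    data-graph: "http://rack001/data"
    ingestion-steps:
    - {nodegroup: "ingest_SYSTEM", csv: "SYSTEM1.csv"}

    $ cat Load-InstanceData.sh
    #!/bin/bash
    set -e
    # Load the instance data described by the manifest above.
    rack data import --clear instance-data.yaml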

cuddihyge commented 2 years ago

Be sure to check out issue #671. I think that section of docs will be very helpful.