Questions on ossem_converter to convert markdown to yaml

ashwin-patil commented 4 years ago

Hi,

First of all, thank you for taking the time to write the Python tool ossem_converter.py to convert between markdown and YAML. I am currently contributing AWS data sources to the project and have created several markdown files for them at https://github.com/hunters-forge/OSSEM/tree/aws-datadictionary/data_dictionaries/aws. I tried using the tool to convert them to YAML before raising a PR, but was unsuccessful. It seems the code that converts from markdown to YAML is currently commented out (lines 554-555, 560-561, 569-578). I tried uncommenting it and running it locally, but it did not work. Before investigating further, I thought I should ask.

This is the syntax I used after uncommenting; it does not produce any errors, but it also does not produce any output files: python ossem_converter.py --from-md <aws folder path with markdowns> --to-yml <dest path>

Could you please point me to the correct instructions for converting the markdown files in the aws folder to YAML with the script, if that is supported?

Also, I have a couple of follow-up questions about the conversion.

Does the script accept multi-line markdown for the description field?
Some of the fields in the AWS data sources have a dictionary/dynamic data type, as can be seen in their sample values (e.g. UserIdentity, requestParameters, responseElements in CloudTrail). Are these supported in the markdown-to-YAML conversion?

Thanks.

Cyb3rWard0g commented 4 years ago

Hey @ashwin-patil, thank you for all the information. I believe it would be good to loop @hxnoyd in on this since he created the script. Hey @hxnoyd, whenever you have a chance, would you mind taking a look at @ashwin-patil's questions regarding the ossem_converter.py script? Thank you in advance, man.

hxnoyd commented 4 years ago

Hi @ashwin-patil. The ossem_converter.py option to convert from markdown was a temporary feature of the script that we used when converting the initial OSSEM data sets from markdown to YAML.

The reason it is not working for you is that the markdown parser expects the 'old' OSSEM markdown structure. The new markdown is generated from YAML and therefore has a different structure, which ossem_converter.py does not know how to parse. This is why the code was commented out ;)

Ideally, everyone authoring new OSSEM data should do it in YAML, because YAML provides a cleaner data structure thanks to its simplicity and readability. Another reason the markdown-to-YAML conversion in ossem_converter.py was commented out was to avoid writing back to YAML, especially from sources prone to structural errors (like markdown), which would pollute the YAML source.

That said, I would suggest that you re-create your AWS data dictionaries in YAML, under the /source folder. After creating the YAML data, you can use ossem_converter.py to generate the markdown. This will ensure that your PR contains both the YAML and the script-generated markdown.
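To illustrate the one-way flow described above (YAML as the source, markdown as generated output), here is a minimal Python sketch. It is not the ossem_converter.py implementation: the function name fields_to_markdown and the keys name, type and description are hypothetical, and the input is assumed to be data that has already been loaded from YAML into plain Python dicts.

```python
def fields_to_markdown(title, fields):
    """Render a list of field dicts as a markdown table.

    Hypothetical schema: each field has 'name', 'type' and 'description'.
    This is an illustration only, not what ossem_converter.py does.
    """
    header = ["Field", "Type", "Description"]
    rows = [
        f"# {title}",
        "",
        "| " + " | ".join(header) + " |",
        "|" + "|".join(["---"] * len(header)) + "|",
    ]
    for field in fields:
        # Collapse newlines and escape pipes so that a multi-line
        # description stays inside a single table cell.
        description = " ".join(field["description"].split()).replace("|", "\\|")
        rows.append(f"| {field['name']} | {field['type']} | {description} |")
    return "\n".join(rows)


if __name__ == "__main__":
    # Placeholder data standing in for a parsed YAML data dictionary.
    example_fields = [
        {
            "name": "eventName",
            "type": "string",
            "description": "First line of the description.\nSecond line.",
        },
    ]
    print(fields_to_markdown("Example data dictionary", example_fields))
```

Because the markdown is derived from the structured data, layout details such as escaping pipes only need to be handled in one place, which is part of why writing back from markdown to YAML is the riskier direction.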

Going back to your questions:

Q: "Does the script accept multi-line markdown for the description field?"
A: Yes, if you are using YAML.

Q: "Some of the fields in the AWS data sources have a dictionary/dynamic data type, as can be seen in their sample values (e.g. UserIdentity, requestParameters, responseElements in CloudTrail). Are these supported in the markdown-to-YAML conversion?"
A: Can you provide some event data as an example? I need more information to give you an informed answer :)

Thanks!

ashwin-patil commented 4 years ago

Got it. Thanks for the swift and detailed response; this makes sense. I can regenerate my files in YAML with a bit of Excel and Notepad++, as I already have the structure ready. Regarding my other questions, I can make it work in YAML either by using the | character to handle newlines and line breaks, or by quoting the dictionary/dynamic values to escape them. I will go ahead and close the issue.
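The two YAML techniques mentioned above can be sketched in Python using only the standard library. This is an illustration under assumptions: the keys name, description and sample_value, the helper names, and the sample data are all hypothetical and not taken from the OSSEM schema. The multi-line description is written as a literal block scalar (|), and the dictionary-style sample value is serialized to JSON and then emitted as a double-quoted string, relying on the fact that the escapes json.dumps produces are also valid in a YAML double-quoted scalar.

```python
import json


def yaml_block_scalar(text, indent):
    """Indent multi-line text so it can follow a 'key: |' line."""
    pad = " " * indent
    return "\n".join(pad + line if line else "" for line in text.splitlines())


def yaml_quoted(value):
    """Serialize a dict-like value as one double-quoted YAML string.

    The inner json.dumps turns the dict into JSON text; the outer call
    wraps that text in double quotes and escapes the inner quotes.
    """
    return json.dumps(json.dumps(value))


def render_field(name, description, sample_value):
    """Build one YAML list entry (hypothetical keys, for illustration)."""
    lines = [
        f"- name: {name}",
        "  description: |",
        yaml_block_scalar(description, indent=4),
        f"  sample_value: {yaml_quoted(sample_value)}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    entry = render_field(
        "userIdentity",
        "First line of the description.\nSecond line of the description.",
        {"type": "IAMUser", "userName": "example-user"},
    )
    print(entry)
```

Quoting the whole sample value keeps it a plain string in the data dictionary, so a YAML parser does not try to interpret the braces and colons as a nested mapping.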

OTRF / OSSEM

Questions on ossem_converter to convert markdown to yaml #74