cisagov / cyhy-core

Core code for Cyber Hygiene (CyHy)

More thoroughly prevent non-ASCII from ending up in the MongoDB database #78

Closed mcdonnnj closed 1 year ago

mcdonnnj commented 1 year ago

🗣 Description

This pull request adds checking for non-ASCII characters to the cyhy-simple script and expands the existing checking in the cyhy-import script to cover input read from standard input. I also bump the versions of the respective scripts to go along with these changes to functionality.
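For readers unfamiliar with this kind of check, here is a minimal sketch of the idea. It is not the code from this pull request: the names `find_non_ascii` and `read_checked` are hypothetical, and the sketch is written to run on Python 3 for illustration even though the scripts themselves target Python 2.

```python
import io


def find_non_ascii(text):
    """Return (line_number, column, character) for each non-ASCII character."""
    hits = []
    # Split on "\n" only, so unusual Unicode line separators stay visible
    # to the check instead of being consumed as line breaks.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, ch))
    return hits


def read_checked(stream):
    """Read all text from a stream; raise ValueError if any of it is non-ASCII."""
    text = stream.read()
    hits = find_non_ascii(text)
    if hits:
        line_no, col, ch = hits[0]
        raise ValueError(
            "non-ASCII character %r at line %d, column %d" % (ch, line_no, col)
        )
    return text


# Hypothetical usage: an in-memory stream stands in for a file or stdin.
clean = read_checked(io.StringIO(u'{"name": "Example Org"}'))
```

Because `read_checked` only needs an object with a `read()` method, the same function could be handed an opened file or `sys.stdin`, which is one way to keep the file and standard-input paths consistent.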

💭 Motivation and context

When non-ASCII characters end up in the MongoDB database, other CyHy components (like cyhy-reports) fail when interacting with this data. This is a result of how strings are handled in Python 2 and the fact that we do not use Unicode strings (these must be requested explicitly in Python 2, whereas strings are Unicode by default in Python 3).
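As an illustration of the Python 2 behavior mentioned above: when a Python 2 byte string is combined with a Unicode string, the interpreter implicitly decodes the bytes with the ASCII codec, and any byte above 0x7F raises `UnicodeDecodeError`. The sketch below imitates that step on Python 3 with an explicit `decode("ascii")`. The helper name `implicit_ascii_decode` is hypothetical, and whether this is the exact failure seen in cyhy-reports is an assumption.

```python
# UTF-8 bytes for "café", as a Python 2 byte string would hold them.
raw = u"caf\u00e9".encode("utf-8")


def implicit_ascii_decode(data):
    """Imitate the ASCII decode Python 2 applies when bytes meet Unicode."""
    return data.decode("ascii")


try:
    implicit_ascii_decode(raw)
    failed = False
except UnicodeDecodeError:
    # The bytes 0xC3 0xA9 that encode "é" are outside the ASCII range.
    failed = True
```

Pure ASCII input passes through the same function unchanged, which is why rejecting non-ASCII data before it reaches the database sidesteps the problem.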

🧪 Testing

I confirmed that the cyhy-simple script refused to convert an INI-formatted file that contained non-ASCII characters. I then re-confirmed that cyhy-import would not import a JSON file containing non-ASCII characters when it was passed as a file, and I received the same error when passing the file in through standard input.

✅ Pre-approval checklist