DEKRA-Cybersecurity / MAS-Preloaded-Apps-Scripts

MIT License
5 stars 1 forks source link

Proposal for changes in the project architecture #2

Open ernstleierzopf opened 2 months ago

ernstleierzopf commented 2 months ago

I will use this thread to propose changes to the project. These are (or at least it should be after we agree upon it) ordered in the chronological order in which these changes should be implemented. Please comment in this issue if you agree/disagree. I suggest implementing these changes in smaller packages for easier review.

This issue should be adapted after steps of it were implemented to include more detail and answer occurring questions.

Checkboxes:

11 Very important would be to exactly define which tests were implemented (including the ID of the MASTG test. This should be seen in the file name. The file should include a comprehensive description of what part of the document was implemented and what the goal of the test is.

frandelpinodekra commented 2 months ago

Hi Ernst,

Thank you for your comments, we will add improvements as we go.

Regarding question 11, for an APK, if the NETWORK test cases appear as NA, it is because the application does not have any of the INTERNET, ACCESS_NETWORK_STATE or ACCESS_WIFI_STATE permissions.

We also wanted to ask you a question, in point 8 what exactly do you mean by dependencies?

ernstleierzopf commented 2 months ago

Hi Francisco,

thanks for answering my question. I would argue that the application not having the mentioned permissions should lead to a positive test result, because the vulnerability can not exist in it. What do you think?

With dependencies I mean the and tags within the Manifest files. These show which libraries are used by the APK.

We would also like to help with the implementation, because it is a lot of work.

lambdapioneer commented 2 months ago

Hey, my quick comments on the issues.

High-level comment: I would suggest we keep the DekraScript oblivious of how we structure the rest of the project (i.e. it does not need to know what vendor or firmware an APK is from). This accomplishes two important things: (i) the DekraScript stays flexible for other uses (UNIX tradition) and (ii) we don't need to synchronize project structure across to repositories.

Since we agree on most issues in general, do we want to transform this issue into a task list and discuss details in the individual issues?

  1. Obviously sounds good :)
  2. What timeout should aim for? I feel choosing something like 2xp95 would be a good start, but requires some quick measurements. Alternatively, we timeout on the whole process on a higher-level (i.e. our scripts).
  3. Maybe we can specify a sample CLI call and expected output structure? That way there is some sort of specification to work against. Personally, I am a big fan of all things examples and sample code. I added a suggestion below.
  4. Not sure if "hash" is the right word here. Maybe uniquely name since you seem to use an UUID in your example? Also, this would require a bit more details on how the input structure looks like... maybe this is better left for the higher-level logic that calls the DekraScripts? Then we leave the tool here flexible.
  5. See point 3 re/ sample input/output.
  6. I probably would leave the formula calculation entirely to the evaluation phase. That separates concerns and allows for easier experimentation with the parameters.
  7. That seems then to bake a lot of assumptions into the tools of this repository. Maybe that logic is better kept in the higher-level project?
  8. Generally agree. I think this can all be captured in an input/output as an easy specification.
  9. (missing)
  10. Strongly agree :) I think the best would be to simply wrap the current calls in a simple data layer and then exchange the implementation underneath.
  11. (no comment)

Regarding the last post: I am not sure what you mean with "tags within the Manifest files". Figuring that out for APK files is a quite hard problem indeed (see e.g. LibID: https://dl.acm.org/doi/pdf/10.1145/3293882.3330563)

Sample (just as a starting point, please copy-paste and share your variant)

Input file structure (keep in mind that the caller can name the sample.apk as it likes, the script does not need to know):

DekraScripts/
sample.apk

Call the single-APK script (keep in mind that the caller can name the output-folder as it likes, the script does not need to know):

$ python3 analyze_apk.py ../sample.apk -o ../output-folder

Output file structure:

DekraScripts/
sample.apk
output-folder/permissions.csv
output-folder/test-results.csv
output-folder/logging.txt

Output in output-folder/permission.csv:

permission;comment;
INTERNET;;
READ_STORAGE;;

Output in output-folder/test-results.csv (note that I mention both the MASTG test ID and the check name, this is because we might do multiple independent checks against the same test ID; the evaluation can make sense of it later):

test;check;result;comment;
MASTG-TEST-0013;check-for-hard-coded-keys;PASS;checked 1000 files
MASTG-TEST-0013;check-for-old-ciphers;PASS;checked 1337 files
MASTG-TEST-0016;check-for-non-secure-random;FAIL;found 2 failures;
MASTG-TEST-0015;check-for-key-resuse;NA;ignore because of test error

Output in output-folder/logging.txt:

...
20230101T1100 [Test0016.py] INFO Found non secure random in file with SHA call: CustomCrypto.java:111
20230101T1100 [Test0016.py] INFO Found non secure random in file with AES call: MyEncryption.java:42
20230101T1100 [Test0015.py] WARN The data flow analysis threw an error: "cyclic dependency detected, bail"
...

As said, this is just an example. And please copy-paste the next iteration below :)

ernstleierzopf commented 2 months ago

Hey, my quick comments on the issues.

High-level comment: I would suggest we keep the DekraScript oblivious of how we structure the rest of the project (i.e. it does not need to know what vendor or firmware an APK is from). This accomplishes two important things: (i) the DekraScript stays flexible for other uses (UNIX tradition) and (ii) we don't need to synchronize project structure across to repositories.

sounds good.

Since we agree on most issues in general, do we want to transform this issue into a task list and discuss details in the individual issues?

yes I will create atomar tasksr/issues which can be implemented on its own. However, I would leave the numbering as it is easier to mention in comments - added a checkbox above which will be extended with the issues.

  1. Obviously sounds good :)
  2. What timeout should aim for? I feel choosing something like 2xp95 would be a good start, but requires some quick measurements. Alternatively, we timeout on the whole process on a higher-level (i.e. our scripts).

by test 300 seconds are a good value.

  1. Maybe we can specify a sample CLI call and expected output structure? That way there is some sort of specification to work against. Personally, I am a big fan of all things examples and sample code. I added a suggestion below.

Good idea, but I do not have the answers on what the "best" structure is - this needs to be discussed when we are at that point.

  1. Not sure if "hash" is the right word here. Maybe uniquely name since you seem to use an UUID in your example? Also, this would require a bit more details on how the input structure looks like... maybe this is better left for the higher-level logic that calls the DekraScripts? Then we leave the tool here flexible.

Here I meant that the content of that file should be the hashes of each APK of the tested firmware image.

  1. See point 3 re/ sample input/output.

agree.

  1. I probably would leave the formula calculation entirely to the evaluation phase. That separates concerns and allows for easier experimentation with the parameters.

Yes, that is how I meant it to be. It should be completely separate and does not need to be called within the main processing task. I would leave this functionality in the Dekra project (but as a separate script)

  1. That seems then to bake a lot of assumptions into the tools of this repository. Maybe that logic is better kept in the higher-level project?

good point. But our higher-level project is still not open-source, so we could also keep it here (at least for now). It does not hurt the flexibility of the tool as you mentioned in your comment above.

  1. Generally agree. I think this can all be captured in an input/output as an easy specification.
  2. (missing)
  3. Strongly agree :) I think the best would be to simply wrap the current calls in a simple data layer and then exchange the implementation underneath.
  4. (no comment)

Regarding the last post: I am not sure what you mean with "tags within the Manifest files". Figuring that out for APK files is a quite hard problem indeed (see e.g. LibID: https://dl.acm.org/doi/pdf/10.1145/3293882.3330563)

Oh, I did not know that. Thought these libraries are simply stated in the Manifest file. Skip this step, if not easily possible.

Sample (just as a starting point, please copy-paste and share your variant)

Input file structure (keep in mind that the caller can name the sample.apk as it likes, the script does not need to know):

DekraScripts/
sample.apk

Call the single-APK script (keep in mind that the caller can name the output-folder as it likes, the script does not need to know):

$ python3 analyze_apk.py ../sample.apk -o ../output-folder

Output file structure:

DekraScripts/
sample.apk
output-folder/permissions.csv
output-folder/test-results.csv
output-folder/logging.txt

Output in output-folder/permission.csv:

permission;comment;
INTERNET;;
READ_STORAGE;;

Output in output-folder/test-results.csv (note that I mention both the MASTG test ID and the check name, this is because we might do multiple independent checks against the same test ID; the evaluation can make sense of it later):

test;check;result;comment;
MASTG-TEST-0013;check-for-hard-coded-keys;PASS;checked 1000 files
MASTG-TEST-0013;check-for-old-ciphers;PASS;checked 1337 files
MASTG-TEST-0016;check-for-non-secure-random;FAIL;found 2 failures;
MASTG-TEST-0015;check-for-key-resuse;NA;ignore because of test error

Output in output-folder/logging.txt:

...
20230101T1100 [Test0016.py] INFO Found non secure random in file with SHA call: CustomCrypto.java:111
20230101T1100 [Test0016.py] INFO Found non secure random in file with AES call: MyEncryption.java:42
20230101T1100 [Test0015.py] WARN The data flow analysis threw an error: "cyclic dependency detected, bail"
...

As said, this is just an example. And please copy-paste the next iteration below :)

luisadnsaavedra commented 2 months ago

Regarding the "tags" from the ManifestFile, I think you are both right, there are things we can obtain as we did in the ModZoo project directly from the ManifestFile. But in order to know specific versions of libraries and get a full list of libraries we would need a tool like LibID which has exponential costs and is not scalable beyond a few libraries.

Again, this is also something that could be kept separate from the Dekra tools if it simply outputs the Manifest file, or could be incorporated.

ernstleierzopf commented 2 months ago

Regarding the "tags" from the ManifestFile, I think you are both right, there are things we can obtain as we did in the ModZoo project directly from the ManifestFile. But in order to know specific versions of libraries and get a full list of libraries we would need a tool like LibID which has exponential costs and is not scalable beyond a few libraries.

Again, this is also something that could be kept separate from the Dekra tools if it simply outputs the Manifest file, or could be incorporated.

Wow this is exactly what I was searching for all the time! We definitely need to include this in our pipeline. I am not sure, but we could let it run per app/library and compare new files against that profile - obviously we can skip files with the same hash. Needs some work to be done in advance, but some of it we also need to do for the Dekra scripts (like storing the hash of the APK).

Did you test this tool in your previous work?