Refactoring (step up module architecture, establish central location for common functionality, cleanup linux analyzer)

luis261 commented 4 months ago

Module excecution improvements (for now only applied to the linux analyzer, other modules will be modified in dedicated PRs):

the simplest modules, which don't need any upfront input can simply be run via import
more involved modules shoud provide the following definitions:
- run: a function that takes in all required arguments as parameters, which would otherwise be provided via sys.argv if running the module directly in a shell (instead of inside Python via import)
- main: a function which wraps around run, executing it with sys.argv input
- an if statement that runs the main function if __name__ == "__main__"
in qu1cksc0pe.py, we can then import modules as required without the need for subshells/os commands (if they need input, we pass it via <modulename>.run)
this architecture let's us avoid the need for workarounds to pass inputs to analyzers, e.g. via the .path_handler tempfile (instead, we just pass any such needed information to a given module via a parameter of its local run function)

[!NOTE] changes below were already communicated/agreed upon previously:

Conversion of Modules directory into a Python package

this is worth it I think alone for sharing functionality (see below), but there is more to it than just that:

Having had enough experience myself with Python's quirky import system, I think I have some understanding for why it might be preferable to avoid dealing with it in some circumstances. Can you tell me the specific reasons though why you execute modules via os.system instead of staying inside a single Python interpreter instance if there are any?

If you're ok with it, I'll slowly move us towards staying inside a single Python instance by leveraging import statements instead of invoking modules directly via the operating system, since I think the advantages outweigh/justify using the established import system:

performance (launching a subshell via os.system every time we execute a module is more expensive in terms of overhead instead of simply importing into the current Python thread)
security (less risk of accidentally exposing/running different files on the system that aren't part of the program)

ideally we would rename the directory to be lowercase as part of this change to conform to Python module naming conventions .. I held off on it for now, since it's not strictly necessary and I'm unsure of the implications in terms of the other locations we'd have to adjust

Extraction of utility function previously strictly local to main module into separate module

might not seem sensible for now given that it only houses a single, tiny function
I'm sure it'll pay off in the long haul/soon though
- can continue extracting duplicate functionality there, to be reused across different modules

luis261 commented 4 months ago

Hello @CYB3RMX, I just need a yes/no from you for now on whether you're ok with my current trajectory so I can keep the changes coming. If not, let me know and I'll try to adjust.

CYB3RMX commented 4 months ago

Hello @luis261 !

Thank you for your work and support. I think this architecture will make the project's codebase more understandable and maintainable. I am also considering adjusting the codebase to make packaging this project easier for future processes.

luis261 commented 4 months ago

Hi @CYB3RMX (:

I'm excited to hear that! :rocket:

I'll continue my work tomorrow and probably ping you in a few days when I've built up some more substantial changes (having the ability to import shared functionality definitely enables me to do that)

CYB3RMX commented 4 months ago

Hello again @luis261 !

When I was checking your changes, the code threw these errors.

Could you please also try to run program after your changes?

luis261 commented 4 months ago

Hi @CYB3RMX,

thank your for making the effort and trying to test the changes already! Yes, you make a good point, there are still importing issues, as you discovered, this isn't ready to be merged yet.

Let me work on some other changes first, I already know the cause of the issue you're talking about. That's what I meant with Python's quirky import system, it can be quite annoying to get it setup right. (but once you do, it's worth it)

This issue is related to us running the modules via os.system at the moment instead of actual import's. I'll get back to you once I've worked on some other, related changes in a few days. I estimate to get this in a state ready to be merged towards end of this week. I'll convert the PR from draft mode once it's ready (:

luis261 commented 4 months ago

I'm currently working on refactoring the linux analyzer as part of this overall change.

Just a small question only indirectly related to the refactoring: why do you consider it "bad" to have an NX bit? I'm assuming you do because it is printed as red if present. Is that intentional? https://github.com/CYB3RMX/Qu1cksc0pe/blob/7319b9f2552c9b98c6c9fc241feb8ee5dd1726b9/Modules/linAnalyzer.py#L167

CYB3RMX commented 4 months ago

I added this feature to check if the target sample has binary protections such as NX or PIE.

luis261 commented 4 months ago

Yes, I understand the point of the feature, I just don't get why we currently print "True" in bold red if NX is present, since it's a positive feature which increases security? Shouldn't it be green? I guess it's not that important, I'll address that point in a separate PR after the refactoring

CYB3RMX commented 4 months ago

I thought that if the malware sample has some binary protections, it wouldn't be good for us. By the way, did you run the program after your changes?

luis261 commented 4 months ago

The way I understood, having an NX bit is probably good overall and shouldn't affect the binary instrumentation process? https://ctf101.org/binary-exploitation/no-execute/

But I'm not sure, you know more about analysis than me and I guess it makes sense for it to remain as is.

Regarding the state of the program: yes, I did run it, although I don't have a great testing setup for Linux right now. I'm still working on refactoring the linux analyzer overall, to show you an example of what a refactored module could look like. I think I'll get that done today/tomorrow. After that, I'll fix the overall import situation (a simple, albeit annoying fix, because I'll need to modify the line in every module again)

CYB3RMX commented 4 months ago

The error is similar as the previous one. But dont worry we can fix this anyway.

luis261 commented 4 months ago

Yes, I know. I appreciate you trying to run it already, but currently there is no sense in trying that though. This branch will still be unstable until at least tomorrow. I will ping you though once it's ready ok?

CYB3RMX commented 4 months ago

Okay no problem. Thank you :)

luis261 commented 4 months ago

You're welcome, I like working on this project (:

Thank you for your patience and ongoing support!

luis261 commented 4 months ago

Just pushed a few more commits. Still not ready.

Making some progress though, the linux analyzer specific part of the refactoring is almost done now.

luis261 commented 4 months ago

In case you're reading this soon: please continue to ignore my changes for now. But it'll be ready soon, I'll probably keep working and move this out of the draft state before sunrise (we're in a similar timezone). Until then, I still have to:

finish cleaning up/fixing post-refactoring for the linux analyzer
take care of the import issue
explain considerations regarding the import setup
create a small summary/writeup for the refactoring so it's easier for you to review

luis261 commented 4 months ago

Alright, I took care of the import issues. If you're interested in a demonstration of the clusterfuck that is the Python import system btw, here it is: https://github.com/luis261/py-imports-demo

(don't feel obligated to read through it and the linked resources, it's not a very pleasant subject ^-^)

Imports will be converted one at a time as we move modules from being executed in a subshell to being imported in Python itself (e.g. .utils instead of utils) in future PRs.

If we decide to strictly continue support of standalone execution of analyzer modules, we will end up needing duplicated imports, surrounded by try exept statements. Therefore, I advise against that. (That decision however does not have to be taken right now, rather in further refinements down the line)

I added such a statement for the linux analyzer, as seen here, in this special case it is somewhat justified IMO, since the module is not only called from the main file (now via import) but also from the email analyzer via os.system. In other cases I'd like to avoid such a setup however.

I don't think much is gained/lost for endusers in any case, as you'd usually expect them to call analyzers via the main module anyway.

luis261 commented 4 months ago

I'll try to compile a summary for you to possibly make it easier to review (I recommend the "Split" PR view over the "Unified" option for looking through changes btw):

Highlights:

managed to shave 100 lines off the linux analysis module (that's 20 percent!)
- all this without losing functionality of course
through aggressively refactoring out code only indirectly related to analysis itself (table display code especially), I managed to compress the golang analysis method in particular: it's now about 70% smaller!
- e.g. these 8 lines are now just a single function call, distributed over 2 lines
- the table convenience functions I introduced in the process will hopefully find further use in future refactorings, being generally applicable for any other modules which output tables
- in terms of linecount they have essentially "payed off" already
no more reliance on the pathhandling tempfile (for the linux analyser only, will propagate that to other modules over time)
- usually the path should be passed by the main module as a parameter to the run function
- if the module is called directly though, it is automatically derived via __file__
significantly reduced use of module level global variables

Further points:

ended up separating out a few bits of repeated/unrelated code into new methods
I generally leveraged the dynamic nature of Python datastructures to reduce verbosity wherever possible
- e.g. list comprehensions/generators/dict comprehensions
added whitespace and documentation via docstrings in certain places

All that being said, the linux analyzer is obviously still not perfect and I have more ideas in store. I will introduce these in separate PRs however, as not to blow this up even further.

luis261 commented 4 months ago

I think the changes I'm hoping to introduce should now be stable and ready for review (:

luis261 commented 4 months ago

@CYB3RMX do you think you could perform a test run for me?

one typical use of the linux analyzer, as well as maybe one indirect call via the email analyzer would be perfect!

Assuming the code doesn't break I hope you'll accept my changes, obviously I'm open for discussion though. I think especially the new utility functions, as well as general patterns I used could serve as a foundation for further refactoring.

I promise though that I will submit smaller, more atomic PRs going forwards. That way, it'll be easier to review with a more manageable diff.

luis261 commented 4 months ago

oh and one last thing: I left some TODO comment markers, they're meant for me to take care off soon in separate refactorings. I hope they don't bother you, you can just ignore them (:

CYB3RMX commented 3 months ago

Hello @luis261

Thank you for your work and progress. Have you tried running the Linux Analyzer against any sample after your changes?

luis261 commented 3 months ago

Hi @CYB3RMX, thank you for reaching out!

yes, I did try but ultimately couldn't run it. Believe it or not: I don't have a suitable Linux machine available at the moment. My Windows machine is missing a strings installation.

                   <------------------------------------------>
                   <  This tool is very dangerous. Be careful >
           __      <   while using it!!                       >
         _|  |_    <------------------------------------------>
       _|      |_   /
      |  _    _  | /
      | |_|  |_| |
   _  |  _    _  |  _
  |_|_|_| |__| |_|_|_|
    |_|_        _|_|   <- Mr. Virus
      |_|      |_|

strings command not found. You need to install it.

But I did run it through a Python analyzer, as well as loading the module itself (albeit without executing it). I also executed the entrypoint for the whole program (without actually running an analyzer). That all went fine now, so at least there aren't any obvious syntax errors etc anymore. I will also proofread the whole module again tonight.

This temporary situation will improve once I get a proper setup or introduce automated tests. For now though, could you please run it again for me against ls/some other Linux binary? If you could also run it indirectly (attachment) via the email analyzer, that would be perfect:

https://github.com/luis261/Qu1cksc0pe/blob/585c51fba318f70270dd416f89ac6b9edebb4039/Modules/email_analyzer.py#L97

If you absolutely want me to do it myself, I do understand, but unfortunately that would mean it would at the very least have to wait until the weekend (if not longer), as I'm too busy with work right now to completely setup a Linux system just for that purpose.

luis261 commented 3 months ago

Hey @CYB3RMX,

do I still have your support for these changes?

I want to continue contributing to this project. The most direct way of doing that right now involves you performing another simple run of the analyzer against ls so we can get this merged and I can move on to further changes (: Could you do me that favor? I really put in effort here and would love to see this code get merged.

If you are unhappy with the code, please let me know and we can discuss it.

I’m sure I will end up setting up a proper Linux machine including all required installs at some point down the line, since I’m planning on continuing to contribute long term (if you’ll allow that). In the meantime, when selecting modules for further changes I’ll select analyzers which have “lighter” dependencies, so I can run them standalone with my current setup. (For example, I can execute these modules just fine with my current setup: apkSecCheck.py/hashScanner.py/powershell_analyzer.py)

If you’ve just been too busy to respond/review the changes/run the code and need more time, just let me know, I totally understand. I would just appreciate some clarity on where you and I stand on this matter.

Hope to hear from you soon. Thanks, Luis

CYB3RMX commented 3 months ago

Hey @luis261

I've been very busy this week, so I'm sorry for not replying. I need to review and check if everything works properly. After that, I'll inform you.

luis261 commented 3 months ago

Hey @CYB3RMX!

Thank you so much for your swift reply! I appreciate the clear communications, don't stress yourself, it's not very urgent (:

CYB3RMX commented 3 months ago

Hello @luis261

When I check your changes against the samples, the 'LinuxAnalyzer' module throws this error:

Could you please re-check your changes?

luis261 commented 3 months ago

@CYB3RMX

sorry about that, it should work now

CYB3RMX commented 3 months ago

Hello @luis261 !

I checked your changes and tried to perform a scan against a simple Linux sample. Everything worked properly in this case. After that, I installed Qu1cksc0pe on my system using the --install argument and tried to analyze the target sample again. After this, the program threw this error:

I think we should also do something for the utility/helper modules. We can import them from the sc0pe_base folder, I guess.

luis261 commented 3 months ago

Hello @CYB3RMX (:

thank you for your comprehensive testing, good finding, I'll look into it right away!

luis261 commented 3 months ago

@CYB3RMX

ok, so at first glance, I don't think I understand the installation process.

you copy the qu1cksc0pe.py file to /usr/bin/qu1cksc0pe here:

https://github.com/CYB3RMX/Qu1cksc0pe/blob/e11dfe4004b5c692fd1f09d99aca4e1c3814726e/Modules/installer.sh#L59

but the Modules folder itself is never moved/copied right? So how can execution from the entrypoint in usr/bin ever succeed when modules are needed for carrying out analysis?

luis261 commented 3 months ago

Oh ok I understand now, it works via a config file:

https://github.com/CYB3RMX/Qu1cksc0pe/blob/e11dfe4004b5c692fd1f09d99aca4e1c3814726e/qu1cksc0pe.py#L78

That makes things a lot more complicated import-wise. I'll have to see if I can work something out this evening. But I don't think I can make it work without also copying the Modules directory over to /usr/bin. We'll see I guess, I might have some questions for you later.

CYB3RMX commented 3 months ago

Okay man :) Feel free to ask

luis261 commented 3 months ago

Thank you mate, I'll try to figure sth out once I'm off the clock and then present you with a list of options tonight/tomorrow (:

luis261 commented 3 months ago

@CYB3RMX

ok, so here's my current understanding of the install mechanism (Linux only, not taking Docker into account):

initially, the project obviously just sits at whatever location in the local filesystem which the installing user chose to clone it to
the user is expected to run setup.sh for proper setup
- this bash script installs various requirements; Jadx among other things, which is downloaded to /home/$USER/sc0pe_Base/jadx in particular
when executing the cloned qu1cksc0pe.py file with the --install switch, the installer.sh script under ./Modules/ is executed
- qu1cksc0pe.py passes the location of the cloned repo to the bash script
- the installer script creates a config file /etc/qu1cksc0pe.conf, containing a value that references a directory of location /opt/Qu1cksc0pe
- using the passed path, the cloned files are then copied to /opt/Qu1cksc0pe
- a copy of the "main" Python entrypoint is finally written to /usr/bin/qu1cksc0pe and marked as executable, owned by the current user

This means for a full Qu1cksc0pe installation running on Linux, the intended entrypoint is /usr/bin/qu1cksc0pe, as seen above in your screenshot. On execution, it looks for the config file under /etc and if present, utilizes the referenced path to execute modules as needed, lying under /opt/Qu1cksc0pe/Modules/. If no such config file is present, this indicates no proper install was carried out. As such, the entrypoint just runs against the current working directory and assumes the modules to execute are located there (this works for running directly in the cloned location or running the copy under /opt/Qu1cksc0pe/).

The problem here is that the entrypoint under /usr/bin/ can't find the Modules package, which sits at a different location(s).

Based on this understanding, here are the options, odered by rising complexity/disruptiveness to the current way of doing things:

since we're currently using "module-unaware" imports inside the Python modules anyway, we could just permanently stick to that way of importing. I'd have to remove the linAnalyzer import though and replace it with an execution via shell again. I'd also have to provide local copies of required utilities for the entrypoint (the analyzer modules can still share functionality via the locally importable module though). This would still be manageable, since the entrypoint is only using utils.err_exit, which could easily be defined inline instead.
- advantages: no adjustments to the current way of doing things required
- disadvantages: we'll be stuck on executing the Python modules in subshells, can never import them from the entrypoint directly; as such the entrypoint itself also won't have access to shared functionality defined in /opt/Qu1cksc0pe/Modules/utils.py
sys path hack: in the entrypoint, I could append the known location of the Modules package under /opt/Qu1cksc0pe/ to sys.path. That way, Python will be able to find the Module package when we try to import it (the current working directory is part of sys.path anyway, so the other cases are taken care off by default anyway).
- advantages: targeted modification at arguably the right location/source of the issue, without affecting the rest of the project and avoiding the need for further workarounds
- disadvantages: I consider this to be an OK solution; however, it's not considered clean from a Python perspective
place a simple bash entrypoint under /usr/bin/, which simply redirects to the actual location under /opt/, or falls back on the clone location
- advantages: pragmatic solution, avoids duplication
- disadvantages: unsure about the details of this, might introduce problems for passing around the file to analyze?
copy all project files to /usr/bin/
- advantages: pragmatic solution, avoids duplicated entrypoints
- disadvantages: idk, not the proper UNIX way of doing things? might not work?

I suggest we go with option 2, but let me know what you think please. Then I can implement the chosen option (:

luis261 commented 3 months ago

@CYB3RMX

actually I took the liberty of just implementing the 2nd option already, that way you can see what I mean and if you're ok with the solution we can finally get this done (:

CYB3RMX commented 3 months ago

Exactly. The second option seems more logical. By the way, while adding updates, make sure to consider the --install part as well, my friend :)

luis261 commented 3 months ago

Awesome, thanks for the swift feedback 🤗

yes, I'll keep this projects ways of performing installation in mind for future changes. Regarding this specific PR though, we should be fine now when it comes to the impact on the installation process right?

CYB3RMX commented 3 months ago

Absolutely. But our priority should always be ensuring that the program runs smoothly. This way, people using the program won't experience any issues :)

luis261 commented 3 months ago

That's true for sure, but I'm not sure I get what you're trying to say: do you mean just in general or is there something else I need to consider regarding this specific PR before we can hopefully get it merged?

CYB3RMX commented 3 months ago

What I mean is that we can approve the PR once we are sure everything is working correctly.

luis261 commented 3 months ago

That makes sense (:

Do you think you could run it again with the --install option for me, like you did above? Does not have to be now/today if you don't have time.

CYB3RMX commented 3 months ago

Alright. Send your changes so we can test them. If there are no problems, we'll complete the PR :)

luis261 commented 3 months ago

Awesome, thank you so much 🙏

I already pushed and committed ba52c65bf9e6a2d1fd2ab56e6457437b57f0267e this morning. That small change already constitutes the implementation of option 2, thus hopefully solving our problem (:

CYB3RMX commented 3 months ago

Hello @luis261 !

I checked your code, and it seems to be working properly (LinuxAnalyzer module). I also need to check the other modules.

Note: I checked the --console mode. When we try to analyze a simple sample, it throws this error:

That one is for the Windows side:

I think we should add a default value for the report section

luis261 commented 3 months ago

@CYB3RMX thank you so much for your effort and attention to detail! Good to know the current status, I'm glad to hear that at least the actual analysis internals of the linux analyzer are stable again (:

As for the report issue when executing the module directly: yes, that's on me, careless mistake to access that index without a prior check, here's the situation:

there already is a default value for the actual functionality in the analyzer class' analyze method signature
it's just failing right now because I'm accessing in an unsafe manner, thus causing the IndexError
I've just pushed an update that includes a length check, this should fix it (780bbe3)


def analyze(self, indicators_by_category, emit_report=False):
    ...

luis261 commented 3 months ago

I'm a bit disappointed (in myself) regarding the integration changes and delay this refactoring is causing. Thank you for your patience though. Going forwards, I'll be able to avoid changes across such a wide breadth of modules to circumvent the need for such involved testing (across different modules, different ways of execution etc).

This means the risk for upcoming changes will be more predictable/of smaller impact or scope. For now, there was no way to address some points of improvement without affecting all modules though.

luis261 commented 3 months ago

I'll look into the issue with the windows analyzer next, unsure what could be the cause there ...

luis261 commented 3 months ago

I just pushed another commit, opting to introduce a new util function which cleanly handles the issue at hand in general, subsequently applying it to the two affected analyzers.

However, I realized earlier that this bug doesn't seem related to my changes. Previous to my last commit (d52fa20) I didn't touch winAnalyzer.py on any lines which reference sys.argv.

I did introduce a related error by accident for the linux analyzer, assuming the argv value to be of type bool when it really is "True" as a str (also fixed since https://github.com/CYB3RMX/Qu1cksc0pe/pull/58/commits/780bbe3b8771b7a50c13e547aef29a156d56d9f6#). However, the same applies: sys.argv at index 2 was already accessed in this manner prior to my changes: as seen on the current main branch: https://github.com/CYB3RMX/Qu1cksc0pe/blob/master/Modules/linAnalyzer.py#L489

So, this made me look into unsafe argv indexing in general:

[*MASKED*]/Qu1cksc0pe>git grep "sys.argv\["
Modules/VTwrapper.py:    apikey = str(sys.argv[1])
Modules/VTwrapper.py:    targetFile = sys.argv[2]
Modules/andro_familydetect.py:targetApk = sys.argv[1]
Modules/android_dynamic_analyzer.py:androdyn = AndroidDynamicAnalyzer(target_file=str(sys.argv[1]))
Modules/apkAnalyzer.py:targetAPK = sys.argv[1]
Modules/apkAnalyzer.py:        if sys.argv[3] == "JAR":
Modules/apkAnalyzer.py:        if sys.argv[3] == "DEX":
Modules/apkAnalyzer.py:        if sys.argv[2] == "True":
Modules/apple_analyzer.py:target_file = sys.argv[1]
Modules/archiveAnalyzer.py:targetFile = sys.argv[1]
Modules/document_analyzer.py:targetFile = sys.argv[1]
Modules/domainCatcher.py:target_file = sys.argv[1]
Modules/email_analyzer.py:target_eml = sys.argv[1]
Modules/hashScanner.py:    if str(sys.argv[1]) == '--db_update':
Modules/hashScanner.py:        targetFile = sys.argv[1]
Modules/hashScanner.py:    if str(sys.argv[2]) == '--normal':
Modules/languageDetect.py:fileName = str(sys.argv[1])
Modules/linAnalyzer.py:        sys.argv[1], emit_reports=get_argv(2) == "True")
Modules/mitre.py:fileName = sys.argv[1]
Modules/packerAnalyzer.py:targetFile = sys.argv[2]
Modules/packerAnalyzer.py:    if str(sys.argv[1]) == '--single':
Modules/packerAnalyzer.py:    elif str(sys.argv[1]) == '--multiscan':
Modules/pcap_analyzer.py:target_pcap = sys.argv[1]
Modules/powershell_analyzer.py:target_pwsh = sys.argv[1]
Modules/resourceChecker.py:targFile = sys.argv[1]
Modules/sigChecker.py:target_file = sys.argv[1]
Modules/utils.py:        return sys.argv[int(idx)]
Modules/winAnalyzer.py:fileName = sys.argv[1]
Modules/windows_dynamic_analyzer.py:target_pid = int(sys.argv[1])

These locations might also warrant fixing (unrelated to my refactoring). Do you think I should apply the new util to all locations where it's not index 1? (we don't have to concern ourselves with index 1 IMO, since for index 1 it might be acceptable to fail since that's the target file, which is expected to be present)

It might not actually present a problem in many of the cases above, at least not if execution happens via the project entrypoint instead of the user directly executing an analyzer on the console with a missing argument.

For example, apkAnalyzer seems OK, since it's always passed the expected input in the entrypoint:

Qu1cksc0pe>git grep apkAnalyzer
Modules/console.py:                            command = f"{py_binary} {sc0pe_path}{path_seperator}Modules{path_seperator}apkAnalyzer.py \"{filename}\""
qu1cksc0pe.py:            execute_module(f"apkAnalyzer.py \"{analyzeFile}\" False JAR")
qu1cksc0pe.py:            execute_module(f"apkAnalyzer.py \"{analyzeFile}\" False DEX")
qu1cksc0pe.py:                execute_module(f"apkAnalyzer.py \"{analyzeFile}\" True APK")
qu1cksc0pe.py:                execute_module(f"apkAnalyzer.py \"{analyzeFile}\" False APK")

luis261 commented 3 months ago

I can look further into this tonight

CYB3RMX / Qu1cksc0pe