Cisco-Talos / pyrebox

Python scriptable Reverse Engineering Sandbox, a Virtual Machine instrumentation and inspection framework based on QEMU
https://talosintelligence.com/pyrebox
GNU General Public License v2.0

The data I recorded in the memory callback is abnormal #111

Open Waterman178 opened 4 years ago

Waterman178 commented 4 years ago

[Four screenshots attached in the original issue]

As shown in the pictures above, I modified your code. I inserted my own read/write recording function into the deliver_callback function in callbacks.cpp, and I only record read/write information in the range 0xffea4cff to 0xffea4d25. There are two problems. First, instruction fetches also enter the read/write callback. Second, instructions such as call and ret, which modify memory, do not enter the callback at all. I want to know how to filter out the instruction-fetch accesses, and I also need to capture the memory modifications made by call and ret. Guest OS: Windows 7 x64. Host OS: 64-bit Debian running in VMware.

Waterman178 commented 4 years ago

I tested tracing of kernel instructions yesterday, and the results are completely abnormal. For example, push instructions are not recorded. This problem has troubled me for five days and I don't know how to solve it. I hope someone can help.

xabiugarte commented 4 years ago

Hi,

I would need to see all your code, or at least, all the code needed to reproduce the problem.

Thanks,

Waterman178 commented 4 years ago

OK, thanks. I will package the source code and send it to your mailbox later.

Waterman178 commented 4 years ago

I patiently debugged for another day and finally found the cause of the problem. I will look at it again; maybe I can solve it myself.

Waterman178 commented 4 years ago

The problem is basically solved, but when the CPU fetches code, it still enters the read/write callback. I tested with your test script and got the same result. As shown in the pictures above, it reads 0x470 from 0xffea4cff in advance. How should these accesses be filtered out?

Waterman178 commented 4 years ago

I have packaged the code and sent it to your mailbox. Please check.

xabiugarte commented 4 years ago

Hi, I have received the code. I will take a look at it as soon as I find some time.

Thanks,

xabiugarte commented 4 years ago

Hi,

I have finally found some time to take a look at your code. It seems that you are most likely having issues with how you track processes and threads. In any case, that is definitely not the way you are supposed to install callbacks, and if you modify the internals of pyrebox in that way, I cannot guarantee that it will work properly.

You should be able to track memory reads/writes without modifying internals, just by installing regular callbacks and using triggers (there are examples available). Triggers are executed in C/C++ before a Python callback is called, so the performance impact should be lower as well.

For instance, you can use as an example the following trigger: https://github.com/Cisco-Talos/pyrebox/blob/master/triggers/trigger_bpw_memrange.cpp

That is configured in the following way:

https://github.com/Cisco-Talos/pyrebox/blob/b5965efff35dd3b72e4f2f7806930d2d868251b6/scripts/mem_write_test.py

You can create your own trigger in the same way, and narrow the memory reads down to just those that happen in the context of a given process (pgd) and are performed by instructions in a given range.
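The filtering idea described above can be sketched in plain Python. This is only an illustration of the predicate such a trigger would evaluate, not pyrebox's trigger API: the `MemAccess` record, the `AccessFilter` class, and the pgd value are all hypothetical, and only the instruction address range is taken from the original report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemAccess:
    """One memory access as a filter might see it (hypothetical record)."""
    pgd: int        # page directory base identifying the process
    pc: int         # address of the instruction performing the access
    addr: int       # virtual address being read or written
    size: int       # access width in bytes
    is_write: bool  # True for a write, False for a read


@dataclass(frozen=True)
class AccessFilter:
    """Accept only accesses made by one process from one code range."""
    target_pgd: int
    pc_begin: int  # inclusive lower bound of the instruction range
    pc_end: int    # inclusive upper bound of the instruction range

    def matches(self, access: MemAccess) -> bool:
        # Reject accesses that belong to any other address space first.
        if access.pgd != self.target_pgd:
            return False
        # Then keep only accesses issued by instructions inside the range.
        return self.pc_begin <= access.pc <= self.pc_end


def filter_accesses(accesses, flt):
    """Return the accesses that pass the filter, in their original order."""
    return [a for a in accesses if flt.matches(a)]


if __name__ == "__main__":
    # The pgd value is made up; the pc range is the one from the report.
    flt = AccessFilter(target_pgd=0x187000,
                       pc_begin=0xFFEA4CFF, pc_end=0xFFEA4D25)
    trace = [
        MemAccess(0x187000, 0xFFEA4D00, 0x7FFE0000, 8, False),
        MemAccess(0x187000, 0xFFEA5000, 0x7FFE0008, 8, True),
        MemAccess(0x200000, 0xFFEA4D00, 0x7FFE0010, 4, False),
    ]
    for access in filter_accesses(trace, flt):
        print(hex(access.pc), hex(access.addr), access.is_write)
```

In a real trigger the same two comparisons would run in C/C++ before any Python callback is invoked, which is where the performance benefit mentioned above would come from.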

Also, for next time, I would really appreciate it if you could create a GitHub repository and push your changes there instead of sending them packaged in a rar file. That would help me see your changes against the original code.

Waterman178 commented 4 years ago

I tried using triggers and scripts to monitor memory reads and writes and got the same result, so that approach does not help. You did not finish watching the video I sent you. It is a TLB problem: once the CPU has read a given memory address, later accesses to it no longer enter the callback function unless the TLB is completely disabled, and disabling it makes execution many times slower than it is now.
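The behaviour described in this comment can be illustrated with a toy model. The sketch below is not QEMU or pyrebox code; `ToySoftMMU` and its `watched_pages` parameter are hypothetical names, and the model simply assumes that the callback only fires on a TLB miss. It shows why a second access to the same page would go unrecorded, and how keeping selected pages out of the TLB would, under that assumption, keep them visible without disabling the TLB for everything.

```python
PAGE_SHIFT = 12  # assume 4 KiB pages


class ToySoftMMU:
    """Toy model: the access callback runs only on the slow (TLB miss) path."""

    def __init__(self, on_access, watched_pages=()):
        self.on_access = on_access               # called with the address
        self.watched_pages = set(watched_pages)  # pages never cached
        self.tlb = set()                         # page numbers currently cached

    def read(self, addr):
        page = addr >> PAGE_SHIFT
        if page in self.tlb:
            # Fast path: translation is cached, so the callback is skipped.
            return "fast"
        # Slow path: the callback sees the access.
        self.on_access(addr)
        # Watched pages are deliberately not cached, so every later
        # access to them takes the slow path again.
        if page not in self.watched_pages:
            self.tlb.add(page)
        return "slow"


if __name__ == "__main__":
    log = []
    mmu = ToySoftMMU(log.append)
    mmu.read(0x1000)
    mmu.read(0x1008)  # same page: served from the TLB, not logged
    print([hex(a) for a in log])

    watched_log = []
    watched = ToySoftMMU(watched_log.append, watched_pages={0x1000 >> PAGE_SHIFT})
    watched.read(0x1000)
    watched.read(0x1008)  # page is watched: logged again
    print([hex(a) for a in watched_log])
```

In this model the cost of the slow path is paid only for the watched pages, which is the trade-off between disabling the TLB completely and missing accesses.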

xabiugarte commented 4 years ago

Hi,

I watched the entire video, but I don't recall any comment regarding the TLB problem you mention. If the TLB is affecting how memory reads/writes are tracked, let's create an issue for that and I'll try to fix it.

For future issues, I would appreciate it if you used a different format for reporting problems. As I mentioned in my previous message, it would be great if you could create a repository where I can see the code and modifications necessary to reproduce the issue. That would really help me debug and understand the issue.

Thanks,

Xabier

Waterman178 commented 4 years ago

This problem does not require me to submit code. You only need to use your own script or trigger to monitor the reads and writes performed during a call in any process. You will find that some addresses are missing from the log, even though the instructions at those addresses did access memory. I mentioned this in the video.