cvut / qtrvsim

RISC-V CPU simulator for education purposes
GNU General Public License v3.0

Add DWARF support to display currently executed C code (ELF file) #123

Open jdupak opened 7 months ago

trdthg commented 5 months ago

I think you are referring to

  1. Loading ELF and C source code from the user
  2. Displaying the C source code in an editor tab
  3. Finding the mapping between instructions and C source locations from the DWARF information in the ELF file
  4. Displaying it in a proper way, such as

Are there any problems?

jdupak commented 5 months ago

Yes, there will be no C source code. You are loading just ELF and you need to extract the source code information from the debug info in the binary itself. So you will need to find some library that is not too big (LLVM) and is cross platform (including wasm) to read it.
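As a first sanity check before choosing a library, one can test whether a given ELF file carries line-number debug info at all. The following is a minimal sketch using only the Python standard library; the function names `elf_section_names` and `has_line_info` are hypothetical and not part of qtrvsim. It only lists section names and looks for `.debug_line`; it does not decode any DWARF data.

```python
import struct

def elf_section_names(data: bytes) -> list[str]:
    """Return the names of all sections in an ELF image (hypothetical helper)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                      # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    end = "<" if data[5] == 1 else ">"       # EI_DATA: 1 = little endian
    if is64:
        hdr_fmt, sh_fmt = end + "HHIQQQIHHHHHH", end + "IIQQQQIIQQ"
    else:
        hdr_fmt, sh_fmt = end + "HHIIIIIHHHHHH", end + "IIIIIIIIII"
    # Fields after the 16-byte e_ident: e_type, e_machine, e_version, e_entry,
    # e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
    # e_shentsize, e_shnum, e_shstrndx
    hdr = struct.unpack_from(hdr_fmt, data, 16)
    e_shoff, e_shentsize, e_shnum, e_shstrndx = hdr[5], hdr[10], hdr[11], hdr[12]

    def section(i: int) -> tuple:
        return struct.unpack_from(sh_fmt, data, e_shoff + i * e_shentsize)

    # sh_offset is field 4 in both the 32-bit and the 64-bit section header.
    str_off = section(e_shstrndx)[4]
    names = []
    for i in range(e_shnum):
        start = str_off + section(i)[0]      # sh_name is an offset into .shstrtab
        names.append(data[start:data.index(b"\0", start)].decode())
    return names

def has_line_info(data: bytes) -> bool:
    """True if the ELF image contains a .debug_line section."""
    return ".debug_line" in elf_section_names(data)
```

A binary compiled without `-g` (or stripped afterwards) would normally have no `.debug_line` section, in which case there is nothing to map instructions to.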

trdthg commented 5 months ago

Thanks. Then it will be very hard :|

trdthg commented 5 months ago

I'm going to try to do something with this issue, to achieve the same functionality as described above.

I'll leave the extraction of the source code as an interface (with options in the menu), and provide the following two implementations:

  • extract from ELF
  • load directly from a local file

It may look like this:

image

Since the latter is easier, I'll try to implement it first.

If I still have time I might work on the former, but of course it can be left to others!

For extraction, I tried to find and test some disassemblers (tested on x86), e.g. IDA and Ghidra. IDA disassembles quite well, but it's not open source. Ghidra is open source and works fine, but it's also quite a large project, not easy to use, and doesn't seem to have good support for DWARF 5 and RISC-V. (Not going in depth here, just sharing some progress and thoughts.)

jdupak commented 5 months ago

I think you might be going in the wrong direction here. I quickly browsed GitHub and this is the kind of library we had in mind: https://github.com/GrandChris/elf_analysis (I did not check the library in depth).

Direct loading is not useful, since you need to map the code lines with instructions anyway.
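To illustrate the mapping step: once a DWARF reader has decoded the line table into rows of (address, file, line), finding the source line for the current PC is a sorted lookup. The sketch below assumes such rows are already available; `LineRow`, `lookup` and the sample addresses are hypothetical, and real DWARF line tables consist of several sequences with their own end markers, which is simplified here to a single `end` address.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional

class LineRow(NamedTuple):
    address: int   # first instruction address covered by this row
    file: str
    line: int

def lookup(rows: list[LineRow], pc: int, end: int) -> Optional[LineRow]:
    """Return the row covering pc, or None if pc is outside the table.

    rows must be sorted by address; end is the first address past the code
    described by the table (a simplification of DWARF end_sequence).
    """
    if not rows or pc >= end:
        return None
    # Index of the last row whose address is <= pc.
    i = bisect_right([r.address for r in rows], pc) - 1
    return rows[i] if i >= 0 else None

# Hypothetical table for a small main.c
rows = [
    LineRow(0x200, "main.c", 3),
    LineRow(0x208, "main.c", 4),
    LineRow(0x214, "main.c", 5),
]
row = lookup(rows, 0x20C, end=0x220)
print(row.file, row.line)
```

The simulator would call something like `lookup` on every PC change and highlight the returned line. The file contents still have to come from somewhere, since the line table stores only file names and line numbers.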


trdthg commented 5 months ago

I am not saying I will give up on reading the ELF; it is necessary and included in my plan. I will certainly search for a suitable library to read ELF files.

The solution I described only temporarily simplifies the first step: how to get the C code.

ppisa commented 2 months ago

I discussed the goal of using DWARF to map an instruction address to a source file line with Jan Hubicka at GNU Tools Cauldron, and he suggested looking at https://www.nongnu.org/libunwind/

trdthg commented 1 month ago

I did some simple tests.

I discussed it with a friend, who thinks that with only the ELF file it is impossible to get back to the C code without decompilation.

But there may be a way to build a map between variable info (name, type and line number from DWARF) and its real value with libunwind, ptrace and DWARF.

There is a blog that describes some similar ideas; I haven't put it into practice yet.

Some reference materials

Overall, this issue looks like it requires implementing a decompiler.