avadhpatel / marss

PTLsim and QEMU based Computer Architecture Research Simulator
http://www.marss86.org

Android-x86 on Marss - Pipeline deadlocked #35

Open schfan opened 11 years ago

schfan commented 11 years ago

I ran into a pipeline deadlock when running my Android-x86 image on Marss. The steps to reproduce the error are as follows.

Here is my marss and qemu version info:

$ git clone git://github.com/avadhpatel/marss.git
$ cd marss
$ git show --summary
commit 49fda4a45e5b29c7e05b9e456228a4d016831484
Merge: 6cf2d32 4ce18f7
Author: Brendan Fitzgerald <fitzfitsahero@gmail.com>
Date: Tue Aug 20 10:30:37 2013 -0700

Merge pull request #34 from dramninjasUMD/master

Build # of cores string with preprocessor


$ scons c=1 debug=2
$ ./qemu/qemu-system-x86_64 -version
QEMU emulator version 0.14.1, Copyright (c) 2003-2008 Fabrice Bellard

And we can start the simulation:

(1) $ ./qemu/qemu-system-x86_64 -m 4096 -hda ../path-to-disk/android-64.img -usbdevice mouse -usbdevice keyboard

I am using a customized Android-x86 image (you can download it here: https://www.dropbox.com/s/m83kei9zga82c35/android-64.img). Boot into debug mode, which is non-graphical (during boot you need to type "exit" to continue booting). Here is the kernel info of this image:

# uname -a
Linux (none) 3.0.36-android-x86-eeepc+ #1 SMP PREEMPT Tue Aug 27 21:27:01 EDT 2013 x86_64 GNU/Linux

(2) I want to simulate this command:

# am start -a android.intent.action.Main -n com.android.calculator2/.Calculator

If there were a GUI, the Calculator app would launch after this command. Without graphics, nothing visible happens. So now we add start_sim before this command and try to simulate it.

Switch to the qemu terminal (Ctrl+Alt+2), and type:
(qemu) simconfig -machine single_core

Then switch back to the Android terminal (Ctrl+Alt+1) and type:
# cd /data/marss/
# ./start_sim ; am start -a android.intent.action.Main -n com.android.calculator2/.Calculator ; ./kill_sim

(I compiled start_sim/kill_sim statically from the source code provided on Marss website. )

(3) The simulation starts: Switching to simulation

And in my original terminal I can see the Completed Cycles scrolling down...

After a while, the simulation gets stuck and my original terminal's output stops on this line:
...
Completed 24021000 cycles, 1774788 commits: 54459 Hz, 51483
Completed 24034000 cycles, 1786064 commits: 59305 Hz, 51441
Completed 24045000 cycles, 1797644 commits: 51526 Hz, 54243insns/sec: rip ffffffff81026c57

And then after a while qemu exits:
...
Completed 24021000 cycles, 1774788 commits: 54459 Hz, 51483
Completed 24034000 cycles, 1786064 commits: 59305 Hz, 51441
Completed 24045000 cycles, 1797644 commits: 51526 Hz, 54243
qemu-system-x86_64: ptlsim/build/core/ooo-core/ooo.cpp:929: bool ooo::OooCore::runcycle(void*): Assertion `0' failed.
Aborted

If we look at ooo.cpp:929, we can see that the failure is still the "pipeline could be deadlocked" case, but that message was not printed to the terminal.
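As a rough aid for reading the status lines quoted in this report, the following Python sketch (not part of MARSS) extracts the cycle and commit counters and computes commits per cycle between consecutive samples. The regular expression assumes the "Completed N cycles, M commits" format shown in this report, and the function names are illustrative only.

```python
import re

# Matches the counters in status lines such as
# "Completed 24021000 cycles, 1774788 commits: ..."
STATUS_RE = re.compile(r"Completed\s+(\d+)\s+cycles,\s+(\d+)\s+commits")


def parse_status(text):
    """Return a list of (cycles, commits) tuples found in the log text."""
    return [(int(c), int(m)) for c, m in STATUS_RE.findall(text)]


def commits_per_cycle(samples):
    """Return commits-per-cycle for each pair of consecutive samples."""
    rates = []
    for (c0, m0), (c1, m1) in zip(samples, samples[1:]):
        delta_cycles = c1 - c0
        if delta_cycles <= 0:
            continue  # skip repeated or out-of-order status lines
        rates.append((m1 - m0) / delta_cycles)
    return rates


LOG = (
    "Completed 24021000 cycles, 1774788 commits: 54459 Hz, 51483 "
    "Completed 24034000 cycles, 1786064 commits: 59305 Hz, 51441 "
    "Completed 24045000 cycles, 1797644 commits: 51526 Hz, 54243"
)

if __name__ == "__main__":
    for rate in commits_per_cycle(parse_status(LOG)):
        print(f"{rate:.3f} commits/cycle")
```

In the three lines quoted in this report, the commit counter still advances between every pair of samples, which suggests the stall began after the last status line was printed.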

dramninjasUMD commented 11 years ago

Just out of curiosity, have you tried running other, simpler binaries in this disk image? Maybe something like ls?

schfan commented 11 years ago

Yes, running simple things like ls is okay.

Thanks! SF

tj90241 commented 11 years ago

The image doesn't work on any of my repositories (tried anywhere from qemu 0.14 to bleeding edge).

After SeaBIOS initializes, the following message appears:

Booting from Hard Disk... Error 16

fitzfitsahero commented 11 years ago

I got it to boot on the master branch. I'll spend some time looking at it.

schfan commented 11 years ago

Thanks for your help!

By the way, I have tried checking /proc/kallsyms, but there wasn't any kernel symbol whose address corresponds to the virtual address that is shown repeatedly in the log file.
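One thing worth noting here: a rip value usually points into the middle of a function, so an exact match in /proc/kallsyms is rare. The usual approach is to look for the nearest symbol at or below the address. A minimal Python sketch of that lookup follows; it assumes the common "address type name" line layout of kallsyms, and the two sample symbols are hypothetical placeholders, not real entries from this kernel.

```python
import bisect


def load_symbols(lines):
    """Parse 'address type name' lines into a sorted list of (addr, name)."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # not a symbol line
        try:
            addr = int(parts[0], 16)
        except ValueError:
            continue  # address field is not hexadecimal
        symbols.append((addr, parts[2]))
    symbols.sort()
    return symbols


def nearest_symbol(symbols, rip):
    """Return (name, offset) of the closest symbol at or below rip, or None."""
    addrs = [addr for addr, _ in symbols]
    idx = bisect.bisect_right(addrs, rip) - 1
    if idx < 0:
        return None  # rip lies below every known symbol
    addr, name = symbols[idx]
    return name, rip - addr


# Hypothetical sample data; on a real guest, read /proc/kallsyms instead.
SAMPLE = [
    "ffffffff81026c00 T hypothetical_func_a",
    "ffffffff81026d00 T hypothetical_func_b",
]

if __name__ == "__main__":
    hit = nearest_symbol(load_symbols(SAMPLE), 0xFFFFFFFF81026C57)
    if hit is not None:
        print("%s+0x%x" % hit)
```

On a real system the input would come from open("/proc/kallsyms"). If every address reads as zero, the kernel may be hiding pointers from unprivileged users, so the file may need to be read as root.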


schfan commented 11 years ago

@tj90241 I noticed that the image might get corrupted during download, which leads to the "Booting from Hard Disk..." error. If that happens, please download it again! Thanks!

tj90241 commented 11 years ago

Redownloaded; it was a corrupted image, thanks. I'll look into it this weekend.

schfan commented 11 years ago

@tj90241 Thanks Tyler!

I also noticed that, for Android-x86, qemu 1.2 supports networking while qemu 0.14 doesn't. But I guess it doesn't matter for now.

schfan commented 11 years ago

PS: If any of you are interested in building your own Android-x86 image, here is how to do that: http://www.cs.duke.edu/~schfan/blog/blog/2013/09/13/making-an-android-x86-image-for-marss/ . Thanks!

tj90241 commented 11 years ago

Found the issue after looking quickly -- MARSS doesn't handle SMC properly. I'm surprised this bug hasn't arisen before now, but it makes sense that it's causing Java to tie up immediately as Java makes excessive use of SMC. Fortunately, it's not related to your image or anything -- thanks for the bug report.

schfan commented 11 years ago

Hi Tyler,

It is great news! Thanks so much for your help!

Could you tell me how you found this issue? I am learning how to debug in Marss. Also, now that the issue has been found, are there ways to fix it? I'd like to help!

Thanks again!


tj90241 commented 11 years ago

Honestly, I guess most of it was just intuition. MARSS simulates almost everything perfectly; as I said before, I haven't seen single_core deadlock in ages! Given that knowledge, and that it is widely known that the JVM uses SMC, I then looked at the simulator and, lo and behold, it was fairly evident that SMC is not being handled correctly (there are even some unimplemented functions lying around...).

schfan commented 11 years ago

Thanks for finding the issue! Please excuse my limited knowledge in this area, but do you mean Self Modifying Code when you say SMC? If possible, could you please say more about the unimplemented functions you found?

I read the PTLSim manual (version 2007) and it mentioned how SMC is supported (page 31). But what exactly is causing the problem we have? Is it because Marss' "design eliminates forced invalidations when the kernel frees up a page containing code that's immediately overwritten with normal user data"?

I am just wondering what would be the best way to solve/work-around this issue, because running Android-x86 applications is crucial for my current research project. Although I can try looking for the specific functions in JVM and modify them to prevent Marss from crashing, it would be more convincing not to modify the guest OS. Do you think it's possible to fix the SMC related problem in Marss? If so, how long do you think it will take? If you can point out the necessary steps, I'd like to try working on it.

Many thanks!

tj90241 commented 11 years ago

Yes, I do mean self-modifying code when I abbreviate with SMC.

I did spend some time looking at it this weekend, but unfortunately the bug hasn't been as simple to repair as I had hoped. I have gotten the simulation to proceed further, but the guest either segfaults while running self-modifying code in simulation mode, or the pipeline just deadlocks (albeit at a later point in time than it did before the fixes).

The unimplemented function related to SMC is here: https://github.com/avadhpatel/marss/blob/master/ptlsim/x86/ptlhwdef.h#L984

In some cases it's also very confusing as to which SMC function is being called! See:
https://github.com/avadhpatel/marss/blob/master/ptlsim/x86/ptlhwdef.h#L939
https://github.com/avadhpatel/marss/blob/master/ptlsim/x86/ptlhwdef.h#L1779
(one function accepts a physical address, and another accepts a virtual address).

I'm also not certain that all of these functions ever get called, either...

I have also noticed that the mfnlo and mfnhi variables of the RIPVirtPhys class from PTLsim are always set to zero, rather than being set the way they are in PTLsim. These variables are often used by the simulator in parts of the code that check and handle SMC, so I tried to fix that part of the problem. I can send you a patch of what I currently have offline if you e-mail me directly.

AFAIK, SMC did work in PTLsim; my guess is that it broke sometime after PTLsim was merged into MARSS (though it could be that the bug was also in the original PTLsim and wasn't fixed when it got merged).

Unfortunately, I'm not sure that there is a way around the bug; that is to say, I'm not certain whether or not you can simply modify the JVM to skirt around this issue. It's certainly possible to fix it; it's just going to be a difficult bug to properly track and solve, in my mind. My next goal was to see if I could write a very small piece of SMC and try to reproduce the issue, so that the log is a more manageable size to read and the problem is easier to debug, but I ran out of time this weekend.
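To illustrate what such a very small piece of SMC might look like, here is a hedged Python sketch: it places a tiny x86-64 function in a writable and executable page, runs it, patches one byte of the already executed code, and runs it again. This is only an illustration of the pattern, not the reproducer discussed in this thread. It assumes an x86-64 host that permits such mappings, and an actual test inside the Android guest would more likely be a port to C or assembly. The function name and constants are illustrative.

```python
import ctypes
import mmap
import platform

# x86-64 machine code for:  mov eax, 1 ; ret
CODE = bytes([0xB8, 0x01, 0x00, 0x00, 0x00, 0xC3])
IMM_OFFSET = 1  # position of the low byte of the 32-bit immediate


def run_smc_demo():
    """Execute code, patch it in place, execute again.

    Returns (before, after), or None when the platform cannot run the demo.
    """
    if platform.machine() not in ("x86_64", "AMD64"):
        return None  # the machine code above is x86-64 only
    if not hasattr(mmap, "PROT_EXEC"):
        return None  # no POSIX-style mmap protections available
    try:
        buf = mmap.mmap(
            -1,
            mmap.PAGESIZE,
            flags=mmap.MAP_PRIVATE,
            prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC,
        )
    except (OSError, ValueError):
        return None  # the system refuses writable+executable mappings
    buf[0:len(CODE)] = CODE
    # Obtain the page address and wrap it as a C function returning int.
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    func = ctypes.CFUNCTYPE(ctypes.c_int)(addr)
    before = func()
    buf[IMM_OFFSET] = 2  # overwrite the immediate of code that already ran
    after = func()
    return before, after


if __name__ == "__main__":
    print(run_smc_demo())
```

The second call only returns the new value if the processor (or simulator) notices that previously fetched instruction bytes were overwritten, which is the behavior under suspicion here.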

schfan commented 11 years ago

Thanks so much for your help!

After seeing your comments, I first thought it was due to Android's JIT (just-in-time) execution mode. I turned it off system-wide, but that didn't help. Then I tested whether it's related to the Java virtual machine (Dalvik), and that seems to be the case.

(I need to point out that I tested Java in the Ubuntu disk image on Marss and it was okay.)

I added a Dalvik Executable file to the Android disk image, and simply executing it reproduces the error. I have updated the disk image file, so please download it again: https://www.dropbox.com/s/m83kei9zga82c35/android-64.img .

Now, if you boot the Android virtual machine, switch to the qemu terminal, run simconfig -machine single_core, switch back to the Android terminal, and type:

# su
# cd /data/marss/
# ./run_java.sh

the simulation will soon terminate because of the same pipeline deadlock issue.

@tj90241 I will email you directly regarding the patch file you have. Thanks!

PS: If you want to write your own java file and execute it in Android, here is how to do it: http://www.cs.duke.edu/~schfan/blog/blog/2013/09/19/executing-dex-file-in-android/.

schfan commented 10 years ago

Hi,

I am just wondering if anyone would still like to work on this issue. Thanks!