ESOS-Lab / VSSIM

Virtual Machine based SSD Simulator

guest going into infinite loop at start of boot #17

Open shehbazj opened 6 years ago

shehbazj commented 6 years ago

Hello,

Thank you for creating this simulator for research purposes. However, when I tried to run it, I could not boot the QEMU image with the qemu-system-x86_64 executable that is provided in the VSSIM package.

Earlier, I was getting the following error when I ran the run.sh command:

Formatting '../../RAMDISK/rd/ssd_hda.img', fmt=qcow size=21474836480 encryption=off
[FTL_INIT] start
[SSD_MONITOR] SERVER THREAD CREATED!!!
[SSD_MONITOR] Wait for client....[13]
[SSD_MONITOR] Connected![-1]
[SSD_MONITOR] Error No. [4] msg: Interrupted system call
[SSD_IO_INIT] SSD Version: 1.2 ver. (17.11.10)
[FTL_INIT] complete
[FTL_TERM] start
Average Read Latency    0.000 us
Average Write Latency   0.000 us
[FTL_TERM] complete

In addition to the existing print statement for "Error No. [4]", I added strerror(errno), which prints "Interrupted system call". Also note that "Connected!" prints -1 for the client port number. I changed 9995 to several other values between 9991 and 9999 in the two places that the README suggests, but this did not resolve the issue.

Later, I saw that the #define MNT_DEBUG flag starts the client, so I commented that out as well. Even then I was unable to boot the image, and I get the following output, with the monitor screen and the QEMU screen infinitely trying to reboot at the boot screen.

[FTL_INIT] start
[SSD_IO_INIT] SSD Version: 1.2 ver. (17.11.10)
[FTL_INIT] complete
[FTL_TERM] start
Average Read Latency    0.000 us
Average Write Latency   0.000 us
[FTL_TERM] complete

Finally, I changed the guest image from Ubuntu 16.04 to 14.04, installed the OS on the guest with a separate, working version of QEMU, and then booted the 14.04 image with the run.sh command. I still get the infinite boot screen with the 14.04 image.

Please let me know if you need further information to get to the root cause of the issue. The guest VM image boots properly with another QEMU version; with the qemu executable that ./run.sh references (inside the QEMU folder), however, the guest goes into an infinite loop. Please advise.

Attached is a screenshot of the boot screen that keeps reappearing. Note that the guest image already contains a valid Ubuntu installation.

[Screenshot: boot screen]

Thanks, Shehbaz

jedisty commented 6 years ago

Dear Shehbaz

The MNT_DEBUG flag is only used to print the debugging messages of the SSD Monitor.

Connected![-1] means that the connection between the server (SSD Monitor) and the client (QEMU) could not be established.

Here are my questions:

  1. Did you recompile QEMU and the SSD Monitor after changing the port number?
  2. Which Ubuntu image did you use: x64 or i386?

Thank you

Jinsoo Yoo

From: Shehbaz Jaffer notifications@github.com Sent: Friday, April 20, 2018 5:28 AM To: ESOS-Lab/VSSIM VSSIM@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: [ESOS-Lab/VSSIM] guest going into infinite loop at start of boot (#17)



shehbazj commented 6 years ago

Thank you for the quick reply!

  1. Yes, I compiled both the MONITOR and QEMU source code.
  2. I have tried both x86_64 and i386 Ubuntu 14.04 images; I am currently working with the i386 image.

While going through ssd_log_manager.c, I saw that the unsigned int len parameter passed to the accept() call is not initialized. This is a bug. From the man page:

int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

The addrlen argument is a value-result argument: the caller must initialize it to contain the size (in bytes) of the structure pointed to by addr; on return it will contain the actual size of the peer address.
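As an illustration of the value-result convention quoted above, here is a minimal sketch in C. The wrapper name accept_ipv4 is hypothetical and is not part of VSSIM; it only shows the length being initialized immediately before the call.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical wrapper (not VSSIM code): accept one IPv4 client on
 * listen_fd and store its address in *peer.
 * Returns the client socket, or -1 with errno set by accept(). */
int accept_ipv4(int listen_fd, struct sockaddr_in *peer)
{
    /* Value-result argument: it must hold the size of the address
     * buffer before the call. */
    socklen_t len = sizeof(*peer);

    int fd = accept(listen_fd, (struct sockaddr *)peer, &len);

    /* On success, len now holds the actual size of the peer address. */
    return fd;
}
```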

However, even after setting len = sizeof(clientAddr), accept() still returns -1. I have added the following patch, in which the server calls accept() again after a failure if the error is retriable (see https://stackoverflow.com/questions/28098563/errno-after-accept-in-linux-socket-programming):

@@ -73,9 +74,9 @@ void THREAD_SERVER(void* arg)
 #ifdef MNT_DEBUG
        printf("[SSD_MONITOR] SERVER THREAD CREATED!!!\n");
 #endif 
-       unsigned int len;
        struct sockaddr_in serverAddr;
        struct sockaddr_in clientAddr;
+       socklen_t len = sizeof(clientAddr);

        if((servSock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0){
 #ifdef MNT_DEBUG
@@ -110,7 +111,21 @@ void THREAD_SERVER(void* arg)
 #ifdef MNT_DEBUG
        printf("[SSD_MONITOR] Wait for client....[%d]\n", servSock);
 #endif
-       clientSock = accept(servSock, (struct sockaddr*) &clientAddr, &len);
+
+       while (1) {
+               clientSock = accept(servSock, (struct sockaddr*) &clientAddr, &len);
+               if(clientSock < 0) {
+                       if((errno == ENETDOWN || errno == EPROTO || errno == ENOPROTOOPT || errno == EHOSTDOWN ||
+                                               errno == ENONET || errno == EHOSTUNREACH || errno == EOPNOTSUPP || errno == ENETUNREACH)) {
+                               continue;
+                       } else {
+                               printf("Uncontinuable error %d %s\n", errno, strerror(errno));
+                               shutdown(servSock, SHUT_RDWR);
+                               exit (EXIT_FAILURE);
+                       }
+               }
+       }
+
 #ifdef MNT_DEBUG

This code exits without a connection being established. I am currently looking at the client side: the function MonitorForm(), which sends the connect() request that accept() is waiting for, uses a QTcpSocket for the connection. I placed debug statements there (QtDebug) to check whether the socket is created and whether the connection is made correctly, and the connection appears to succeed. Could you please tell me where else a connect() request is sent to servSock? In other words, what is SSD_MONITOR accepting connections from: QEMU, SSD_MODULE, or something else? I can then look at how that connect request reaches the accept() call.
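For comparison, here is a minimal sketch of an accept() retry loop. It rests on two assumptions that this thread does not confirm: that the failure is the EINTR (errno 4, "Interrupted system call") shown in the log above, and that retrying is safe for the monitor. Unlike the patch above, it returns as soon as a client is accepted and treats EINTR as retriable. The helper name accept_retry is hypothetical and does not exist in VSSIM.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper (not VSSIM code): block until one client is accepted.
 * Returns the client socket, or -1 with errno set on a non-retriable error. */
int accept_retry(int listen_fd, struct sockaddr_in *peer)
{
    for (;;) {
        /* accept() overwrites len, so reset it before every attempt. */
        socklen_t len = sizeof(*peer);
        int fd = accept(listen_fd, (struct sockaddr *)peer, &len);

        if (fd >= 0)
            return fd;          /* connected: leave the loop */

        /* EINTR: a signal interrupted the call before a client arrived.
         * ECONNABORTED: a queued client gave up before being accepted.
         * Neither is fatal for the listening socket, so try again. */
        if (errno == EINTR || errno == ECONNABORTED)
            continue;

        return -1;              /* anything else is left to the caller */
    }
}
```

In THREAD_SERVER this could stand in for the bare accept() call, for example clientSock = accept_retry(servSock, &clientAddr), with the caller printing strerror(errno) when -1 comes back.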

Thank you again, Shehbaz

jedisty commented 6 years ago

Thank you for your advice. Please let me know if the code fixes the problem.

Also, if you want to install 64-bit ISO images, you can use the VSSIM_nvme branch.

The VSSIM_nvme branch is based on qemu-2.9 and supports the NVMe host interface.

Thank you.

Jinsoo Yoo

From: Shehbaz Jaffer notifications@github.com Sent: Friday, April 20, 2018 11:28 PM To: ESOS-Lab/VSSIM VSSIM@noreply.github.com Cc: JSYoo jedisty@hanyang.ac.kr; Comment comment@noreply.github.com Subject: Re: [ESOS-Lab/VSSIM] guest going into infinite loop at start of boot (#17)


This code exits without a connection being established. For error code 4, "Interrupted system call", the advice is to replace the accept() call with a select() system call. I am currently working on this and will report back if I succeed.

Thanks, Shehbaz
