enarx-archive / sallyport

API for the hypervisor-microkernel boundary
Apache License 2.0
7 stars 6 forks source link

feat: implement nanosleep syscall #90

Closed greyspectrum closed 2 years ago

greyspectrum commented 2 years ago

To close enarx/sallyport#55

greyspectrum commented 2 years ago

Thanks Roman for these helpful comments! I still have a few questions, unfortunately.

Sorry I have so many questions!

rvolosatovs commented 2 years ago

Sorry I have so many questions!

No problem, the documentation for this crate is severely lacking and it's great to see that you managed to go ahead and start with this!

Which previously implemented syscall is closest to nanosleep, to use as an example?

Perhaps https://github.com/enarx/sallyport/blob/ad77fa18478a7322a3eb3ed845ee4817735d7786/src/v0.3/src/guest/call/syscall/clock_gettime.rs#L10-L40

What is the difference between the Alloc trait and the PassthroughAlloc trait? Why use Alloc for nanosleep?

Alloc is the most generic trait, which is used for syscalls, which take pointer arguments - the data pointed to by these arguments should be copied into the block and an offset referring to the start of that copied region into the block should be passed to the host from the guest (because we cannot expose the guest memory to the host). The host then computes a pointer from that offset, that's what deref* utilities are for

PassthroughAlloc is a specialized trait for syscalls that do not require this copying process and just pass arguments through directly

Does the use of Alloc mean this syscall is proxied out to host, i.e. is the data from timespec params copied to untrusted memory block, with the syscall called on the host side?

Everything implementing *Alloc traits is allocated into the block and passed to the host

Is there a way to tell generally which syscalls should be proxied vs. handled internally?

Most of syscalls should be proxied, unless the guest can handle it directly or if we provide a stub. Most of the syscalls that are not proxied are stubs. Memory-related syscalls are an example of syscalls handled directly within the guest (because we cannot expose the guest memory to the host anyway)

What kinds of checks should be done on data returned from untrusted memory? For nanosleep, provided that it does need to be proxied to the host, I would imagine that a basic check would be to ensure that *rem is less than *req, since the former is a remainder of the latter. However, since some of the implemented syscalls call out to untrusted C, is bounds checking necessary here? For example, here, I see that you're calling out to libc::SYS_clock_gettime. However, I'm not sure whether all or part of that syscall is executed in untrusted memory; does clock_gettime return to C code inside the Keep? I also see that you're using unsafe here, so presumably I may want to employ bounds checking on the syscall return values even if untrusted C code is returning to Rust code running in the Keep?

Since the data is static-size I don't think we really need to validate anything apart from the fact that rem is less than original req (great catch, by the way) We generally cannot use libc functionality in this crate, since we essentially are providing a custom libc here. The reference you see is actually an imported constant, we do want to use those.

npmccallum commented 2 years ago

@greyspectrum It looks like this needs to be rebased.

greyspectrum commented 2 years ago

@rvolosatovs Why are there two timespecs here, inside Output<>, since there is only one timespec in the clock_gettime buffer? Is one the value of the timespec and the other a pointer to the timespec? https://github.com/enarx/sallyport/blob/6f0925ae3af419aa4cecfb17fdb8bb3a12a3f591/src/guest/call/syscall/clock_gettime.rs#L21

rvolosatovs commented 2 years ago

@rvolosatovs Why are there two timespecs here, inside Output<>, since there is only one timespec in the clock_gettime buffer?

One is the the type of the reference/the type allocated in the block and the other is the type of the associated value, to which the data is going to be copied from the block

greyspectrum commented 2 years ago

This still needs a test. Any chance I can break that out into a separate PR?

rvolosatovs commented 2 years ago

@greyspectrum how about we do something like this: Added some missing parts and a simple test, fixed the syscall number and rem argument typing:

diff --git a/src/guest/call/syscall/nanosleep.rs b/src/guest/call/syscall/nanosleep.rs
index 3f40b93e..2af574b8 100644
--- a/src/guest/call/syscall/nanosleep.rs
+++ b/src/guest/call/syscall/nanosleep.rs
@@ -3,22 +3,22 @@
 use super::super::types::Argv;
 use super::Alloc;
 use crate::guest::alloc::{Allocator, Collect, Collector, Commit, Committer, InOut, Input, Output};
-use crate::Result;
+use crate::{Result, NULL};

 use libc::{c_long, timespec};

 pub struct Nanosleep<'a> {
     pub req: &'a timespec,
-    pub rem: &'a mut timespec,
+    pub rem: Option<&'a mut timespec>,
 }

 pub struct StagedNanosleep<'a> {
     req: Input<'a, timespec, &'a timespec>,
-    rem: InOut<'a, timespec, &'a mut timespec>,
+    rem: Option<InOut<'a, timespec, &'a mut timespec>>,
 }

 impl<'a> Commit for StagedNanosleep<'a> {
-    type Item = Output<'a, timespec, &'a mut timespec>;
+    type Item = Option<Output<'a, timespec, &'a mut timespec>>;

     fn commit(self, com: &impl Committer) -> Self::Item {
         self.req.commit(com);
@@ -27,23 +27,25 @@ impl<'a> Commit for StagedNanosleep<'a> {
 }

 unsafe impl<'a> Alloc<'a> for Nanosleep<'a> {
-    const NUM: c_long = libc::SYS_clock_nanosleep;
+    const NUM: c_long = libc::SYS_nanosleep;

     type Argv = Argv<2>;
     type Ret = ();

     type Staged = StagedNanosleep<'a>;
-    type Committed = Output<'a, timespec, &'a mut timespec>;
+    type Committed = Option<Output<'a, timespec, &'a mut timespec>>;
     type Collected = Result<()>;

     fn stage(self, alloc: &mut impl Allocator) -> Result<(Self::Argv, Self::Staged)> {
         let req = Input::stage(alloc, self.req)?;
-        let rem = InOut::stage(alloc, self.rem)?;
-
-        Ok((
-            Argv([req.offset(), rem.offset()]),
-            Self::Staged { req, rem },
-        ))
+        let (rem, rem_offset) = if let Some(rem) = self.rem {
+            let rem = InOut::stage(alloc, rem)?;
+            let rem_offset = rem.offset();
+            (Some(rem), rem_offset)
+        } else {
+            (None, NULL)
+        };
+        Ok((Argv([req.offset(), rem_offset]), Self::Staged { req, rem }))
     }

     fn collect(
diff --git a/src/guest/handler.rs b/src/guest/handler.rs
index aa204203..565e1303 100644
--- a/src/guest/handler.rs
+++ b/src/guest/handler.rs
@@ -318,7 +318,7 @@ pub trait Handler: Platform {
     fn munmap(&mut self, addr: NonNull<c_void>, length: size_t) -> Result<()>;

     /// Executes [`nanosleep`](https://man7.org/linux/man-pages/man2/nanosleep.2.html) syscall akin to [`libc::nanosleep`].
-    fn nanosleep(&mut self, req: &timespec, rem: &mut timespec) -> Result<()> {
+    fn nanosleep(&mut self, req: &timespec, rem: Option<&mut timespec>) -> Result<()> {
         self.execute(syscall::Nanosleep { req, rem })?
     }

@@ -661,6 +661,15 @@ pub trait Handler: Platform {
                 let addr = NonNull::new(addr as _).ok_or(EFAULT)?;
                 self.munmap(addr, length).map(|_| [0, 0])
             }
+            (libc::SYS_nanosleep, [req, rem, ..]) => {
+                let req = self.validate(req)?;
+                let rem = if rem == 0 {
+                    None
+                } else {
+                    self.validate_mut(rem).map(Some)?
+                };
+                self.nanosleep(req, rem).map(|_| [0, 0])
+            }
             (libc::SYS_poll, [fds, nfds, timeout, ..]) => {
                 let fds = self.validate_slice_mut(fds, nfds)?;
                 self.poll(fds, timeout as _).map(|ret| [ret as _, 0])
diff --git a/src/host/syscall.rs b/src/host/syscall.rs
index a145b2a1..a4acc5dd 100644
--- a/src/host/syscall.rs
+++ b/src/host/syscall.rs
@@ -426,11 +426,15 @@ pub(super) unsafe fn execute(call: &mut item::Syscall, data: &mut [u8]) -> Resul

         item::Syscall {
             num,
-            argv: [req, rem_offset, ..],
+            argv: [req_offset, rem_offset, ..],
             ret: [ret, ..],
         } if *num == libc::SYS_nanosleep as _ => {
-            let rem = deref_aligned::<timespec>(data, *rem_offset, 1)?;
-            let req = deref_aligned::<timespec>(data, *req, 1)?;
+            let req = deref_aligned::<timespec>(data, *req_offset, 1)?;
+            let rem = if *rem_offset == NULL {
+                null_mut()
+            } else {
+                deref_aligned::<timespec>(data, *rem_offset, 1)?
+            };
             Syscall {
                 num: libc::SYS_nanosleep,
                 argv: [req as _, rem as _],
diff --git a/tests/integration_tests/syscall.rs b/tests/integration_tests/syscall.rs
index 691a9644..07a879de 100644
--- a/tests/integration_tests/syscall.rs
+++ b/tests/integration_tests/syscall.rs
@@ -3,11 +3,12 @@
 use super::{run_test, write_tcp};

 use libc::{
-    c_int, iovec, sockaddr, SYS_accept, SYS_accept4, SYS_bind, SYS_close, SYS_fcntl, SYS_fstat,
-    SYS_getrandom, SYS_getsockname, SYS_listen, SYS_read, SYS_readv, SYS_recvfrom, SYS_setsockopt,
-    SYS_socket, SYS_write, SYS_writev, AF_INET, EBADF, EBADFD, ENOSYS, F_GETFD, F_GETFL, F_SETFD,
-    F_SETFL, GRND_RANDOM, O_CREAT, O_RDONLY, O_RDWR, SOCK_CLOEXEC, SOCK_STREAM, SOL_SOCKET,
-    SO_REUSEADDR, STDERR_FILENO, STDIN_FILENO, STDOUT_FILENO,
+    c_int, iovec, sockaddr, timespec, SYS_accept, SYS_accept4, SYS_bind, SYS_close, SYS_fcntl,
+    SYS_fstat, SYS_getrandom, SYS_getsockname, SYS_listen, SYS_nanosleep, SYS_read, SYS_readv,
+    SYS_recvfrom, SYS_setsockopt, SYS_socket, SYS_write, SYS_writev, AF_INET, EBADF, EBADFD,
+    ENOSYS, F_GETFD, F_GETFL, F_SETFD, F_SETFL, GRND_RANDOM, O_CREAT, O_RDONLY, O_RDWR,
+    SOCK_CLOEXEC, SOCK_STREAM, SOL_SOCKET, SO_REUSEADDR, STDERR_FILENO, STDIN_FILENO,
+    STDOUT_FILENO,
 };
 use std::env::temp_dir;
 use std::ffi::CString;
@@ -16,6 +17,7 @@ use std::io::{Read, Seek, Write};
 use std::mem::size_of;
 use std::net::{TcpListener, UdpSocket};
 use std::os::unix::prelude::AsRawFd;
+use std::ptr::null_mut;
 use std::slice;
 use std::{mem, thread};

@@ -225,6 +227,79 @@ fn getrandom() {
     });
 }

+#[test]
+#[serial]
+fn nanosleep() {
+    run_test(2, [0xff; 32], move |i, handler| {
+        let req = timespec {
+            tv_sec: 0,
+            tv_nsec: 1,
+        };
+        if i % 2 == 0 {
+            assert_eq!(
+                handler.nanosleep(&req, None),
+                if cfg!(not(miri)) { Ok(()) } else { Err(ENOSYS) }
+            );
+        } else {
+            assert_eq!(
+                unsafe {
+                    handler.syscall([
+                        SYS_nanosleep as _,
+                        &req as *const _ as _,
+                        null_mut() as *mut timespec as _,
+                        0,
+                        0,
+                        0,
+                        0,
+                    ])
+                },
+                if cfg!(not(miri)) {
+                    Ok([0, 0])
+                } else {
+                    Err(ENOSYS)
+                }
+            );
+        }
+
+        let mut rem = timespec {
+            tv_sec: 0,
+            tv_nsec: 0,
+        };
+        if i % 2 == 0 {
+            assert_eq!(
+                handler.nanosleep(&req, Some(&mut rem)),
+                if cfg!(not(miri)) { Ok(()) } else { Err(ENOSYS) }
+            );
+        } else {
+            assert_eq!(
+                unsafe {
+                    handler.syscall([
+                        SYS_nanosleep as _,
+                        &req as *const _ as _,
+                        &mut rem as *mut _ as _,
+                        0,
+                        0,
+                        0,
+                        0,
+                    ])
+                },
+                if cfg!(not(miri)) {
+                    Ok([0, 0])
+                } else {
+                    Err(ENOSYS)
+                }
+            );
+        }
+        assert_eq!(
+            rem,
+            timespec {
+                tv_sec: 0,
+                tv_nsec: 0,
+            }
+        )
+    });
+}
+
 #[test]
 #[serial]
 #[cfg_attr(miri, ignore)]
greyspectrum commented 2 years ago

@rvolosatovs Thanks for your help on this PR. I applied applied your patch, mind taking one last look at this?

rvolosatovs commented 2 years ago

We should also document the syscall struct (especially due to "unusual" rem argument handling) or leave that for enarx/enarx#1725

greyspectrum commented 2 years ago

We should also document the syscall struct (especially due to "unusual" rem argument handling) or leave that for enarx/enarx#1725

Could you give me a more specific idea of what kind of documentation you would like? Would you like an explanation of what the arguments do (as in from the man page) or something else?

rvolosatovs commented 2 years ago

We should also document the syscall struct (especially due to "unusual" rem argument handling) or leave that for enarx/enarx#1725

Could you give me a more specific idea of what kind of documentation you would like? Would you like an explanation of what the arguments do (as in from the man page) or something else?

I was thinking we should document that unlike most other syscall arguments, rem argument is written only on a specific error case, rather than a success. Can be left for enarx/enarx#1725