cloudius-systems / osv

OSv, a new operating system for the cloud.
osv.io

Was trying to build and run the distributed file system called SeaWeedFS. #1188

Closed Akilan1999 closed 2 years ago

Akilan1999 commented 2 years ago

I was able to build it successfully. But when running it I ran into the following issue.

syscall(): unimplemented system call 102
syscall(): unimplemented system call 102
syscall(): unimplemented system call 104
syscall(): unimplemented system call 102
syscall(): unimplemented system call 104
syscall(): unimplemented system call 102
syscall(): unimplemented system call 104
I0403 21:45:05     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: address family not supported by protocol
syscall(): unimplemented system call 263
syscall(): unimplemented system call 263
syscall(): unimplemented system call 266
I0403 21:45:05     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: address family not supported by protocol
I0403 21:45:05     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: address family not supported by protocol
I0403 21:45:05     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: address family not supported by protocol
I0403 21:45:05     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: address family not supported by protocol
I0403 21:45:05     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: address family not supported by protocol
I0403 21:45:05     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: address family not supported by protocol

I assume it makes sense to open this issue to request support for these syscalls.

nyh commented 2 years ago

System call 102 is getuid and 104 is getgid; those should be easy to implement (just add one line in linux.cc). 263 is unlinkat and 266 is symlinkat. Those will take slightly more work because we need to implement symlinkat() (although it should be easy, since it's similar to other existing functions).
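
To make the mapping above easier to apply to a longer console log, here is a minimal Python sketch that tallies the "unimplemented system call" lines by name. The table only contains the four x86-64 numbers named in this comment; the function name `missing_syscalls` and the sample log are illustrative, not part of OSv.

```python
import re
from collections import Counter

# x86-64 Linux syscall numbers named in this thread (not a complete table).
SYSCALL_NAMES = {
    102: "getuid",
    104: "getgid",
    263: "unlinkat",
    266: "symlinkat",
}

# Matches the warning format seen in the OSv console output above.
_PATTERN = re.compile(r"syscall\(\): unimplemented system call (\d+)")


def missing_syscalls(log_text):
    """Count unimplemented syscalls reported in a console log, keyed by name."""
    counts = Counter()
    for match in _PATTERN.finditer(log_text):
        number = int(match.group(1))
        # Numbers outside the small table are reported as unknown(N).
        counts[SYSCALL_NAMES.get(number, f"unknown({number})")] += 1
    return dict(counts)


sample = """\
syscall(): unimplemented system call 102
syscall(): unimplemented system call 104
syscall(): unimplemented system call 263
syscall(): unimplemented system call 102
"""
print(missing_syscalls(sample))
```

Any number missing from the table shows up as `unknown(N)`, which is a cue to look it up in the kernel's syscall table for the target architecture.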

The netlink thing is a harder problem... It's a Linux-specific interface for inquiring about routing and other stuff, which we never implemented. It's not impossible to implement, but the first thing I would check is whether SeaweedFS can gracefully recover from failing to use it. If it can continue to work without netlink, I wouldn't rush to implement it.

wkozaczuk commented 2 years ago

We actually have at least partial netlink support on the ipv6 branch, but given how rich the netlink interface is, it is hard to know if it will be enough. In the ipv6 branch, netlink is used to implement getifaddrs() and if_nameindex() (see https://github.com/cloudius-systems/osv/commit/b687b7c4a938e62b8bced6bdace8dcbde463a37e).

@Akilan1999 would you mind sending a patch to create a simple app to demo running SeaweedFS on OSv? (Please see the other apps under https://github.com/cloudius-systems/osv-apps for an example.)

Akilan1999 commented 2 years ago

sure

wkozaczuk commented 2 years ago

I finally found a bit of time to work on it and I have managed to get SeaweedFS running on OSv:

./scripts/run.py -e '/weed master -port 9333' --forward 'tcp::9333-:9333'
OSv v0.56.0-96-g45990a64
eth0: 192.168.122.15
Booted up in 278.20 ms
Cmdline: /weed master -port 9333
I0516 03:09:29     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:09:29     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:09:29     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:09:29     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:09:29     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:09:29     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:09:29     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:09:29     2 file_util.go:23] Folder /tmp Permission: -rwxrwxr-x
I0516 03:09:29     2 master.go:232] current: :9333 peers:
I0516 03:09:29     2 master_server.go:122] Volume Size Limit is 30000 MB
I0516 03:09:29     2 master.go:143] Start Seaweed Master 30GB 2.96  at :9333
I0516 03:09:29     2 raft_server.go:80] Starting RaftServer with :9333
I0516 03:09:29     2 raft_server.go:129] current cluster leader: 
I0516 03:09:47     2 master.go:176] Start Seaweed Master 30GB 2.96  grpc server at :19333
I0516 03:09:48     2 masterclient.go:80] No existing leader found!
I0516 03:09:48     2 raft_server.go:146] Initializing new cluster
I0516 03:09:48     2 master_server.go:165] leader change event:  => :9333
I0516 03:09:48     2 master_server.go:168] [ :9333 ] :9333 becomes leader.
I0516 03:09:52     2 master_grpc_server.go:278] + client master@:9333
curl http://localhost:9333/cluster/status?pretty=y
{
  "IsLeader": true,
  "Leader": ":9333"
}
./scripts/run.py -e '/weed server -dir=/tmp' --forward 'tcp::9333-:9333'
OSv v0.56.0-96-g45990a64
eth0: 192.168.122.15
Booted up in 282.69 ms
Cmdline: /weed server -dir=/tmp
I0516 03:18:42     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:18:42     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:18:42     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:18:42     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:18:42     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:18:42     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:18:42     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:18:42     2 master.go:232] current: :9333 peers:
I0516 03:18:42     2 file_util.go:23] Folder /tmp Permission: -rwxrwxr-x
I0516 03:18:42     2 file_util.go:23] Folder /tmp Permission: -rwxrwxr-x
I0516 03:18:42     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:18:42     2 volume.go:195] detected volume server ip address: 
I0516 03:18:42     2 volume_grpc_client_to_master.go:41] checkWithMaster :9333: get master :9333 configuration: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp :19333: connect: connection refused"
I0516 03:18:42     2 master.go:232] current: :9333 peers::9333
I0516 03:18:42     2 master_server.go:122] Volume Size Limit is 30000 MB
I0516 03:18:42     2 master.go:143] Start Seaweed Master 30GB 2.96  at :9333
I0516 03:18:42     2 raft_server.go:80] Starting RaftServer with :9333
I0516 03:18:42     2 raft_server.go:129] current cluster leader: 
I0516 03:18:44     2 volume_grpc_client_to_master.go:41] checkWithMaster :9333: get master :9333 configuration: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp :19333: connect: connection refused"
I0516 03:18:45     2 volume_grpc_client_to_master.go:41] checkWithMaster :9333: get master :9333 configuration: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp :19333: connect: connection refused"
I0516 03:18:47     2 volume_grpc_client_to_master.go:41] checkWithMaster :9333: get master :9333 configuration: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp :19333: connect: connection refused"
I0516 03:18:49     2 volume_grpc_client_to_master.go:41] checkWithMaster :9333: get master :9333 configuration: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp :19333: connect: connection refused"
I0516 03:18:51     2 volume_grpc_client_to_master.go:41] checkWithMaster :9333: get master :9333 configuration: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp :19333: connect: connection refused"
I0516 03:18:53     2 volume_grpc_client_to_master.go:41] checkWithMaster :9333: get master :9333 configuration: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp :19333: connect: connection refused"
I0516 03:18:54     2 volume_grpc_client_to_master.go:41] checkWithMaster :9333: get master :9333 configuration: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp :19333: connect: connection refused"
I0516 03:18:56     2 volume_grpc_client_to_master.go:41] checkWithMaster :9333: get master :9333 configuration: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp :19333: connect: connection refused"
I0516 03:18:58     2 volume_grpc_client_to_master.go:41] checkWithMaster :9333: get master :9333 configuration: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp :19333: connect: connection refused"
I0516 03:19:00     2 volume_grpc_client_to_master.go:41] checkWithMaster :9333: get master :9333 configuration: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp :19333: connect: connection refused"
I0516 03:19:00     2 master.go:176] Start Seaweed Master 30GB 2.96  grpc server at :19333
I0516 03:19:01     2 masterclient.go:80] No existing leader found!
I0516 03:19:01     2 raft_server.go:146] Initializing new cluster
I0516 03:19:01     2 master_server.go:165] leader change event:  => :9333
I0516 03:19:01     2 master_server.go:168] [ :9333 ] :9333 becomes leader.
syscall(): unimplemented system call 137
I0516 03:19:02     2 disk_location.go:396] dir /tmp disk free 0.00% < required 1.00%
I0516 03:19:02     2 disk_location.go:182] Store started on dir: /tmp with 0 volumes max 8
I0516 03:19:02     2 disk_location.go:185] Store started on dir: /tmp with 0 ec shards
I0516 03:19:02     2 volume_grpc_client_to_master.go:50] Volume server start with seed master nodes: [:9333]
I0516 03:19:02     2 volume.go:364] Start Seaweed volume server 30GB 2.96  at :8080
I0516 03:19:02     2 volume_grpc_client_to_master.go:107] Heartbeat to: :9333
I0516 03:19:02     2 node.go:222] topo adds child DefaultDataCenter
I0516 03:19:02     2 node.go:222] topo:DefaultDataCenter adds child DefaultRack
I0516 03:19:02     2 node.go:222] topo:DefaultDataCenter:DefaultRack adds child :8080
I0516 03:19:02     2 node.go:222] topo:DefaultDataCenter:DefaultRack::8080 adds child 
I0516 03:19:02     2 master_grpc_server.go:72] added volume server 0: :8080
I0516 03:19:05     2 master_grpc_server.go:278] + client master@:9333
curl http://localhost:9333/dir/status?pretty=y
{
  "Topology": {
    "DataCenters": [
      {
        "Id": "DefaultDataCenter",
        "Racks": [
          {
            "DataNodes": [
              {
                "EcShards": 0,
                "Max": 8,
                "PublicUrl": ":8080",
                "Url": ":8080",
                "VolumeIds": " ",
                "Volumes": 0
              }
            ],
            "Id": "DefaultRack"
          }
        ]
      }
    ],
    "Free": 8,
    "Layouts": null,
    "Max": 8
  },
  "Version": "30GB 2.96 "
}

I do not know SeaweedFS well, so I cannot really tell how well it works, but it does respond to some curl calls.
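
As a rough way to check such a response programmatically, the following Python sketch flattens the `/dir/status` reply into one row per data node. The sample JSON is abridged from the output above; the helper names `fetch_status` and `summarize_topology` are made up for illustration, and `fetch_status` assumes the master is reachable on the forwarded port 9333.

```python
import json
from urllib.request import urlopen


def fetch_status(base_url="http://localhost:9333"):
    """Fetch /dir/status from a running master (not called in this sketch)."""
    with urlopen(f"{base_url}/dir/status?pretty=y", timeout=5) as response:
        return json.load(response)


def summarize_topology(status):
    """Return (data center, rack, url, volumes, max) for every data node."""
    rows = []
    # "or []" guards against null lists, which the API may return when empty.
    for dc in status["Topology"]["DataCenters"] or []:
        for rack in dc["Racks"] or []:
            for node in rack["DataNodes"] or []:
                rows.append((dc["Id"], rack["Id"], node["Url"],
                             node["Volumes"], node["Max"]))
    return rows


# Abridged copy of the response shown above.
sample = json.loads("""
{
  "Topology": {
    "DataCenters": [
      {"Id": "DefaultDataCenter",
       "Racks": [
         {"Id": "DefaultRack",
          "DataNodes": [
            {"EcShards": 0, "Max": 8, "PublicUrl": ":8080",
             "Url": ":8080", "VolumeIds": " ", "Volumes": 0}
          ]}
       ]}
    ],
    "Free": 8, "Layouts": null, "Max": 8
  },
  "Version": "30GB 2.96 "
}
""")
print(summarize_topology(sample))
```

A single row with the `:8080` volume server and 0 of 8 volumes used matches what the log reports after "added volume server 0: :8080".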

The netlink support does not seem to be critical. From what I could tell, it is used by golang to detect network interfaces (see https://go.dev/src/syscall/netlink_linux.go), but it seems to fall back to another mechanism or assume some defaults. In any case, I am still planning to port the netlink implementation from the ipv6 branch. My initial experiments seem to confirm that it would be enough to satisfy the needs of golang's network interface discovery logic.
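
The "log the failure and carry on with defaults" behaviour described above can be illustrated with a small Python sketch. This is only a model of the pattern, not how Go's net package or SeaweedFS is actually written; `failing_probe` and `detect_interfaces` are hypothetical names, and the probe simply simulates a kernel that rejects the netlink address family.

```python
import errno
import os


def failing_probe():
    """Stand-in for a netlink query on a kernel without netlink support."""
    raise OSError(errno.EAFNOSUPPORT, os.strerror(errno.EAFNOSUPPORT))


def detect_interfaces(probe, fallback):
    """Return the probe's result, or the fallback if the OS rejects the query."""
    try:
        return probe()
    except OSError as err:
        # Mirror the thread's log line: report the problem, then keep going.
        print(f"failed to detect net interfaces: {err}")
        return fallback


print(detect_interfaces(failing_probe, ["0.0.0.0"]))
```

The point of the pattern is that interface discovery is treated as best-effort: a missing kernel facility degrades the result instead of aborting startup.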

I should be sending some patches soon - mostly to add a number of syscalls:

+    SYSCALL0(getgid);
+    SYSCALL0(getuid);
+    SYSCALL2(getcwd_syscall, char *, size_t);
+    SYSCALL3(unlinkat, int , const char *, int);
+    SYSCALL3(symlinkat, const char *, int, const char *);
+    SYSCALL3(getdents64, int, void *, size_t);
+    SYSCALL4(renameat, int, const char *, int, const char *);
+    SYSCALL3(lseek, int, off_t, int);

Akilan1999 commented 2 years ago

Wow, amazing! Thanks a lot for the patch.

wkozaczuk commented 2 years ago

@Akilan1999 I have recently fixed an important bug, and the fix makes SeaweedFS run much better on OSv (please see https://github.com/cloudius-systems/osv/commit/a0251df210a6739830bd37e52211268f5672a27c). In essence, it allows SeaweedFS to bind to an individual interface and talk to itself at the same time. That is why the older examples had to use --ip 0.0.0.0, which made it impossible to test any practical examples.
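
For readers unfamiliar with the "bind to an individual interface and talk to itself" scenario, here is a minimal Python sketch of it: one process listens on a specific address and then connects to that same address. It uses 127.0.0.1 as a stand-in for the interface address and is not taken from SeaweedFS or from the OSv fix; `talk_to_self` is a made-up name.

```python
import socket


def talk_to_self(address="127.0.0.1"):
    """Listen on one specific address, connect to it, and echo 4 bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((address, 0))          # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        # The same process now dials the address it is listening on.
        with socket.create_connection((address, port), timeout=5) as client:
            conn, _ = server.accept()
            with conn:
                client.sendall(b"ping")
                return conn.recv(4)


print(talk_to_self())
```

A server that acts as both master and volume node does essentially this when its components reach each other through the bound address rather than through 0.0.0.0.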

I have also run some experiments and benchmarks. I have described pretty detailed steps in the README I have added to the app:

Akilan1999 commented 2 years ago

Awesome! Thanks a lot for that.