Open fullofcaffeine opened 1 month ago
I don't see anything obviously wrong with your setup. It looks all correct.
The logs suggest a networking issue, and the fact that it generates a few tokens and then stops points the same way. What network are you running on? What are the bandwidth, latency, and jitter between devices like? Can you try pinging between nodes, or running a small network test with iperf3?
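Besides iperf3, a rough latency/jitter probe between two nodes can be scripted directly. This is just a sketch (not part of exo): it measures TCP round trips to a small echo server, which the demo runs locally; in practice you'd run the server half on one node and point `probe_latency` at its address.

```python
import socket
import statistics
import threading
import time

def echo_server(sock):
    """Accept one connection and echo bytes back until the peer closes."""
    conn, _ = sock.accept()
    with conn:
        while data := conn.recv(64):
            conn.sendall(data)

def probe_latency(host, port, rounds=20):
    """Return per-round RTTs in milliseconds for small TCP echoes."""
    rtts = []
    with socket.create_connection((host, port)) as s:
        # Disable Nagle so each tiny message goes out immediately.
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(rounds):
            start = time.perf_counter()
            s.sendall(b"x")
            s.recv(64)
            rtts.append((time.perf_counter() - start) * 1000.0)
    return rtts

if __name__ == "__main__":
    # Demo against a local echo server; point host/port at a peer node instead.
    srv = socket.create_server(("127.0.0.1", 0))
    port = srv.getsockname()[1]
    threading.Thread(target=echo_server, args=(srv,), daemon=True).start()
    rtts = probe_latency("127.0.0.1", port)
    print(f"min {min(rtts):.2f} ms  mean {statistics.mean(rtts):.2f} ms  "
          f"jitter (stdev) {statistics.stdev(rtts):.2f} ms")
```

A large stdev relative to the mean, or occasional multi-hundred-millisecond spikes, would be consistent with the stalls you're seeing.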
Hi Alex! Thanks for the reply.
> I don't see anything obviously wrong with your setup. It looks all correct.
Cool, that's good to read. As a side question, I assume the "0 TFLOPS" shown for the two Linux nodes isn't too important, then?
> What network are you running on?
It's a regular LAN; the boxes all connect over wifi (5 GHz). My router is a Synology RT2600ac, and all nodes are on the same wifi network.
Let me know if you need more info about it or the nodes.
> Can you try pinging or running a small network test with iperf3
I didn't know about this tool. I tried it out; here are the results.
I ran iperf3 as a server on my Mac M2 and then spun up a client from the Quadro RTX 5000 machine:
Mac output:
```
$ iperf3 -s
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
iperf3: error - unable to receive parameters from client:
-----------------------------------------------------------
Server listening on 5201 (test #2)
-----------------------------------------------------------
Accepted connection from 10.0.4.81, port 38314
[  5] local 10.0.4.39 port 5201 connected to 10.0.4.81 port 38318
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.01   sec  48.1 MBytes   402 Mbits/sec
[  5]   1.01-2.01   sec  40.9 MBytes   343 Mbits/sec
[  5]   2.01-3.00   sec  39.9 MBytes   335 Mbits/sec
[  5]   3.00-4.00   sec  42.1 MBytes   353 Mbits/sec
[  5]   4.00-5.00   sec  55.4 MBytes   465 Mbits/sec
[  5]   5.00-6.00   sec  52.8 MBytes   444 Mbits/sec
[  5]   6.00-7.00   sec  53.8 MBytes   451 Mbits/sec
[  5]   7.00-8.00   sec  53.5 MBytes   447 Mbits/sec
[  5]   8.00-9.00   sec  51.2 MBytes   431 Mbits/sec
[  5]   9.00-10.00  sec  53.8 MBytes   450 Mbits/sec
[  5]  10.00-10.02  sec  1.38 MBytes   512 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.02  sec   493 MBytes   412 Mbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201 (test #3)
-----------------------------------------------------------
```
Linux output:
```
Connecting to host 10.0.4.39, port 5201
[  5] local 10.0.4.81 port 38318 connected to 10.0.4.39 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  49.9 MBytes   418 Mbits/sec  593   2.34 MBytes
[  5]   1.00-2.00   sec  41.2 MBytes   346 Mbits/sec  352   1.74 MBytes
[  5]   2.00-3.00   sec  40.0 MBytes   336 Mbits/sec    0   1.84 MBytes
[  5]   3.00-4.00   sec  42.5 MBytes   357 Mbits/sec    0   1.91 MBytes
[  5]   4.00-5.00   sec  56.2 MBytes   472 Mbits/sec   37   1.40 MBytes
[  5]   5.00-6.00   sec  52.5 MBytes   440 Mbits/sec    0   1.48 MBytes
[  5]   6.00-7.00   sec  53.8 MBytes   451 Mbits/sec    0   1.55 MBytes
[  5]   7.00-8.00   sec  53.8 MBytes   451 Mbits/sec    0   1.59 MBytes
[  5]   8.00-9.00   sec  51.2 MBytes   430 Mbits/sec    0   1.62 MBytes
[  5]   9.00-10.00  sec  53.8 MBytes   451 Mbits/sec    0   1.64 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   495 MBytes   415 Mbits/sec  982             sender
[  5]   0.00-10.02  sec   493 MBytes   412 Mbits/sec                  receiver

iperf Done.
```
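To put a number on the second-to-second variation, here's a quick calculation over the per-interval sender bitrates copied from the Linux log above (Mbits/sec):

```python
import statistics

# Per-interval sender bitrates (Mbits/sec) copied from the iperf3 log above.
bitrates = [418, 346, 336, 357, 472, 440, 451, 451, 430, 451]

mean = statistics.mean(bitrates)      # ~415 Mbit/s, matching iperf3's sender summary
spread = statistics.pstdev(bitrates)  # absolute second-to-second variation
cv = spread / mean                    # relative variation (coefficient of variation)

print(f"mean {mean:.0f} Mbit/s (~{mean / 8:.0f} MB/s)")
print(f"stdev {spread:.0f} Mbit/s, CV {cv:.0%}")
```

A roughly 10% swing between seconds, plus the 982 retransmits in the sender summary, would not be unusual for wifi; a wired link would typically sit closer to line rate with near-zero retransmits.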
Do you see anything off? Let me know if you need more data.
Thanks!
I have a cluster with 3 machines:
I still couldn't get the Linux nodes to show their TFLOPS; it still shows 0 (zero), but that doesn't seem related to the issue (though maybe I'm wrong?). AFAIK, CUDA is installed and working (via the `nvidia-cuda-toolkit` apt package; I'm using the v560 (open) driver from NVIDIA).

I'm trying to run Llama 3.1 8B. As soon as I open tinychat on any of the nodes and start typing with Llama 8B selected, after a while the RTX 5000 node fails with:
Then the RTX 4000 node fails with:
And finally, here's the log for the M2 node:
I often get only the first few characters of the LLM's answer before it stops.
Any ideas on why these are failing? All nodes are on commit `2b9dec2`. I'm using Python 3.12 on all systems, and I activate the venv before starting. On the Linux systems I start it with `CUDA=1 exo`, and on the Mac with `exo --inference-engine tinygrad`.

Thanks in advance!
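Since the Linux nodes report 0 TFLOPS, one first sanity check is whether the CUDA binaries are even visible on the PATH of the shell that launches exo. This is a hypothetical helper (not exo code) that only checks binary visibility, not whether the driver actually works:

```python
import shutil

def cuda_tools_on_path(tools=("nvcc", "nvidia-smi")):
    """Map each tool name to its resolved path, or None if not found on PATH."""
    return {tool: shutil.which(tool) for tool in tools}

if __name__ == "__main__":
    for tool, path in cuda_tools_on_path().items():
        print(f"{tool}: {path or 'NOT FOUND on PATH'}")
```

If `nvcc` or `nvidia-smi` comes back as not found in the same environment where the venv is activated, that could explain the 0 TFLOPS readout even with the driver installed system-wide.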