Open neptunes5thmoon opened 3 months ago
BONUS
@neptunes5thmoon let us know when this is ready for review
it's ready for review, I made all the must have changes ;) I haven't gotten the tensorboard to work in vscode, so could use some help with that
hey @neptunes5thmoon, regarding tensorboard, I've checked it and it looks like a port forwarding problem: the port is not forwarded automatically when launching %tensorboard --logdir...
If you manually forward the port from vscode (Ports tab next to the Terminal), add it to the tensorboard call (so, %tensorboard --logdir abcde --port 6060), and run the cell (or run the cell, check the assigned port, forward it, and then run the cell again), then it actually displays. That's probably not expected behaviour 🙂
Even then, at least for me, it's not running too smoothly (e.g. the training loss curve does not auto-update), and sometimes you need to run the cell twice, even with the port already forwarded, for the inline view to display.
An alternative is to use the tensorboard extension and have the UI in a different VSCode tab, which you can then put side-by-side with the notebook. That is quite convenient, as you don't need to scroll up and down, and it seems to work more smoothly... but at the same time I think setting it up is more cumbersome compared to the inline version
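In case it helps with debugging the forwarding, here is a minimal sketch for checking whether anything is accepting connections on a given port. The helper name is_port_open is hypothetical (not part of the notebook), and it only tells you that something is listening, not that it is TensorBoard.

```python
import socket


def is_port_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise
        return s.connect_ex((host, port)) == 0


# Example: 6060 is the port used in the %tensorboard call above
print(is_port_open(6060))
```

Run on the remote machine, this checks that the server is up; run locally after adding the port in the Ports tab, it checks that the forward is in place.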
Here is my review:
awscli in requirements.txt. I was running into issues
For the tensorboard, I had this in the image_translation notebook. Not sure if the instructions.
"""
The next cell starts tensorboard.
<div class="alert alert-warning">
If you launched jupyter lab from ssh terminal, add <code>--host <your-server-name></code> to the tensorboard command below. <code><your-server-name></code> is the address of your compute node that ends in amazonaws.com.
</div>
<div class="alert alert-warning">
If you are using VSCode and a remote server, you will need to forward the port to view the tensorboard. <br>
Take note of the port number that was assigned in the previous cell (i.e. <code>http://localhost:{port_number_assigned}</code>). <br>
Locate your VSCode terminal and select the <code>Ports</code> tab. <br>
<ul>
<li>Add a new port with the <code>port_number_assigned</code>
</ul>
Click on the link to view the tensorboard and it should open in your browser.
</div>
"""
# %% Imports and paths tags=[]
# Function to find an available port
def find_free_port():
    import socket

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]


# Launch TensorBoard on the browser
def launch_tensorboard(log_dir):
    import subprocess

    port = find_free_port()
    tensorboard_cmd = f"tensorboard --logdir={log_dir} --port={port}"
    process = subprocess.Popen(tensorboard_cmd, shell=True)
    print(
        f"TensorBoard started at http://localhost:{port}. \n"
        "If you are using VSCode remote session, forward the port using the PORTS tab next to TERMINAL."
    )
    return process


# Launch tensorboard and click on the link to view the logs.
tensorboard_process = launch_tensorboard(log_dir)
I had no issues running through the whole notebook. For the last training run and its tensorboard display, it would be great if we could show all three in one slider, so that you can see the input, output, and target at the same time.
I would combine the tb_logger calls for the input, target, and prediction into one:
# Concatenate along the width (side by side)
combined_image = torch.cat(
    [x.to("cpu"), y.to("cpu"), prediction.to("cpu").detach()], dim=3
)
# Log the combined image to TensorBoard
tb_logger.add_images(
    tag="input_target_prediction", img_tensor=combined_image, global_step=step
)
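To show what the concatenation does to the shapes, here is a small sketch with random stand-in tensors. The sizes are made up, and it assumes all three tensors are (N, C, H, W) with the same batch size, channel count, and height; if the input and target have different channel counts in the notebook, torch.cat along dim=3 would raise an error.

```python
import torch

# Hypothetical stand-ins for one batch, all shaped (N, C, H, W)
x = torch.rand(4, 1, 64, 64)           # input
y = torch.rand(4, 1, 64, 64)           # target
prediction = torch.rand(4, 1, 64, 64)  # model output

# dim=3 is the width axis, so the three images end up side by side
combined_image = torch.cat([x, y, prediction], dim=3)
print(combined_image.shape)  # torch.Size([4, 1, 64, 192])
```

Each slider step would then show input, target, and prediction left to right in a single image.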
Right after task 3.2, I noticed that apply_and_show_random_image(conv, dataset) usually returns empty frames for the first input channel. Perhaps we can add a comment asking people to re-run the cell a couple of times.
I really liked the unet_tests.py!
These are some minor fixes, but overall the notebook looks great! Great job
Thanks y'all!
I will work on incorporating your suggestions. My problem with tensorboard was actually that it couldn't find tensorboard on the PATH from vscode, so the command wasn't doing anything. I'll report back whether either of your solutions works.
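For the PATH issue, one possible workaround is sketched below: look the executable up with shutil.which and, if it is missing, fall back to running the module with the notebook's own interpreter. The helper name tensorboard_command is hypothetical, and the fallback assumes tensorboard is installed in the same environment as the kernel.

```python
import shutil
import sys


def tensorboard_command(log_dir, port):
    """Build an argument list for launching TensorBoard."""
    exe = shutil.which("tensorboard")  # None if not on PATH
    if exe:
        launcher = [exe]
    else:
        # Assumption: tensorboard is importable by this interpreter
        launcher = [sys.executable, "-m", "tensorboard.main"]
    return launcher + ["--logdir", str(log_dir), "--port", str(port)]


cmd = tensorboard_command("logs", 6006)
print(cmd)
```

The resulting list can be passed straight to subprocess.Popen(cmd), with no shell=True needed.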
@edyoshikun re: installation - were you running the exercise on the TA machines? When I checked there, awscli was pre-installed.
@neptunes5thmoon I was able to run the tensorboard with vscode with your current implementation. It created the logs in the curr_dir ./. The only other thing I had was the tensorboard extension. Not sure if that makes a difference.
I was testing locally on an HPC node, but if the TA machines have awscli then no need to add.