dlmbl / unet

2024 TODOs #1

Open neptunes5thmoon opened 1 month ago

neptunes5thmoon commented 1 month ago

BONUS

edyoshikun commented 1 month ago

@neptunes5thmoon let us know when this is ready for review

neptunes5thmoon commented 1 month ago

It's ready for review; I made all the must-have changes ;) I haven't gotten tensorboard to work in vscode, so I could use some help with that.

AlbertDominguez commented 1 month ago

hey @neptunes5thmoon, regarding tensorboard, I've checked it and it looks like it is a port-forwarding problem: the port is not forwarded automatically when launching %tensorboard --logdir... If you manually forward a port from vscode (Ports tab next to the Terminal), add it to the tensorboard call (so, %tensorboard --logdir abcde --port 6060) and run the cell (or run the cell, check the assigned port, forward it, and then run the cell again), then it actually displays. That's probably not expected behaviour 🙂

Even then, at least for me, it's not running too smoothly (e.g. the training loss curve does not auto-update), and sometimes you need to run the cell twice for the inline view to display, even with the port already forwarded.

An alternative is to use the tensorboard extension and have the UI in a different VSCode tab, which you can then put side-by-side with the notebook. That is quite convenient, as you don't need to scroll up and down, and it seems to run more smoothly... but at the same time I think setting it up is more cumbersome compared to the inline version
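
To check whether a forwarded port is actually reachable before re-running the cell, something like the following could help. It is only a minimal sketch using the standard library: the helper name `is_port_open` is hypothetical, and 6060 is just the example port from the call above.

```python
import socket


def is_port_open(port, host="localhost", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return s.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 6060 is the example port used in the %tensorboard call above
    print("port 6060 reachable:", is_port_open(6060))
```

Run on the remote machine, this tells you whether tensorboard is listening at all; run locally, it tells you whether the forwarding is in place.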

edyoshikun commented 1 month ago

Here is my review:

Installation

Tensorboard

For the tensorboard, I had this in the image_translation notebook. Not sure if the instructions help here.

"""
The next cell starts tensorboard.

<div class="alert alert-warning">
If you launched jupyter lab from an ssh terminal, add <code>--host &lt;your-server-name&gt;</code> to the tensorboard command below. <code>&lt;your-server-name&gt;</code> is the address of your compute node that ends in amazonaws.com.

</div>

<div class="alert alert-warning">
If you are using VSCode and a remote server, you will need to forward the port to view the tensorboard. <br>
Take note of the port number that was assigned in the previous cell (i.e., <code>http://localhost:{port_number_assigned}</code>). <br>

Locate your VSCode terminal and select the <code>Ports</code> tab. <br>
<ul>
<li>Add a new port with the <code>port_number_assigned</code></li>
</ul>
Click on the link to view the tensorboard and it should open in your browser.
</div>
"""
# %% Imports and paths tags=[]
import socket
import subprocess


# Find an available port by letting the OS pick one
def find_free_port():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Binding to port 0 asks the OS for any unused port
        s.bind(("", 0))
        return s.getsockname()[1]


# Launch TensorBoard as a background process on a free port
def launch_tensorboard(log_dir):
    port = find_free_port()
    # Pass the arguments as a list so log_dir needs no shell quoting
    tensorboard_cmd = ["tensorboard", f"--logdir={log_dir}", f"--port={port}"]
    process = subprocess.Popen(tensorboard_cmd)
    print(
        f"TensorBoard started at http://localhost:{port}. \n"
        "If you are using VSCode remote session, forward the port using the PORTS tab next to TERMINAL."
    )
    return process


# Launch tensorboard and click on the link to view the logs.
tensorboard_process = launch_tensorboard(log_dir)
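
The returned process handle also makes it possible to shut tensorboard down at the end of the notebook. A minimal sketch, assuming the process was started with `subprocess.Popen` as above; the helper name `stop_process` is hypothetical and not part of the notebook:

```python
import subprocess


def stop_process(process, timeout=5.0):
    """Ask a background process to exit, and kill it if it does not comply."""
    if process.poll() is not None:
        return process.returncode  # already finished
    process.terminate()
    try:
        process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The process ignored the termination request, so force it
        process.kill()
        process.wait()
    return process.returncode
```

It would be called as `stop_process(tensorboard_process)`. If the process was started with `shell=True`, the handle refers to the shell, so the tensorboard child may need separate handling.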

I had no issues running through the whole notebook. For the last training and displaying with tensorboard, it would be great if we could display all three images in one slider, so that you can see the input, target, and prediction at the same time. I would concatenate the input, target, and prediction before passing them to the tb_logger:

# Concatenate along the width (side by side)
combined_image = torch.cat([x.to("cpu"), y.to("cpu"), prediction.to("cpu").detach()], dim=3)

# Log the combined image to TensorBoard
tb_logger.add_images(
    tag="input_target_prediction", img_tensor=combined_image, global_step=step
)       
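
One caveat: concatenating along the width only works if the three batches agree in every other dimension. Below is a minimal sketch of a guard for that, assuming the prediction might be spatially smaller than the input (for example with valid padding) and that all three have the same number of channels. It uses NumPy arrays in (N, C, H, W) layout to stay lightweight, and the helper names `center_crop` and `side_by_side` are hypothetical.

```python
import numpy as np


def center_crop(batch, height, width):
    """Crop the last two axes of an (N, C, H, W) array around the center."""
    top = (batch.shape[2] - height) // 2
    left = (batch.shape[3] - width) // 2
    return batch[:, :, top : top + height, left : left + width]


def side_by_side(*batches):
    """Crop all batches to the smallest H and W, then join along the width."""
    height = min(b.shape[2] for b in batches)
    width = min(b.shape[3] for b in batches)
    cropped = [center_crop(b, height, width) for b in batches]
    return np.concatenate(cropped, axis=3)


# Stand-ins for x, y and prediction with mismatched spatial sizes
x = np.zeros((4, 1, 64, 64), dtype=np.float32)
y = np.zeros((4, 1, 64, 64), dtype=np.float32)
prediction = np.zeros((4, 1, 60, 60), dtype=np.float32)

combined = side_by_side(x, y, prediction)
print(combined.shape)  # (4, 1, 60, 180)
```

The same slicing works on torch tensors, with `torch.cat(..., dim=3)` in place of `np.concatenate`.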

After task 3.2

Right after task 3.2, I noticed that apply_and_show_random_image(conv, dataset) usually returns empty frames for the first channel input. Perhaps we can add a comment to ask the person to re-run the cell a couple of times.

I really liked the unet_tests.py!

These are some minor fixes, but overall the notebook looks great! Great job

neptunes5thmoon commented 1 month ago

Thanks y'all!

I will work on incorporating your suggestions. My problem with tensorboard was actually that vscode couldn't find tensorboard on the PATH, so the command wasn't doing anything. I'll report back whether either of your solutions works.
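
A quick way to sidestep the PATH issue from inside the notebook might look like the sketch below. It assumes tensorboard is installed in the same environment as the running kernel, so that it can be started as a module via `tensorboard.main` when no executable is found on the PATH. The helper name `tensorboard_command` is hypothetical, and `"runs"` and `6006` are placeholder values.

```python
import shutil
import sys


def tensorboard_command(log_dir, port):
    """Build a TensorBoard command that does not depend on PATH alone."""
    executable = shutil.which("tensorboard")
    if executable is not None:
        base = [executable]
    else:
        # Fall back to running the module with the current interpreter
        base = [sys.executable, "-m", "tensorboard.main"]
    return base + [f"--logdir={log_dir}", f"--port={port}"]


# Placeholder log directory and port, for illustration only
print(tensorboard_command("runs", 6006))
```

The resulting list can be passed directly to `subprocess.Popen`.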

@edyoshikun re: installation - were you running the exercise on the TA machines? When I checked there awscli was pre-installed.

edyoshikun commented 1 month ago

@neptunes5thmoon I was able to run the tensorboard with vscode with your current implementation. It created the logs in the curr_dir ./. The only other thing I had was the tensorboard extension. Not sure if that makes a difference.

I was testing locally on an HPC node, but if the TA machines have awscli then there is no need to add it.