Write TensorboardWriter

prigoyal commented 3 years ago

In this task, our goal is to implement a TensorboardWriter that leverages the VISSL storage object that we built in Phase2. The steps below summarize how to do that:

Step1: Setup Tensorboard

[x] Read about Tensorboard https://www.tensorflow.org/tensorboard
[x] Read the documentation https://vissl.readthedocs.io/en/latest/visualization.html for how tensorboard works with VISSL
[x] Run a tensorboard visualization with the following command and verify that you can see the visualization on tensorboard:
python tools/run_distributed_engines.py config=test/cpu_test/test_cpu_resnet_simclr config.DATA.TRAIN.DATA_SOURCES=[synthetic] config.HOOKS.TENSORBOARD_SETUP.USE_TENSORBOARD=true config.CHECKPOINT.DIR=./checkpoints
Follow the tensorboard documentation for how to open the visualization tool in your browser on localhost.

Step2: Implement the tensorboard related methods in the VisslEventStorage

import torch


class VisslEventStorage:
    """
    The user-facing class that stores the running metrics
    and all the logging data in one place. The Storage can
    be updated anywhere in the training step. The storage
    is used by several writers/hooks to write the data to
    several backends (json, tensorboard, etc)
    """
    def __init__(self, start_iter=0):
        """
        Args:
            start_iter (int): the iteration number to start with
        """
        ....
        self._vis_data = []     # later for tensorboard
        self._histograms = []   # later for tensorboard

    ....

    def clear_images(self):
        self._vis_data = []

    def clear_histograms(self):
        self._histograms = []

    def put_histogram(self, hist_name: str, hist_tensor: torch.Tensor, bins: int = 1000):
        """
        Create a histogram from a tensor.
        Args:
            hist_name (str): The name of the histogram to put into tensorboard.
            hist_tensor (torch.Tensor): A Tensor of arbitrary shape to be converted
                into a histogram.
            bins (int): Number of histogram bins.
        """
        # comment from Priya: helpful documentation https://tensorboardx.readthedocs.io/en/latest/tensorboard.html#tensorboardX.SummaryWriter.add_histogram_raw
        ht_min, ht_max = hist_tensor.min().item(), hist_tensor.max().item()

        # Create a histogram with PyTorch
        hist_counts = torch.histc(hist_tensor, bins=bins)
        hist_edges = torch.linspace(start=ht_min, end=ht_max, steps=bins + 1, dtype=torch.float32)

        # Parameter for the add_histogram_raw function of SummaryWriter
        hist_params = dict(
            tag=hist_name,
            min=ht_min,
            max=ht_max,
            num=hist_tensor.numel(),  # total element count; len() only gives the first dimension
            sum=float(hist_tensor.sum()),
            sum_squares=float(torch.sum(hist_tensor ** 2)),
            bucket_limits=hist_edges[1:].tolist(),
            bucket_counts=hist_counts.tolist(),
            global_step=self._iter,
        )
        self._histograms.append(hist_params)

    def put_image(self, img_name, img_tensor):
        """
        Add an `img_tensor` associated with `img_name`, to be shown on
        tensorboard.
        Args:
            img_name (str): The name of the image to put into tensorboard.
            img_tensor (torch.Tensor or numpy.array): An `uint8` or `float`
                Tensor of shape `[channel, height, width]` where `channel` is
                3. The image format should be RGB. The elements in img_tensor
                can either have values in [0, 1] (float32) or [0, 255] (uint8).
                The `img_tensor` will be visualized in tensorboard.
        """
        self._vis_data.append((img_name, img_tensor, self._iter))
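
To make the fields passed to add_histogram_raw more concrete, here is a minimal pure-Python sketch of the same bookkeeping on a flat list of floats. It is only an illustration: `raw_histogram_params` is a hypothetical helper, not part of VISSL, and it stands in for the `torch.histc` / `torch.linspace` calls above. The unit-width fallback for constant input is an assumption of this sketch.

```python
def raw_histogram_params(values, bins):
    """Summarize a flat list of floats with `bins` equal-width buckets.

    Hypothetical helper mirroring the fields built in put_histogram above.
    """
    lo, hi = min(values), max(values)
    # Assumption of this sketch: use a unit width when all values are equal.
    width = (hi - lo) / bins if hi > lo else 1.0
    counts = [0] * bins
    for v in values:
        # The maximum value is counted in the last bucket, not in a new one.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return dict(
        min=lo,
        max=hi,
        num=len(values),
        sum=sum(values),
        sum_squares=sum(v * v for v in values),
        # Upper edge of every bucket, i.e. the edges without the first one.
        bucket_limits=[lo + width * (i + 1) for i in range(bins)],
        bucket_counts=counts,
    )


params = raw_histogram_params([0.0, 1.0, 2.0, 3.0, 4.0], bins=4)
print(params["bucket_limits"], params["bucket_counts"])
```

Note that bucket_limits and bucket_counts have the same length, which is why the storage code drops the first edge with hist_edges[1:].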

Step3: Implement the TensorboardWriter

# in vissl/utils/events.py
class TensorboardWriter(VisslEventWriter):

    def __init__(self, log_dir: str, flush_secs: int, **kwargs):
        """
        Args:
            log_dir (str): the directory to save the output events
            flush_secs (int): flush data to tensorboard every flush_secs
            kwargs: other arguments passed to `torch.utils.tensorboard.SummaryWriter(...)`
        """
        self._flush_secs = flush_secs
        # Imported here so that tensorboard is only required when this writer is used.
        from torch.utils.tensorboard import SummaryWriter

        self._tb_writer = SummaryWriter(log_dir, flush_secs=flush_secs, **kwargs)

    def write(self):
        storage = get_event_storage()

        # storage.put_{image,histogram} is only meant to be used by
        # tensorboard writer. So we access its internal fields directly from here.
        if len(storage._vis_data) >= 1:
            for img_name, img, step_num in storage._vis_data:
                self._tb_writer.add_image(img_name, img, step_num)

            # Storage stores all image data and relies on this writer to clear them.
            # As a result it assumes only one writer will use its image data.
            # An alternative design is to let storage store limited recent
            # data (e.g. only the most recent image) that all writers can access.
            # In that case a writer may not see all image data if its period is long.
            storage.clear_images()

        if len(storage._histograms) >= 1:
            for params in storage._histograms:
                self._tb_writer.add_histogram_raw(**params)
            storage.clear_histograms()

    def close(self):
        if hasattr(self, "_tb_writer"):  # doesn't exist when the code fails at import
            self._tb_writer.close()
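
The write-then-clear contract between the storage and the writer can be checked without tensorboard installed. The sketch below is a rough illustration only: `RecordingSummaryWriter`, `MiniStorage` and `flush_to_tensorboard` are hypothetical stand-ins, not VISSL or PyTorch classes. It shows that buffered images and histograms reach the writer exactly once, because the buffers are emptied after each write.

```python
class RecordingSummaryWriter:
    """Hypothetical stand-in for SummaryWriter that records calls in memory."""

    def __init__(self):
        self.images = []
        self.histograms = []

    def add_image(self, tag, img, global_step):
        self.images.append((tag, global_step))

    def add_histogram_raw(self, **params):
        self.histograms.append(params["tag"])


class MiniStorage:
    """Hypothetical, reduced version of the storage buffers shown in Step2."""

    def __init__(self):
        self._iter = 0
        self._vis_data = []
        self._histograms = []

    def put_image(self, img_name, img):
        self._vis_data.append((img_name, img, self._iter))

    def put_raw_histogram(self, tag):
        # Only the tag and step are kept here; the real params hold more fields.
        self._histograms.append(dict(tag=tag, global_step=self._iter))


def flush_to_tensorboard(storage, tb_writer):
    """Same drain logic as TensorboardWriter.write, on the stand-ins."""
    for img_name, img, step_num in storage._vis_data:
        tb_writer.add_image(img_name, img, step_num)
    storage._vis_data = []
    for params in storage._histograms:
        tb_writer.add_histogram_raw(**params)
    storage._histograms = []


storage = MiniStorage()
tb = RecordingSummaryWriter()
storage.put_image("input_0", [[0.0]])
storage.put_raw_histogram("layer1.weight")
flush_to_tensorboard(storage, tb)
flush_to_tensorboard(storage, tb)  # second call has nothing left to write
print(tb.images, tb.histograms)
```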

Step4: Initialize the writer by appending it to the writers in build_event_storage_writers in train_task.py. The constructor arguments can be read from the config. For example: https://github.com/MLH-Fellowship/vissl/blob/4cffc1c37b236371e4459278c0d7997bfdcd7e8d/vissl/utils/tensorboard.py#L66-L73
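
As a rough sketch of what Step4 could look like, the snippet below builds the list of writers from a nested dict. Everything here is an assumption for illustration: `StubTensorboardWriter` and `build_writers_sketch` are hypothetical names, the config is a plain dict rather than the VISSL config object, and the keys LOG_DIR and FLUSH_EVERY_N_MIN are assumed. Only USE_TENSORBOARD appears in the command in Step1.

```python
from dataclasses import dataclass


@dataclass
class StubTensorboardWriter:
    """Hypothetical stand-in for TensorboardWriter that only stores its arguments."""

    log_dir: str
    flush_secs: int


def build_writers_sketch(config):
    """Return the writers enabled by `config` (a nested dict in this sketch)."""
    writers = []
    tb_cfg = config["HOOKS"]["TENSORBOARD_SETUP"]
    if tb_cfg["USE_TENSORBOARD"]:
        writers.append(
            StubTensorboardWriter(
                log_dir=tb_cfg["LOG_DIR"],  # assumed key name
                flush_secs=tb_cfg["FLUSH_EVERY_N_MIN"] * 60,  # assumed key name
            )
        )
    return writers


example_config = {
    "HOOKS": {
        "TENSORBOARD_SETUP": {
            "USE_TENSORBOARD": True,
            "LOG_DIR": "./checkpoints/tb_logs",
            "FLUSH_EVERY_N_MIN": 5,
        }
    }
}
writers = build_writers_sketch(example_config)
print(writers)
```

Keeping the tensorboard writer behind the USE_TENSORBOARD flag means a run without tensorboard never constructs it.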

Step5: Replace the code at the following places with storage.put_histogram():
https://github.com/MLH-Fellowship/vissl/blob/4cffc1c37b236371e4459278c0d7997bfdcd7e8d/vissl/hooks/tensorboard_hook.py#L87-L90
https://github.com/MLH-Fellowship/vissl/blob/4cffc1c37b236371e4459278c0d7997bfdcd7e8d/vissl/hooks/tensorboard_hook.py#L103-L106
Here, add these metrics to the storage using storage.put_scalars:
https://github.com/MLH-Fellowship/vissl/blob/4cffc1c37b236371e4459278c0d7997bfdcd7e8d/vissl/hooks/tensorboard_hook.py#L123-L127
https://github.com/MLH-Fellowship/vissl/blob/4cffc1c37b236371e4459278c0d7997bfdcd7e8d/vissl/hooks/tensorboard_hook.py#L135-L139
https://github.com/MLH-Fellowship/vissl/blob/4cffc1c37b236371e4459278c0d7997bfdcd7e8d/vissl/hooks/tensorboard_hook.py#L145-L149

Step6: Add a key VISUALIZATION_SAMPLE_PERIOD=-1 for visualizing the images at https://github.com/MLH-Fellowship/vissl/blob/930e0c9d9f1f9035f3801b85d61f68377d0487b5/vissl/config/defaults.yaml#L121 and use the key at https://github.com/MLH-Fellowship/vissl/blob/master/vissl/trainer/train_steps/standard_train_step.py#L119 along the lines of the snippet below. The config is available in train_task as self.config.

vis_period = task.config.TENSORBOARD_SETUP.VISUALIZATION_SAMPLE_PERIOD
if vis_period > 0 and task.iteration % vis_period == 0:
    storage = get_event_storage()
    name = f"Model input sample: iteration: {task.iteration}"
    for idx, vis_img in enumerate(sample["input"]):
        storage.put_image(name + f" ({idx})", vis_img)
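
To see which iterations the period check selects, here is a small standalone sketch. `should_visualize` is a hypothetical helper name. It isolates the condition used above and shows that the default of -1 disables image logging entirely.

```python
def should_visualize(iteration: int, vis_period: int) -> bool:
    """True when input samples should be logged at this iteration.

    A non-positive period (such as the default -1) turns visualization off.
    """
    return vis_period > 0 and iteration % vis_period == 0


every_third = [i for i in range(10) if should_visualize(i, 3)]
disabled = [i for i in range(10) if should_visualize(i, -1)]
print(every_third, disabled)
```

Iteration 0 satisfies the check for any positive period, so the first batch is always logged when visualization is enabled.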

grace-omotoso commented 3 years ago

[x] Setup TensorBoard
[x] Implement the Tensorboard related methods in the VisslEventStorage
[x] Implement the TensorboardWriter
[x] Initialize the writer by appending it to the writers in build_event_storage_writers in the train_task.py
[x] Make replacements with new method definitions
[x] Visualize Images

akainth015 commented 3 years ago

@prigoyal when you said

https://github.com/MLH-Fellowship/vissl/blob/master/vissl/hooks/tensorboard_hook.py#L103-L106 -> here , add these metrics to the storage using storage.put_scalars

did you mean the following lines? https://github.com/MLH-Fellowship/vissl/blob/4cffc1c37b236371e4459278c0d7997bfdcd7e8d/vissl/hooks/tensorboard_hook.py#L123-L127

There are also 3 instances of the following lines

https://github.com/MLH-Fellowship/vissl/blob/master/vissl/hooks/tensorboard_hook.py#L135-L139

is that a mistake or did the links just get repeated?

prigoyal commented 3 years ago

Clarified on the PR :)

MLH-Fellowship / vissl

Write TensorboardWriter #16