ezhang7423 / language-control-diffusion


Hi, how do I download the right hulc-data/hulc-trajectories/12_all_trajectories.pt? #5

Open zhufq00 opened 9 months ago

zhufq00 commented 9 months ago

Everything went well except when I run `lcd train_hulc`.

I cannot find the right 12_all_trajectories.pt in https://github.com/ezhang7423/hulc-data.git either. The "hulc-trajectories" directory is empty.


Awesome job and thank you very much for your response!

zhufq00 commented 9 months ago

I found this dataset on Hugging Face.

Here are some tips for running this project:

Tip 1: Dataset download from Hugging Face. In the Makefile, change this command:

    git submodule update --init --recursive ./submodules/hulc-data;\

to

    git submodule update --init --recursive ./submodules/hulc-data
    cd submodules/hulc-data/hulc-trajectories && git lfs pull
    cd submodules/hulc-data/lcd-seeds && git lfs pull
    cd submodules/hulc-data/hulc-baselines-30 && git lfs pull

(Each line above is a separate Makefile recipe line, so each `cd` starts from the repository root.)
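After pulling, a quick way to confirm the LFS files actually downloaded (and are not just git-lfs pointer stubs) is to load the trajectory file. This is only a minimal sanity-check sketch; it assumes PyTorch is installed and that the submodule lives at `submodules/hulc-data`, so adjust the path if your checkout differs:

```python
from pathlib import Path

import torch

# Path assumes the default submodule location; adjust if your checkout differs.
path = Path("submodules/hulc-data/hulc-trajectories/12_all_trajectories.pt")
size_mb = path.stat().st_size / 1e6
print(f"{path}: {size_mb:.1f} MB")  # an un-pulled LFS pointer is only ~100 bytes

data = torch.load(path, map_location="cpu")
print(type(data))  # inspect what the trajectory file actually contains
```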

Tip 2: If `poetry` cannot be found, add it to your PATH with `export PATH="$HOME/.local/bin:$PATH"`, then verify with `poetry --version`.

zhufq00 commented 9 months ago

One piece of feedback: I found the paper quite difficult to understand, with many concepts and formulas introduced that felt unnecessary to me. At the same time, the details regarding the model's inputs and outputs are unclear, although that might be because I am not very familiar with these terms. I was able to understand it somewhat after viewing https://diffusion-planning.github.io, and I also find http://hulc.cs.uni-freiburg.de to be quite complex.

I would like to ask about some details to confirm my understanding of the paper, primarily about the inputs and outputs of the High-level and Low-level policies.

My assumption is that during the training process, the inputs to the High-level policy are (a rough sketch of what I mean is below, after the output list):

  1. Initial state, which may be $g_t$; here $g_t$ should be $s_0$.
  2. Some subsequent states $s_c$, $s_{2c}$, $s_T$.
  3. Noise at some diffusion timestep $t$, added to the latent plan $s_0$, $s_c$, $s_{2c}$, $s_T$.
  4. Text embedding, possibly interacting with the latent plan through cross-attention as in Imagen.

The output of the High-level policy is:

  1. Latent plan (denoised).
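To make sure I am reading this correctly, here is a rough sketch of what I imagine one training step of the High-level policy looks like. All the names here (`denoiser`, `text_encoder`, `encode_state`, the batch keys) are my own placeholders, not the actual API of this repo:

```python
import torch
import torch.nn.functional as F

def high_level_train_step(denoiser, text_encoder, encode_state, batch, alphas_cumprod):
    """One DDPM-style training step, as I understand it (placeholder names throughout)."""
    # 1) Build the latent plan from the subsampled states s_0, s_c, s_{2c}, ..., s_T,
    #    encoded with the frozen low-level encoder.
    plan = torch.stack([encode_state(s) for s in batch["subsampled_states"]], dim=1)  # (B, K, D)

    # 2) Sample a diffusion timestep t and add the corresponding amount of noise to the plan.
    t = torch.randint(0, len(alphas_cumprod), (plan.shape[0],), device=plan.device)
    noise = torch.randn_like(plan)
    a_bar = alphas_cumprod[t].view(-1, 1, 1)
    noisy_plan = a_bar.sqrt() * plan + (1.0 - a_bar).sqrt() * noise

    # 3) Condition on the language instruction (e.g. via cross-attention, as in Imagen).
    text_emb = text_encoder(batch["instruction"])

    # 4) The denoiser predicts the noise (epsilon-prediction); the loss is a simple MSE.
    pred_noise = denoiser(noisy_plan, t, text_emb)
    return F.mse_loss(pred_noise, noise)
```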

During the inference process, my assumption is that the inputs to the High-level policy are (again, a rough sketch follows after the output list):

  1. Initial state.
  2. Random noise in place of $s_c$, $s_{2c}$, $s_T$.

The output of the High-level policy is:

  1. Latent plan (denoised).
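And a matching sketch of what I imagine inference looks like: start from the known initial state plus Gaussian noise for the future sub-goals, then iteratively denoise conditioned on the instruction. Again, `denoiser` and `scheduler_step` are my placeholders, not the repo's actual functions:

```python
import torch

@torch.no_grad()
def sample_plan(denoiser, text_emb, s0_latent, plan_len, latent_dim, num_steps, scheduler_step):
    """Reverse diffusion, as I understand it: denoise random noise into a latent plan."""
    plan = torch.randn(1, plan_len, latent_dim, device=s0_latent.device)

    for t in reversed(range(num_steps)):
        plan[:, 0] = s0_latent  # keep s_0 fixed (inpainting-style conditioning, as in Diffuser)
        pred_noise = denoiser(plan, torch.tensor([t], device=plan.device), text_emb)
        plan = scheduler_step(plan, pred_noise, t)  # one reverse-diffusion update

    plan[:, 0] = s0_latent
    return plan  # the denoised latent plan, consumed by the low-level policy as sub-goals
```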

I assume that the High-level policy and Low-level policy are trained separately: we first train the Low-level policy (without any diffusion process) to obtain the frozen low-level policy encoder, roughly in the two-stage order sketched below.
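A tiny sketch of the two-stage order I am assuming (all names are placeholders for illustration only):

```python
def train_pipeline(train_low_level, train_high_level_diffusion, dataset):
    # Stage 1: train the low-level policy HULC-style (no diffusion), then freeze its encoder.
    low_level_policy = train_low_level(dataset)
    frozen_encoder = low_level_policy.encoder.eval()
    for p in frozen_encoder.parameters():
        p.requires_grad_(False)

    # Stage 2: train the high-level diffusion policy on plans built from the frozen encoder.
    high_level_policy = train_high_level_diffusion(dataset, frozen_encoder)
    return high_level_policy, low_level_policy
```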

The inputs and outputs of the Low-level policy in HULC during training and inference are as shown in the attached image (low-level policy input/output diagram).

But it seems that HULC generates a sequence of actions, rather than one action at a time as in LCD.

How is the LLP trained, and how does it run inference? I cannot find Appendix E (https://imgur.com/MwzAO6s); it seems that this section is very important for understanding the LLP.

How is $T$ determined? Is it a fixed parameter or something else?

Additionally, there is a broken reference ("??") in Section 4.6 "ROBUSTNESS TO HYPERPARAMETERS".

Thank you very much for your work. Although it seems a bit difficult to understand, it's very intriguing!

I look forward to hearing whether my understanding is accurate.