ProjectNUWA / LayoutNUWA

MIT License

Error when building training data #2

Closed: hyer closed this issue 1 year ago

hyer commented 1 year ago

I got an error when running the build training data script:

python convertHTML/build_code.py \
    --model_path_or_name /path/to/llamamodel \
    --dataset_name rico25 \
    --dataset_path data/rico25-max25 \
    --save_path data/rico25-max25/html_format \
    --bbox_quantization code \
    --consistency_num 10 \
    --add_task_instruction;

console outputs:

begin to save train file >>> data/rico25-max25/html_format
  0%|          | 0/35851 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/mnt/data/code/LayoutNUWA/convertHTML/build_code.py", line 647, in <module>
    for i, batch_inputs in enumerate(train_dataloader):
  File "/mnt/data/code/LayoutNUWA/convertHTML/build_code.py", line 559, in __iter__
    yield self.custom_function(data, i, self_consistency=self_consistency)  
  File "/mnt/data/code/LayoutNUWA/convertHTML/build_code.py", line 402, in custom_function
    bbox_cond_seqs = [
  File "/mnt/data/code/LayoutNUWA/convertHTML/build_code.py", line 403, in <listcomp>
    self.cond_bbox_prefix.format(categories=cate, bbox_html=bbox_html) 
AttributeError: 'CustomDataLoader' object has no attribute 'cond_bbox_prefix'

I have checked the source code, and CustomDataLoader has no attribute 'cond_bbox_prefix'.

k1414st commented 1 year ago

I found that cond_bbox_prefix and cond_cate_prefix are only used to generate the test_llama_numerical.jsonl file, and none of the executable scripts (including evaluation.py) use this file.
(Instead, evaluation.py uses the pre-generated data/generated_results/rico/golden.jsonl file.)

So it seems the error can be avoided by removing all the cond_bbox_prefix and cond_cate_prefix variables and the code that depends on them, or simply by assigning dummy strings like:

        self.cond_bbox_prefix = ""
        self.cond_cate_prefix = ""
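
A minimal sketch of why the dummy strings make the error go away. The class `LoaderSketch` and the helper `try_build` are hypothetical stand-ins written for illustration, not code from the repository; only the attribute names and the `.format(categories=..., bbox_html=...)` call are taken from the traceback above.

```python
class LoaderSketch:
    """Hypothetical stand-in for CustomDataLoader (not the repository's code)."""

    def __init__(self, define_prefixes: bool):
        if define_prefixes:
            # Dummy strings, as suggested in the workaround above.
            self.cond_bbox_prefix = ""
            self.cond_cate_prefix = ""

    def build_condition(self, cate: str, bbox_html: str) -> str:
        # Same shape as the call shown in the traceback.
        return self.cond_bbox_prefix.format(categories=cate, bbox_html=bbox_html)


def try_build(define_prefixes: bool) -> str:
    """Return the formatted result, or the AttributeError message if it fails."""
    loader = LoaderSketch(define_prefixes)
    try:
        return repr(loader.build_condition("button", "<div></div>"))
    except AttributeError as exc:
        return f"AttributeError: {exc}"


without_fix = try_build(False)  # attribute never assigned
with_fix = try_build(True)      # attribute assigned to ""
print(without_fix)
print(with_fix)
```

With empty prefixes, `str.format` ignores the unused keyword arguments and returns an empty string, so the conditional sequences are built without raising but carry no content. That is only acceptable if, as noted above, the file they feed into is not used downstream.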

hyer commented 1 year ago

Yes, I deleted the irrelevant code.

ZetangForward commented 1 year ago

Thanks for your interest in our work, and sorry for the late reply. We will clean up the code soon. In the meantime, you can refer to the original paper to identify and delete the irrelevant code (this part of the code was mainly for my ablation studies).