CanPeng123 / Faster-ILOD


environment issues #5

Closed wsjxdy closed 2 years ago

wsjxdy commented 2 years ago

The environment for this project is very difficult to configure. Even after reconfiguring it many times, I still get inexplicable errors. In INSTALL.md, the command that installs torch-nightly fails, and installing regular torch reports the error “module 'torch._six' has no attribute 'PY3'”. Could you package your environment and share a link to it on a network disk, or provide a working installation tutorial? Thank you, and I look forward to your reply.

wsjxdy commented 2 years ago

@CanPeng123

wsjxdy commented 2 years ago

I tried debugging for another week, but I still couldn't get the environment configured. I hope you can provide a working installation tutorial or upload your environment to a network disk. Looking forward to your reply as soon as possible! @CanPeng123

CanPeng123 commented 2 years ago

Hi, the environment I am using is:

CUDA: 10.1.243
gcc: 7.5.0
python: 3.6.10
yacs: 0.1.7

I built the code based on maskrcnn_benchmark. You can refer to their installation tutorial for the environment setup. https://github.com/facebookresearch/maskrcnn-benchmark/blob/main/README.md

Hope this can help you.

wsjxdy commented 2 years ago

Thanks for your quick reply! It works!

1. I ran "train_first_step.py" with "e2e_faster_rcnn_R_50_C4_1x.yaml" as regular training for the first 15 categories. The new and excluded classes are set as follows:

NAME_NEW_CLASSES: ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person"]
NAME_EXCLUDED_CLASSES: ["pottedplant", "sheep", "sofa", "train", "tvmonitor"]

2. If I want to do 15+1+1+1+1+1 incremental training, how should I set the yaml files for the second training step, "train_incremental.py"? (For category 16, do I modify target_model.yaml? Then what is the role of source_model.yaml?) What about training category 17?

3. If I want to do 15+5 incremental training, how should I set the yaml files (target_model or source_model) for the second training step (categories 16-20)?

(Mainly, I do not understand the difference between new classes and excluded classes, or how to set the yaml files for 15+1+1+1+1+1 or 15+5 incremental training.)

Thanks for your help! Looking forward to your reply as soon as possible! @CanPeng123

wsjxdy commented 2 years ago

1. Is it correct that "e2e_faster_rcnn_R_50_C4_1x.yaml" is for the initial training, while "e2e_faster_rcnn_R_50_C4_1x_Source_model.yaml" and "e2e_faster_rcnn_R_50_C4_1x_Target_model.yaml" are for incremental training?

2. During incremental training, the weights from the initial training are loaded into both Source_model and Target_model. The weights of Source_model are frozen and not optimized, while Target_model is optimized. Do I understand this correctly? ("e2e_faster_rcnn_R_50_C4_1x_Source_model.yaml" acts as a relay for the model trained with "e2e_faster_rcnn_R_50_C4_1x.yaml".)

3. Regarding new_classes and excluded_classes: in 15+1+1+1+1+1 incremental training, for the first increment (category 16), in "e2e_faster_rcnn_R_50_C4_1x_Target_model.yaml", old_classes is the 15 trained classes, new_classes is category 16, and excluded_classes is the remaining four untrained categories.

4. In the second increment, old_classes in "e2e_faster_rcnn_R_50_C4_1x_Target_model.yaml" is the 15 trained classes plus category 16 from the previous increment, new_classes is category 17, and excluded_classes is the remaining three untrained categories.

Is my understanding correct? And what should old_classes, new_classes and excluded_classes be in "e2e_faster_rcnn_R_50_C4_1x_Source_model.yaml"? Looking forward to your reply as soon as possible! @CanPeng123

wsjxdy commented 2 years ago

In addition, if I want to train 15+1+1+1+1+1, should “num_classes” (VOC) be 21 (all categories plus "background") in all three yaml files?

gzgz-code commented 2 years ago

I have the same question. I want to know how to select and change the corresponding configuration files for different incremental settings, such as one-step and multi-step training, and which commands to use to train them. Could you give an answer? I would be very grateful. @CanPeng123

CanPeng123 commented 2 years ago

During multi-step training, all the categories trained in earlier steps are regarded as old classes. Only the categories added in the current step are regarded as new classes. The remaining classes are regarded as excluded classes, since they are not included in this training step.

For example, to train the second incremental step of the 15+1+1+1+1+1 setting on the VOC dataset, with classes in alphabetical order (15 + 1 + 1):

The old classes are "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant": 16 old categories in total.

The new class is "sheep": only one new class for this step.

The excluded classes are "sofa", "train", "tvmonitor": the remaining three classes, whose data is not included in this step.

Please refer to Faster-ILOD/maskrcnn_benchmark/data/datasets/voc.py and Faster-ILOD/maskrcnn_benchmark/data/datasets/coco.py for the detailed settings.
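The rule described above (earlier classes are old, the current ones are new, everything after is excluded) can be sketched in a few lines of Python. This is only an illustration: `incremental_splits` is a hypothetical helper, not part of Faster-ILOD, and it assumes the classes are added in the alphabetical VOC order used in this thread.

```python
# Pascal VOC classes in alphabetical order, as listed in this thread.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def incremental_splits(classes, step_sizes):
    """Yield (old, new, excluded) class lists for each training step.

    step_sizes=[15, 1, 1] means a base step with 15 classes followed by
    two incremental steps that each add one class.
    Hypothetical helper for illustration; not a Faster-ILOD function.
    """
    if sum(step_sizes) > len(classes):
        raise ValueError("step sizes add up to more classes than are available")
    start = 0
    for size in step_sizes:
        end = start + size
        # Everything before this step is old, everything after is excluded.
        yield classes[:start], classes[start:end], classes[end:]
        start = end


if __name__ == "__main__":
    steps = incremental_splits(VOC_CLASSES, [15, 1, 1, 1, 1, 1])
    for index, (old, new, excluded) in enumerate(steps):
        print(f"step {index}: {len(old)} old, new={new}, excluded={excluded}")
```

Step 0 is the base step (no old classes). The resulting lists would still have to be copied by hand into NAME_OLD_CLASSES, NAME_NEW_CLASSES and NAME_EXCLUDED_CLASSES in the yaml files.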

UnityBoy commented 2 years ago

The README says to use PyTorch 1.0, but I get the error "module 'torch.cuda' has no attribute 'amp'". From what I found, PyTorch 1.0 has no amp; did amp only appear around PyTorch 1.7? How can I get this to run successfully? I would be grateful for any help. @CanPeng123

xhxhxh11 commented 2 years ago

Hello, I would like to ask whether your experiment can be reproduced normally. My experiment result is abnormal~

Sshxh commented 2 years ago

> I tried debugging for another week, but I still couldn't get the environment configured. I hope you can provide a working installation tutorial or upload your environment to a network disk. Looking forward to your reply as soon as possible! @CanPeng123

I have the same environment problem as you. May I ask how you resolved it? Thank you!

Sshxh commented 2 years ago

> The environment for this project is very difficult to configure. Even after reconfiguring it many times, I still get inexplicable errors. In INSTALL.md, the command that installs torch-nightly fails, and installing regular torch reports the error “module 'torch._six' has no attribute 'PY3'”. Could you package your environment and share a link to it on a network disk, or provide a working installation tutorial? Thank you, and I look forward to your reply.

My friend, could you package the environment that you got running? I have tried for a long time without success. Thank you for your trouble.

wsjxdy commented 2 years ago

@Sshxh The environment I am using is:

torch: 1.3.1 (regular version, not the nightly version)
torchvision: 0.2.2
CUDA: 10.1.243
gcc: 7.5.0
python: 3.6.10
yacs: 0.1.7

It works well on my computer (Titan Xp). Hope my reply helps!
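For anyone comparing their own setup against the versions reported in this thread, a small check like the following may help. It is a rough sketch: the helper names are made up for illustration, it only reads each module's `__version__` attribute (yacs is left out because it may not expose one), and it does not check CUDA or gcc.

```python
import importlib
import sys

# Versions reported as working in this thread. Treat them as a reference,
# not as the only combination that can work.
EXPECTED = {"python": "3.6.10", "torch": "1.3.1", "torchvision": "0.2.2"}


def installed_versions(module_names):
    """Return {name: version string or None} for Python and the given modules."""
    found = {"python": ".".join(str(part) for part in sys.version_info[:3])}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None  # not installed in this environment
        else:
            found[name] = getattr(module, "__version__", None)
    return found


def _base_version(version):
    """Drop a local suffix such as '+cu101' so '1.3.1+cu101' matches '1.3.1'."""
    return None if version is None else str(version).split("+")[0]


def version_mismatches(expected, found):
    """Return {name: (expected, found)} for every entry that differs."""
    mismatches = {}
    for name, want in expected.items():
        have = found.get(name)
        if _base_version(have) != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    found = installed_versions(["torch", "torchvision"])
    for name, (want, have) in version_mismatches(EXPECTED, found).items():
        print(f"{name}: expected {want}, found {have}")
```

An empty result only means the listed versions match; build problems caused by the CUDA or gcc version would still need to be checked separately.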

YuQianzi commented 1 year ago

@wsjxdy Hi, have you figured out how to train the model? Could you please give me a complete set of training steps, like:

Base 19

python tools/train_net.py --num-gpus 4 --config-file ./configs/PascalVOC-Detection/iOD/base_19.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.005

19 + 1

sleep 10
python tools/train_net.py --num-gpus 4 --config-file ./configs/PascalVOC-Detection/iOD/19_p_1.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.005

19 + 1 _ ft

sleep 10
python tools/train_net.py --num-gpus 4 --config-file ./configs/PascalVOC-Detection/iOD/ft_19_p_1.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.005

I have no idea what to do after I finish the training step: python tools/train_first_step.py --config-file ./configs/e2e_faster_rcnn_R_50_C4_1x.yaml

YuQianzi commented 1 year ago

@wsjxdy Should I just run the command: python train_incremental.py?

CanPeng123 commented 1 year ago

Hi,

If you want to run the first step (base step), you need to use train_first_step.py and config file e2e_faster_rcnn_R_50_C4_1x.yaml. The config file e2e_faster_rcnn_R_50_C4_1x.yaml shows how to run the base step with 15 classes.

If you want to run the subsequent incremental steps, you need to use train_incremental.py with both config files: e2e_faster_rcnn_R_50_C4_1x_Source_model.yaml for the source model and e2e_faster_rcnn_R_50_C4_1x_Target_model.yaml for the target model. These config files show how to run the incremental step for the 10 + 10 setting.

Please remember to always change NUM_CLASSES, NAME_OLD_CLASSES, NAME_NEW_CLASSES, and NAME_EXCLUDED_CLASSES in the config files for different incremental settings.
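Since these lists are edited by hand for every step, a quick consistency check before launching a run can catch copy-paste mistakes. The sketch below is not part of Faster-ILOD; `check_class_split` is a hypothetical helper, and it assumes, based on the explanation earlier in this thread, that the old, new and excluded lists should together cover every VOC class exactly once.

```python
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def check_class_split(old, new, excluded, all_classes=VOC_CLASSES):
    """Return a list of problems; an empty list means the split is consistent.

    Hypothetical helper: assumes old + new + excluded should cover every
    class in all_classes exactly once.
    """
    problems = []
    seen = {}  # class name -> first group it was found in
    for group_name, names in (("old", old), ("new", new), ("excluded", excluded)):
        for name in names:
            if name not in all_classes:
                problems.append(f"{name!r} in {group_name} is not a known class")
            if name in seen:
                problems.append(
                    f"{name!r} is listed more than once ({seen[name]}, {group_name})"
                )
            seen.setdefault(name, group_name)
    missing = sorted(set(all_classes) - set(seen))
    if missing:
        problems.append(f"classes not assigned to any group: {missing}")
    if not new:
        problems.append("no new classes: nothing to train in this step")
    return problems


if __name__ == "__main__":
    # A 10 + 10 incremental step, assuming alphabetical class order.
    for problem in check_class_split(VOC_CLASSES[:10], VOC_CLASSES[10:], []):
        print(problem)
```

The check deliberately says nothing about NUM_CLASSES; the value to use for each config file should be taken from the repository's example configs.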

Hope this clears up your confusion.