Kiteretsu77 / APISR

APISR: Anime Production Inspired Real-World Anime Super-Resolution (CVPR 2024)
GNU General Public License v3.0

AssertionError #12

Open 1chuanchuan opened 5 months ago

1chuanchuan commented 5 months ago

Hi, I'm running into this error: when I run the train.py file, multiple input and output folders are created under the tmp folder, but the assertion complains that only one entry may exist in each.

This is strange
Process Process-1:
Traceback (most recent call last):
  File "/root/miniconda3/envs/APISR/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/root/miniconda3/envs/APISR/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/root/autodl-tmp/APISR/scripts/generate_lr_esr.py", line 100, in single_process
    obj_img.degradate_process(out, opt, store_path, process_id, verbose = False)
  File "/root/miniconda3/envs/APISR/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/autodl-tmp/APISR/degradation/degradation_esr.py", line 90, in degradate_process
    self.H264_instance.compress_and_store(np_frame, store_path, process_id)
  File "/root/autodl-tmp/APISR/degradation/video_compression/h264.py", line 52, in compress_and_store
    assert(len(os.listdir(temp_store_path)) == 1)
AssertionError
Process Process-4:
Traceback (most recent call last):
  File "/root/miniconda3/envs/APISR/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/root/miniconda3/envs/APISR/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/root/autodl-tmp/APISR/scripts/generate_lr_esr.py", line 100, in single_process
    obj_img.degradate_process(out, opt, store_path, process_id, verbose = False)
  File "/root/miniconda3/envs/APISR/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/autodl-tmp/APISR/degradation/degradation_esr.py", line 96, in degradate_process
    self.MPEG2_instance.compress_and_store(np_frame, store_path, process_id)
  File "/root/autodl-tmp/APISR/degradation/video_compression/mpeg2.py", line 50, in compress_and_store
    assert(len(os.listdir(temp_store_path)) == 1)
AssertionError
This is strange
Process Process-6:
Traceback (most recent call last):
  File "/root/miniconda3/envs/APISR/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/root/miniconda3/envs/APISR/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/root/autodl-tmp/APISR/scripts/generate_lr_esr.py", line 100, in single_process
    obj_img.degradate_process(out, opt, store_path, process_id, verbose = False)
  File "/root/miniconda3/envs/APISR/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/autodl-tmp/APISR/degradation/degradation_esr.py", line 90, in degradate_process
    self.H264_instance.compress_and_store(np_frame, store_path, process_id)
  File "/root/autodl-tmp/APISR/degradation/video_compression/h264.py", line 52, in compress_and_store
    assert(len(os.listdir(temp_store_path)) == 1)
AssertionError
Process Process-5:
Traceback (most recent call last):
  File "/root/miniconda3/envs/APISR/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/root/miniconda3/envs/APISR/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/root/autodl-tmp/APISR/scripts/generate_lr_esr.py", line 100, in single_process
    obj_img.degradate_process(out, opt, store_path, process_id, verbose = False)
  File "/root/miniconda3/envs/APISR/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/autodl-tmp/APISR/degradation/degradation_esr.py", line 93, in degradate_process
    self.H265_instance.compress_and_store(np_frame, store_path, process_id)
  File "/root/autodl-tmp/APISR/degradation/video_compression/h265.py", line 50, in compress_and_store
    assert(len(os.listdir(temp_store_path)) == 1)
AssertionError
This is strange
Process Process-3:
Traceback (most recent call last):
  File "/root/miniconda3/envs/APISR/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/root/miniconda3/envs/APISR/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/root/autodl-tmp/APISR/scripts/generate_lr_esr.py", line 100, in single_process
    obj_img.degradate_process(out, opt, store_path, process_id, verbose = False)
  File "/root/miniconda3/envs/APISR/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/autodl-tmp/APISR/degradation/degradation_esr.py", line 90, in degradate_process
    self.H264_instance.compress_and_store(np_frame, store_path, process_id)
  File "/root/autodl-tmp/APISR/degradation/video_compression/h264.py", line 52, in compress_and_store
    assert(len(os.listdir(temp_store_path)) == 1)
AssertionError
This is strange
Process Process-2:
Traceback (most recent call last):
  File "/root/miniconda3/envs/APISR/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/root/miniconda3/envs/APISR/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/root/autodl-tmp/APISR/scripts/generate_lr_esr.py", line 100, in single_process
    obj_img.degradate_process(out, opt, store_path, process_id, verbose = False)
  File "/root/miniconda3/envs/APISR/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/autodl-tmp/APISR/degradation/degradation_esr.py", line 90, in degradate_process
    self.H264_instance.compress_and_store(np_frame, store_path, process_id)
  File "/root/autodl-tmp/APISR/degradation/video_compression/h264.py", line 52, in compress_and_store
    assert(len(os.listdir(temp_store_path)) == 1)
AssertionError
Kiteretsu77 commented 5 months ago

Have you installed ffmpeg successfully? I suspect this happens because ffmpeg fails, so the "temp_store_path" folder ends up with no compressed images inside.
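
A minimal sketch of that diagnosis (the helper below is hypothetical, not code from this repo): replace the bare assert in h264.py/h265.py/mpeg2.py with a check that reports what is actually in the temp folder, so an empty folder points back at the failed ffmpeg call instead of a bare AssertionError.

import os

def check_single_compressed_frame(temp_store_path):
    # Hypothetical replacement for `assert(len(os.listdir(temp_store_path)) == 1)`:
    # report the folder contents and point at ffmpeg as the likely cause.
    files = os.listdir(temp_store_path)
    if len(files) != 1:
        raise RuntimeError(
            f"Expected exactly one compressed frame in {temp_store_path!r}, "
            f"found {len(files)} entries: {files}. "
            "An empty folder usually means the ffmpeg call failed or did not run."
        )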

1chuanchuan commented 5 months ago

I followed the steps in the "Installation" section one by one, and when I ran "sudo apt install ffmpeg" it reported the following:

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
ffmpeg is already the newest version (7:4.4.2-0ubuntu0.22.04.1).
0 upgraded, 0 newly installed, 0 to remove and 90 not upgraded.

I think the installation was successful!
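
For reference, a minimal check (not part of the repo) that the ffmpeg visible to the Python workers actually resolves and runs:

import shutil
import subprocess

# Confirm ffmpeg is on PATH for this Python environment and can be executed.
ffmpeg = shutil.which("ffmpeg")
print("ffmpeg resolved to:", ffmpeg)
if ffmpeg is None:
    raise SystemExit("ffmpeg is not on PATH for this interpreter")
out = subprocess.run([ffmpeg, "-version"], capture_output=True, text=True)
print(out.stdout.splitlines()[0])  # e.g. "ffmpeg version 4.4.2-0ubuntu0.22.04.1 ..."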

alexaex commented 5 months ago

I also met this issue, but I re-collected the dataset and eventually solved it. However, I ran into another problem: the generator cannot produce the right colors for the input test image after 100k iterations of first-stage training, and the problem persists during the adversarial stage. I fine-tuned GRL and cUNet on my own dataset. I've checked the images in train_hr, train_enhanced_hr, and train_lr produced by generate_lr_esr, and they all look correct. I'm wondering whether this color issue is caused by the anime videos I chose for the dataset. The abnormal image produced by the generator looks like this (both GRL and cUNet produce abnormally colored output). Attached image: anormal_img

Kiteretsu77 commented 5 months ago

For your issue, I haven't faced a similar situation. Also, I don't recommend using cUNet in opt.py; it's a placeholder that I used a long while ago and didn't delete when I released the code. I will finalize the code soon (either delete it or correct it). For GRL, I would still recommend checking the dataset. Also, did you train from scratch or from the pretrained weights? Thanks!

alexaex commented 5 months ago

Thanks for your reply and suggestions. I used the GRL pretrained weight from the model zoo. The command I used was "python train_code/train.py --pretrained_path ./pretrained/4x_APISR_GRL_GAN_generator.pth" for first-stage training on my dataset. I will check my dataset to identify the issue.

Kiteretsu77 commented 5 months ago

Based on my SR training experience, I don't recommend fine-tuning the L1-loss stage from a GAN-trained weight (fine-tuning the GAN-loss stage from a GAN-trained weight worked fine in my tests). Training the L1-loss stage 1 directly from scratch will also be effective and gives a decent result.

alexaex commented 5 months ago

Taking your suggestions into account, I was able to train the GRL model successfully by adding a more diverse set of anime videos covering a wider range of colors, fine-tuning from the model-zoo GRL weights. Additionally, I am now trying to train the cUNet model to see whether it can perform better, since inference with GRL is challenging on GPUs with less than 6 GB of VRAM at 720p image resolutions.

1chuanchuan commented 5 months ago

But I did re-collect the dataset before running the project and still hit this problem. Could it be caused by the dataset being too small?

alexaex commented 5 months ago

Can you provide some details about the platform you used for training? In my investigation, I have found that this problem may only happen when training under WSL (Windows Subsystem for Linux) or similar virtual machines; when training directly on Windows or Debian Linux, it may not happen. I hope this helps. The first image shows the model trained directly on Windows, the second shows training under WSL.

Attached images: windows, wsl

guyue562478107 commented 4 months ago

Hello, I have the same AssertionError. Have you solved it?

alexaex commented 4 months ago

Yep, I solved this issue. The platform I used for training was an Intel Xeon Platinum 8270, an NVIDIA RTX 4090D, and Debian.

HuaqingHe commented 2 months ago

Hi, I have met the same issue. At first, I modified 'assert(len(os.listdir(temp_store_path)) == 1)' to 'assert(len(os.listdir(temp_store_path)) > 0)', which avoided the error. I believe this is a problem caused by too much parallelism in the program. But when I changed it back to reproduce the situation, the error disappeared. Hope that helps. My training environment is an RTX 3090, CUDA 11.8, Linux.

HuaqingHe commented 2 months ago

I found it to be sporadic: it runs once your CPU has enough capacity to schedule the processes. It has little to do with "parallel_num" in opt.py. Running python scripts/generate_lr_esr.py directly has a much higher success rate; then, commenting out line 354 of train_code/train_master.py lets training run smoothly. Of course, if you do that you can no longer regenerate the degradation for every epoch.

HuaqingHe commented 2 months ago

Finally, I changed the "python" command to the conda environment's interpreter, and it worked for me. Specifically, I changed os.system("python scripts/generate_lr_esr.py") in line 354 of train_code/train_master.py to os.system("/opt/conda/envs/APISR/bin/python scripts/generate_lr_esr.py"), and I also launched training with /opt/conda/envs/APISR/bin/python train_code/train.py instead of python train_code/train.py.
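
A path-independent variant of the same fix (an assumption, not what train_master.py currently contains) is to reuse whatever interpreter launched training, so "python" can never resolve to a different environment:

import os
import sys

# Spawn the degradation script with the same interpreter that is running train.py.
os.system(f"{sys.executable} scripts/generate_lr_esr.py")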

alexaex commented 2 months ago

I'm confused about "too much parallelism"; can you elaborate? Theoretically, train_master.py is blocked by the os.system(...) call until it exits, so it shouldn't matter whether the degradation runs alone or from train_master.py. If you mean the parallelism inside the degradation process, why not use a longer sleep to wait for synchronization, or reduce the number of subprocesses? Either might solve this, I think. Additionally, the parallel_num parameter determines the number of subprocesses, so changing it should help if your too-much-parallelism assumption is correct.

HuaqingHe commented 2 months ago

Sorry for the confusing description: by "too much parallelism" I meant a large parallel_num. I tried changing it to 1, and the bug still happened. Then I tried sleeping for 0.5 s; it still didn't work.

HuaqingHe commented 2 months ago

I think I solved this bug. Essentially, the problem is with ffmpeg: the ffmpeg installed inside the conda env does not match, so you need to apt install ffmpeg and then run /opt/conda/envs/APISR/bin/python train_code/train.py from the base environment.
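
A minimal sketch of that mismatch check (paths are typical conda defaults, not taken from this thread): compare the ffmpeg the active environment resolves against the apt-installed system binary.

import shutil
import sys

env_ffmpeg = shutil.which("ffmpeg")  # the binary the active environment will use
print("interpreter:", sys.executable)
print("ffmpeg resolved to:", env_ffmpeg)
if env_ffmpeg and "/envs/" in env_ffmpeg:
    # A conda-packaged ffmpeg is shadowing the system /usr/bin/ffmpeg installed
    # via apt, which is the kind of mismatch described above.
    print("conda-env ffmpeg detected; remove it or put /usr/bin first on PATH")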