PaddlePaddle / PaddleSeg

Easy-to-use image segmentation library with an awesome pre-trained model zoo, supporting a wide range of practical tasks in Semantic Segmentation, Interactive Segmentation, Panoptic Segmentation, Image Matting, 3D Segmentation, and more.
https://arxiv.org/abs/2101.06175
Apache License 2.0

KeyError: 'coronacases_org_001' #2956

Closed stillfighter2 closed 8 months ago

stillfighter2 commented 1 year ago

Search before asking

Please ask your question

Following the README.md, I ran tools/prepare_lung_coronavirus.py for one-click data preprocessing and the error shown in the title popped up. How should I handle this error? Thanks for any reply.

stillfighter2 commented 1 year ago

This problem occurs in MedicalSeg.

shiyutang commented 1 year ago

Please provide the full stack trace. It looks like a mismatch between the keys stored while processing the data and the index used to look them up. We need more details to pin down the exact cause.

stillfighter2 commented 1 year ago

The full error message is as follows:

D:\anaconda\envs\bisaione\python.exe D:/MedicalSeg/tools/prepare_lung_coronavirus.py
raw_dataset_dir data/lung_coronavirus\lung_coronavirus_raw/ exists, skipping uncompress. To uncompress again, remove this directory
Dataset json exists, skipping. Delete file data/lung_coronavirus\lung_coronavirus_raw/dataset.json to regenerate.
Start convert images to numpy array using GPU, please wait patiently
preprocessing the images:   0%|          | 0/20 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "D:/MedicalSeg/tools/prepare_lung_coronavirus.py", line 123, in <module>
    prep.load_save()
  File "D:\MedicalSeg\tools\prepare.py", line 236, in load_save
    ".")[0]]["spacing"] if i == 0 else None
KeyError: 'coronacases_org_001'

I am using the latest version of MedicalSeg. Following the tutorial, I ran python tools/prepare_lung_coronavirus.py for one-click data preprocessing and hit this problem.

In addition, before running this script I changed 'tools/preprocess_globals.yml' to 'preprocess_globals.yml' on line 5 of tools/preprocess_utils/__init__.py, because I had run into the following error:

Traceback (most recent call last): File "D:/MedicalSeg2/tools/prepare_lung_coronavirus.py", line 50, in from prepare import Prep File "D:\MedicalSeg2\tools\prepare.py", line 41, in from medicalseg.utils import get_image_list File "D:\MedicalSeg2\medicalseg__init.py", line 15, in from . import models, datasets, transforms, utils, inference_helpers File "D:\MedicalSeg2\medicalseg\models__init.py", line 20, in from .nnunet import NNUNet File "D:\MedicalSeg2\medicalseg\models\nnunet.py", line 37, in from tools.preprocess_utils import experiment_planner File "D:\MedicalSeg2\tools\init__.py", line 1, in from .prepare import Prep File "D:\MedicalSeg2\tools\prepare.py", line 42, in from tools.preprocess_utils import uncompressor, global_var, add_qform_sform File "D:\MedicalSeg2\tools\preprocess_utils\init__.py", line 5, in with codecs.open('tools/preprocess_globals.yml', 'r', 'utf-8') as file: File "D:\anaconda\envs\bisaione\lib\codecs.py", line 904, in open file = builtins.open(filename, mode, buffering) FileNotFoundError: [Errno 2] No such file or directory: 'tools/preprocess_globals.yml'
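For reference, an alternative workaround would be to resolve the YAML path relative to the module file rather than the current working directory, so the import works no matter where the script is launched from. This is only a sketch, assuming preprocess_globals.yml actually lives in the tools/ directory as the traceback suggests:

```python
# Illustrative sketch for tools/preprocess_utils/__init__.py (not the repository's
# official fix): build the path from this file's location instead of hard-coding it.
import codecs
import os

_TOOLS_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))  # .../tools
_YML_PATH = os.path.join(_TOOLS_DIR, "preprocess_globals.yml")

with codecs.open(_YML_PATH, "r", "utf-8") as file:
    content = file.read()  # parse the YAML here, as the original __init__.py does
```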

After changing that path, the script successfully downloaded the archive, but then I hit the following problem:

Traceback (most recent call last): File "D:/MedicalSeg2/tools/prepare_lung_coronavirus.py", line 113, in prep = Prep_lung_coronavirus() File "D:/MedicalSeg2/tools/prepare_lung_coronavirus.py", line 78, in init "num_files": 4}) File "D:\MedicalSeg2\tools\prepare.py", line 128, in init filter_key[0]) File "D:\MedicalSeg2\medicalseg\utils\utils.py", line 196, in get_image_list format(image_path)) FileNotFoundError: data/lung_coronavirus\lung_coronavirus_raw/20_ncov_scan is not found. it should be a path of image, or a directory including images.

I found that this was caused by the directory layout after extraction, so I adjusted the paths. The adjusted layout is shown in the screenshot below.

[screenshot: adjusted directory layout]

After working around those two errors, I ran into the KeyError: 'coronacases_org_001' shown in the title.


shiyutang commented 1 year ago

This error means the entry was not found in the 'training' section of dataset_json_dict. Please check:

1. Whether generate_dataset_json was run (see the screenshot below).

[screenshot]

2. What the index keys look like in the 'training' section of the dataset.json generated under the data path; they should be the file names with the extension stripped.

If the generated dataset.json is incorrect, you can delete it, rerun generate_dataset_json to regenerate it, and check that the data file names are correct.
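As a quick sanity check (a sketch, not part of MedicalSeg; the path assumes the default data directory from your logs, and it assumes the 'training' section is keyed by file name, as the KeyError suggests), you can load the generated dataset.json and inspect the keys directly:

```python
# Verify that the "training" keys of dataset.json are bare file names without extensions.
import json
import os

json_path = os.path.join("data", "lung_coronavirus", "lung_coronavirus_raw", "dataset.json")
with open(json_path, "r", encoding="utf-8") as f:
    dataset = json.load(f)

training = dataset["training"]
print(list(training)[:5])                     # expect names like 'coronacases_org_001'
print("coronacases_org_001" in training)      # should print True once the keys are correct
```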

stillfighter2 commented 1 year ago

Regarding your first suggestion: I believe the generate_dataset_json you mention is in the prepare file, which creates dataset.json and is imported by the one-click preprocessing script prepare_lung_coronavirus. Regarding your second suggestion: below is my dataset.json file; I checked the file paths and, as shown in the screenshot, they look fine.

Could you help me figure out where the problem is?


shiyutang commented 1 year ago

I did not see a screenshot of your dataset.json. I would like to check whether the corresponding key is present at that location in the screenshot.

stillfighter2 commented 1 year ago

[Reply sent via email; the attached dataset.json screenshot did not come through.]

shiyutang commented 1 year ago

It seems images cannot be attached when you reply by email. Please reply directly on the issue: https://github.com/PaddlePaddle/PaddleSeg/issues/2956

stillfighter2 commented 1 year ago

> It seems images cannot be attached when you reply by email. Please reply directly on the issue: #2956

Sorry, sir.

[screenshot: dataset.json]

shiyutang commented 1 year ago

From this we can see that your keys are at the wrong relative position: the key should be coronacases_org_001, not something containing 20_ncov_scan\20_ncov_scan. After looking at the code, this is a Windows path compatibility problem. Please pull https://github.com/PaddlePaddle/PaddleSeg/pull/2960, then delete dataset.json, regenerate it, and run again.

stillfighter2 commented 1 year ago

> From this we can see that your keys are at the wrong relative position: the key should be coronacases_org_001, not something containing 20_ncov_scan\20_ncov_scan. After looking at the code, this is a Windows path compatibility problem. Please pull #2960, then delete dataset.json, regenerate it, and run again.

Thank you very much. However, this line (see the screenshot):

[screenshot]

json_dict['training'][os.path.split(image_name).split(".")[

seems like it should instead be changed to

json_dict['training'][os.path.split(image_name)[1].split(".")[

After making these changes I checked the dataset.json file, as shown below:

[screenshot: dataset.json]

But when I tried to run the dataset preprocessing script again, I got the following error:

[screenshot: error]
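For clarity, a small illustration of the key derivation after that change (assumed behaviour, not a verbatim excerpt of prepare.py):

```python
# os.path.split() returns a (head, tail) tuple, so the [1] index is needed to get
# the bare file name before stripping the extension; without it the key keeps the
# directory prefix (and a tuple has no .split() method at all).
import os

image_name = os.path.join("20_ncov_scan", "coronacases_org_001.nii.gz")
key = os.path.split(image_name)[1].split(".")[0]
print(key)  # -> coronacases_org_001, matching the key looked up in load_save()
```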

stillfighter2 commented 1 year ago

> From this we can see that your keys are at the wrong relative position: the key should be coronacases_org_001, not something containing 20_ncov_scan\20_ncov_scan. After looking at the code, this is a Windows path compatibility problem. Please pull #2960, then delete dataset.json, regenerate it, and run again.

I switched to CPU and it ran successfully. Could the problem be my CuPy version?

shiyutang commented 1 year ago

This problem is caused by different CuPy versions supporting different data types for their arguments; you can try changing the tuple to a list to check whether that fixes it.
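A minimal sketch of the kind of change meant here. The actual call site in the preprocessing code may differ; cupyx.scipy.ndimage.zoom and the zoom factors below are used only as an illustration:

```python
# Illustrative only: some CuPy versions are stricter about the container types they
# accept, so passing list(...) instead of a tuple is a quick way to test the idea.
import cupy as cp
from cupyx.scipy import ndimage as cnd

volume = cp.random.rand(8, 64, 64).astype(cp.float32)      # stand-in for a CT volume on GPU
zoom_factors = (1.0, 0.5, 0.5)                              # tuple as built by the preprocessing code
resampled = cnd.zoom(volume, list(zoom_factors), order=1)   # pass a list instead of the raw tuple
print(resampled.shape)
```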

stillfighter2 commented 1 year ago

> This problem is caused by different CuPy versions supporting different data types for their arguments; you can try changing the tuple to a list to check whether that fixes it.

Thank you very much, I have solved this problem. One more question: the default config used by train.py does not change num_workers, and its default value is 0, so why do I still get a warning that multithreading is not supported on Windows and macOS when I run it? I did not change the config file path. Could you tell me which file I need to modify? Thanks again.

[screenshot: warning during training]

shiyutang commented 1 year ago

I checked and there is indeed no other place that modifies num_workers. To be safe, I suggest setting a breakpoint with import pdb;pdb.set_trace() at the location shown in the screenshot below and inspecting the actual value of num_workers. If it is also 0, you can ignore this warning; it means the warning is not what stopped training, and you need to look for another cause of the training termination:

[screenshot]
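A minimal sketch of where such a breakpoint would go (build_loader is a hypothetical helper; the exact spot in MedicalSeg's train.py differs, this only shows the idea of pausing right before the DataLoader is built):

```python
import paddle

def build_loader(train_dataset, batch_size, num_workers):
    # Pause here and type `p num_workers` at the (Pdb) prompt; if it prints 0, the
    # multi-worker warning can be ignored and the real cause of the stop lies elsewhere.
    import pdb; pdb.set_trace()
    return paddle.io.DataLoader(
        train_dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers)
```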

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.