StanfordMIMI / skm-tea

Repository for the Stanford Knee MRI Multi-Task Evaluation (SKM-TEA) Dataset
MIT License
Register train, test, valid split #21

Closed aldiak closed 1 year ago

aldiak commented 1 year ago

Hi, I have a train, test, and validation split, so how do I register that to DataCatalog?

ad12 commented 1 year ago

Thanks for the question, @aldiak! Just to make sure I understand:

  1. Do you want to register train/val/test splits of a different dataset or of the SKM-TEA dataset?
  2. If these splits are for the SKM-TEA dataset, are you trying to register your own splits or the publicly available splits?
aldiak commented 1 year ago

Hello, I already have the data locally, with the recon files in three folders (train, test, and val). The other files are as downloaded (no split).

aldiak commented 1 year ago

It's actually the SKM-TEA dataset.

aldiak commented 1 year ago

I registered, but got the following issue too.

ad12 commented 1 year ago

The folder structure should not change - all files should stay in the folders they were in during download.

When you downloaded the data, you likely got a folder named v1-release. Place this folder inside a folder named skm-tea, so that the full path is </path/to/dataset/directory>/skm-tea/v1-release. </path/to/dataset/directory> can be any directory.

To make the skm-tea package auto-detect the dataset, add this to the top of your script/code:

import os
os.environ["MEDDLR_DATASETS_DIR"] = "</path/to/dataset/directory>"
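
The layout described above can be checked before running anything else. This is only a minimal sketch: the helper names `expected_release_dir` and `configure_datasets_dir` are hypothetical and are not part of skm-tea or meddlr, and the sketch assumes the download produced a folder named v1-release, as mentioned above.

```python
import os
from pathlib import Path


def expected_release_dir(datasets_dir):
    """Return <datasets_dir>/skm-tea/v1-release, the layout described above."""
    return Path(datasets_dir) / "skm-tea" / "v1-release"


def configure_datasets_dir(datasets_dir):
    """Set MEDDLR_DATASETS_DIR and report whether the release folder exists.

    Hypothetical helper for illustration; not an skm-tea or meddlr function.
    """
    os.environ["MEDDLR_DATASETS_DIR"] = str(datasets_dir)
    release_dir = expected_release_dir(datasets_dir)
    return release_dir, release_dir.is_dir()


# Placeholder path: replace it with your own parent directory.
release_dir, found = configure_datasets_dir("</path/to/dataset/directory>")
print(f"{release_dir} exists: {found}")
```

If the check prints `False`, the v1-release folder is probably not where the package expects it. Since the variable should be set at the top of the script, it seems safest to run this before importing skm-tea.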
aldiak commented 1 year ago

Alright. By doing so, I do not need to change the paths in the get_path function defined in register.py, right?

ad12 commented 1 year ago

Yup, you shouldn't have to change any paths.

aldiak commented 1 year ago

Alright, I will let you know about the outcome. Thanks

ad12 commented 1 year ago

Picking up the conversation from this thread: https://github.com/ad12/meddlr/issues/67

Overview

I don't believe this has to do with data registration - your datasets are loading fine:

2022-10-22 12:42:10,859 - Dropped 0 scans. 86 scans remaining
2022-10-22 12:42:10,859 - Dropped references for 0/86 scans. 86 scans with reference remaining
2022-10-22 12:42:11,496 - Loading D:/files_recon_calib-24/annotations/val.json takes 0.00 seconds
2022-10-22 12:42:13,993 - Formatting dataset dicts takes 2.49 seconds
2022-10-22 12:42:13,993 - Dropped 0 scans. 33 scans remaining
2022-10-22 12:42:13,993 - Dropped references for 0/33 scans. 33 scans with reference remaining

It looks like the error occurs when running precompute_masks. You may have to precompute masks with cfg.DATALOADER.NUM_WORKERS=0. To do this, add DATALOADER.NUM_WORKERS 0 to the end of the command you are running:

python train_net.py <your-current-args> DATALOADER.NUM_WORKERS 0
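
For readers unfamiliar with trailing `KEY VALUE` overrides, the idea can be sketched as below. This is an illustration only, not the actual meddlr or skm-tea config code: `apply_overrides` is a hypothetical helper, the config is a plain nested dict, and the starting value of 8 workers is made up for the example.

```python
import ast


def apply_overrides(cfg, opts):
    """Merge a flat [KEY, VALUE, KEY, VALUE, ...] list into a nested dict.

    Hypothetical helper for illustration; keys use dots to address nesting.
    """
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, raw in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = cfg
        for name in parents:
            node = node[name]
        if leaf not in node:
            raise KeyError(f"unknown config key: {key}")
        try:
            # Interpret "0" as the int 0, "True" as a bool, and so on.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything that is not a Python literal stays a string.
            value = raw
        node[leaf] = value
    return cfg


# Made-up starting config; only the key name comes from the thread.
cfg = {"DATALOADER": {"NUM_WORKERS": 8}}
apply_overrides(cfg, ["DATALOADER.NUM_WORKERS", "0"])
print(cfg["DATALOADER"]["NUM_WORKERS"])
```

Assuming NUM_WORKERS is passed through to a PyTorch DataLoader, a value of 0 would make data loading happen in the main process rather than in worker subprocesses.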
aldiak commented 1 year ago

Hi, thank you. The data is loading fine now.
