bdsp-core / IIIC-SPaRCNet

About dataset #1

Open 1940653868 opened 6 months ago

1940653868 commented 6 months ago

First of all, thank you very much for open-sourcing this work. How can I obtain the dataset needed to reproduce your article? The site (https://bdsp.io/content/bdsp-sparcnet/1.1/) has suspended providing the full dataset. I would appreciate a reply.

MohSamNaf commented 3 months ago

Same issue here. Have you found a solution?

1940653868 commented 2 months ago
[Image: WeChat screenshot, 20240819164101]

Hope this is not too late.

MohSamNaf commented 2 months ago

Thanks for your reply. I have already requested access and I see the same screen. When I click on it again, it tells me "No New account to add" at the top of the page.

I still get "Access denied" when I try to access the Amazon S3 bucket. Do you still have access to the dataset?

1940653868 commented 2 months ago

I got "The name of the access point" from the email the website sent me. [Image: WeChat screenshot, 20240826231611] However, the last time I downloaded the dataset was 3 months ago...

MohSamNaf commented 2 months ago

I have already clicked on the same link as shown in the image above.

This is the email I got:

[Images: Sparcnet Access, sparcnet_page]

When I try to access the S3 Bucket or download using wget, I get these errors:

[Images: sparcnet_aws, sparcnet_download]

1940653868 commented 2 months ago

Have you obtained the AWS keys by following the instructions at https://bdsp.io//about/howto_accessdata/ ? I am not sure why your download failed...

MohSamNaf commented 2 months ago

Yes, I have the keys and configured them through the AWS terminal.

[Image: BDSP AWS Account]

[Images: AWS Keys, aws configure]

Could you check whether the data is still present in this Amazon S3 bucket? Perhaps the data has been moved?
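For readers who hit the same "Access denied" error, one quick local check is whether `aws configure` actually wrote both key fields to the profile in use. The sketch below is a hypothetical helper (`check_aws_profile` is not part of any AWS tool). It assumes the standard credentials file at `~/.aws/credentials` and only checks that the fields are present; it cannot tell whether the keys are authorized for the BDSP bucket.

```python
import configparser
from pathlib import Path

# Fields that `aws configure` stores for each profile in the credentials file.
REQUIRED_FIELDS = ("aws_access_key_id", "aws_secret_access_key")


def check_aws_profile(credentials_path, profile="default"):
    """Return the required fields that are missing or empty for `profile`.

    An empty list means both key fields are present. It does not mean the
    keys grant access to any particular bucket.
    """
    # interpolation=None so that special characters in keys are read verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(credentials_path)  # a missing file is silently skipped
    if not parser.has_section(profile):
        return list(REQUIRED_FIELDS)
    return [
        field
        for field in REQUIRED_FIELDS
        if not parser.get(profile, field, fallback="").strip()
    ]


if __name__ == "__main__":
    missing = check_aws_profile(Path.home() / ".aws" / "credentials")
    print("missing fields:", ", ".join(missing) if missing else "none")
```

If this reports no missing fields and access is still denied, the problem is more likely on the account or permission side, which is consistent with the fix reported later in this thread (creating a new account and repeating the steps).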

1940653868 commented 2 months ago

It seems that the data is still available:

2023-05-06 00:40:36          0
2023-05-06 00:45:08 2287360128 10_test_X.npy
2023-05-06 00:45:08 2319488128 10_train_X.npy
2023-05-06 00:45:10     217580 10_train_Y.npy
2023-05-06 00:45:11    1739744 10_train_Y2_hard.npy
2023-05-06 00:45:09    4349168 10_train_key.npy
2023-05-26 03:18:23 8604800128 all_train_X.npy
2023-05-06 00:45:13     806828 all_train_Y.npy
2023-05-06 00:45:14    6453728 all_train_Y2_hard.npy
2023-05-06 00:45:12   16134128 all_train_key.npy
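To get a feel for how much disk space the download needs, the listing above can be parsed and summed. This is a minimal sketch: the `S3Entry` and `parse_listing` names are made up for illustration, and the parser assumes the usual `date time size name` column layout of such a listing, where the first zero-byte row has no file name.

```python
from typing import NamedTuple

# Listing copied from the comment above (date, time, size in bytes, name).
LISTING = """\
2023-05-06 00:40:36 0
2023-05-06 00:45:08 2287360128 10_test_X.npy
2023-05-06 00:45:08 2319488128 10_train_X.npy
2023-05-06 00:45:10 217580 10_train_Y.npy
2023-05-06 00:45:11 1739744 10_train_Y2_hard.npy
2023-05-06 00:45:09 4349168 10_train_key.npy
2023-05-26 03:18:23 8604800128 all_train_X.npy
2023-05-06 00:45:13 806828 all_train_Y.npy
2023-05-06 00:45:14 6453728 all_train_Y2_hard.npy
2023-05-06 00:45:12 16134128 all_train_key.npy
"""


class S3Entry(NamedTuple):
    modified: str
    size: int
    name: str


def parse_listing(text):
    """Parse 'date time size [name]' rows; rows with fewer columns are skipped."""
    entries = []
    for line in text.splitlines():
        parts = line.split(maxsplit=3)
        if len(parts) < 3:
            continue
        date, time, size = parts[:3]
        name = parts[3] if len(parts) == 4 else ""  # nameless zero-byte row
        entries.append(S3Entry(f"{date} {time}", int(size), name))
    return entries


entries = parse_listing(LISTING)
total_bytes = sum(entry.size for entry in entries)

# Largest files first, sizes shown in GiB.
for entry in sorted(entries, key=lambda e: e.size, reverse=True):
    print(f"{entry.size / 2**30:8.2f} GiB  {entry.name or '(no name)'}")
print(f"total: {total_bytes} bytes ({total_bytes / 2**30:.2f} GiB)")
```

The three `*_X.npy` files account for almost all of the roughly 12.3 GiB total, so they are the ones worth checking first if a download is interrupted.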

MohSamNaf commented 2 months ago

Thank you!! May I ask if you know an email or support desk that I may contact to solve my problem?

Additionally, do you know how these files relate to Datasets 1, 2, 3, and 4 as described in the supplementary document?

I noticed there are no labels for the test portion of the dataset.

1940653868 commented 2 months ago

I'm not sure who you should contact; perhaps you could try the corresponding author of the article. I don't know how to get the test labels, but you could ask other people who have used this dataset in their papers, such as https://github.com/ycq091044/BIOT?tab=readme-ov-file

1940653868 commented 2 months ago

If you still cannot get the dataset, it might be possible to contact Professor Westover or Jing Jin directly.

MohSamNaf commented 2 months ago

@1940653868 I was finally able to download the data. It was solved by creating a new account and doing the steps again. I am not sure what has changed.

Regarding the BIOT paper, ironically, it is the paper that led me to this dataset. Based on the Table 1 statistics and the "Data processing" section, I assume they didn't use the "test" dataset, for 2 reasons:

  1. In the "Dataset Processing" section, they mention that they split the data into training/validation/test sets using a 60%/20%/20% ratio. For other datasets like TUAB and TUEV, they state that the training and test separation is provided by the dataset. If they had used the "test" data provided by IIIC, I believe they would have mentioned it in the same way as for TUAB and TUEV.

  2. They utilized 165,309 samples. Combining the 1,950 patients (134,450 samples) with the 761 patients (36,242 samples) is roughly close at 170,190. However, those 1,950 patients have 4,448 recordings, while the 761 patients have 1,743 recordings, so I have no idea how they got the 2,702.

Additionally, the IIIC paper mentions that Dataset 1 has 1,950 patients with 110,095 samples; however, the uploaded data has 134,450.
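The sample counts discussed above can be cross-checked against the file sizes in the bucket listing posted earlier in this thread. The sketch below is only an estimate under two assumptions that are not confirmed anywhere in the thread: each `.npy` file starts with a 128-byte header, and every row in the `*_X.npy` (and `*_Y.npy`) files has the same byte size. The helper names are made up for illustration.

```python
# Sizes in bytes, copied from the bucket listing earlier in this thread.
FILE_SIZES = {
    "all_train_X.npy": 8604800128,
    "all_train_Y.npy": 806828,
    "10_train_X.npy": 2319488128,
    "10_train_Y.npy": 217580,
    "10_test_X.npy": 2287360128,
}

# Assumption (not confirmed): every file carries a 128-byte .npy header.
ASSUMED_HEADER_BYTES = 128

# Row count of the uploaded all_train_* files, as reported in this thread.
ALL_TRAIN_ROWS = 134_450


def payload_bytes(name):
    """File size minus the assumed header."""
    return FILE_SIZES[name] - ASSUMED_HEADER_BYTES


def exact_div(numerator, denominator, what):
    """Integer division that fails loudly if the assumptions do not hold."""
    quotient, remainder = divmod(numerator, denominator)
    if remainder:
        raise ValueError(f"{what}: {numerator} is not a multiple of {denominator}")
    return quotient


# Bytes per row, derived from the file whose row count is known.
x_row_bytes = exact_div(payload_bytes("all_train_X.npy"), ALL_TRAIN_ROWS, "X row size")
y_row_bytes = exact_div(payload_bytes("all_train_Y.npy"), ALL_TRAIN_ROWS, "Y row size")

# Row counts inferred for the other files.
train10_rows = exact_div(payload_bytes("10_train_X.npy"), x_row_bytes, "10_train_X")
train10_rows_from_y = exact_div(payload_bytes("10_train_Y.npy"), y_row_bytes, "10_train_Y")
test10_rows = exact_div(payload_bytes("10_test_X.npy"), x_row_bytes, "10_test_X")

print("bytes per X row:", x_row_bytes)
print("10_train rows (from X, from Y):", train10_rows, train10_rows_from_y)
print("10_test rows:", test10_rows)
print("all_train + 10_train:", ALL_TRAIN_ROWS + train10_rows)
print("all_train + 10_test:", ALL_TRAIN_ROWS + test10_rows)
print("patients:", 1950 + 761, "recordings:", 4448 + 1743)
```

Under these assumptions, `10_train` comes out at 36,242 rows and `10_test` at 35,740 rows. That gives 134,450 + 36,242 = 170,692, whereas 134,450 + 35,740 = 170,190, so the 170,190 total quoted above matches `all_train` plus `10_test` rather than `all_train` plus `10_train`. The patient and recording sums are 2,711 and 6,191. None of this explains the 165,309 samples or the 2,702 figure from the BIOT paper, and it is an inference from file sizes, not a confirmed description of the dataset.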