Training the model using 10% of the dataset

Wimukti commented 1 year ago

@farrell236 Hey farrell, since the dataset is too large, can I train the model using only 10% of it? (I currently don't have enough computational power or other resources to train the model on the whole MIMIC-CXR dataset, which is ~500GB.) If that is possible, what steps do I need to follow to successfully train the model with only 10% of the dataset? As far as I understand, I may need to:

Create the MIMIC_AP_PA_train.csv with only 10% of the original train dataset
Create both mimic-merges.txt & mimic-vocab.json from the (10% train + test + validate) data

Other than these, are there any other steps I need to follow?

Also, instead of DenseNet-121, could we use something like MobileNetV2 for the encoder?

farrell236 commented 1 year ago

Hi @Wimukti, sorry for replying so late, I hope you've managed to solve it since!

You'll need to modify the file here, as it's responsible for creating the file MIMIC_AP_PA_train.csv. Either disable the sanity check and then delete rows from the created CSV, or delete rows in the pandas dataframe before it enters the for loop.
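As a rough illustration of the second option (dropping rows from the dataframe before the loop), here is a minimal sketch. It assumes the dataframe has a `study_id` column and samples whole studies rather than individual rows so that images of one study stay together; the column name, the helper `subsample_studies`, and the demo data are hypothetical and not taken from the repository.

```python
import pandas as pd

def subsample_studies(df: pd.DataFrame, frac: float = 0.1,
                      seed: int = 42, key: str = "study_id") -> pd.DataFrame:
    """Keep every row that belongs to a random `frac` of the unique studies.

    `key` is an assumed column name; adjust it to whatever the real
    dataframe uses to identify a study.
    """
    studies = df[key].drop_duplicates()
    kept = studies.sample(frac=frac, random_state=seed)  # reproducible draw
    return df[df[key].isin(kept)].reset_index(drop=True)

# Tiny in-memory stand-in for the real training table:
# 20 studies with 2 images each (purely illustrative values).
demo = pd.DataFrame({
    "study_id": [i // 2 for i in range(40)],
    "dicom_id": [f"img{i}" for i in range(40)],
})
subset = subsample_studies(demo, frac=0.1)
```

Fixing the seed keeps the subset identical across runs, which matters because the same set of studies has to be reused when trimming the other files.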

You would then also need to delete the corresponding rows of studies from mimic_cxr_labeled.csv, because creating mimic-merges.txt & mimic-vocab.json reads everything from this file.
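A possible way to keep the two files in sync is sketched below. It assumes both tables share a `study_id` column and that you still have the full training table alongside the reduced one; the column name, the helper `drop_removed_studies`, and the demo values are hypothetical. Only studies that were removed from the training set are dropped, so validation and test rows are left untouched.

```python
import pandas as pd

def drop_removed_studies(labeled: pd.DataFrame, full_train: pd.DataFrame,
                         train_subset: pd.DataFrame,
                         key: str = "study_id") -> pd.DataFrame:
    """Drop from `labeled` the studies that were cut from the training set.

    `key` is an assumed column name shared by all three tables.
    """
    removed = set(full_train[key]) - set(train_subset[key])
    return labeled[~labeled[key].isin(removed)].reset_index(drop=True)

# Illustrative data: studies 1-4 are training studies, 5-6 are val/test.
labeled = pd.DataFrame({"study_id": [1, 2, 3, 4, 5, 6],
                        "text": ["a", "b", "c", "d", "e", "f"]})
full_train = pd.DataFrame({"study_id": [1, 2, 3, 4]})
train_subset = pd.DataFrame({"study_id": [1, 2]})

trimmed = drop_removed_studies(labeled, full_train, train_subset)
```

In this toy case studies 3 and 4 are dropped, while 5 and 6 stay because they were never part of the training table.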

Hope it helps!

Wimukti commented 1 year ago

Hi @farrell236, thank you so much for the reply. I managed to solve the issue by following the steps you mentioned. Thank you so much for your assistance, it was much appreciated.

akashAD98 commented 11 months ago

@Wimukti were you able to train the model? Can you provide a sample of the data?
