Closed · espositoandrea closed this 2 years ago
Due to some issues, we had to restart the work we've done related to this issue. I'll be taking over this ticket.
Given the dataset's size, I'll upload only 14 positive and 20 negative PET scans, with the following IDs:
PET Scan ID | Class
---|---
OAS30027_PIB_PUPTIMECOURSE_d1300 | Negative |
OAS30132_PIB_PUPTIMECOURSE_d0063 | Negative |
OAS30206_AV45_PUPTIMECOURSE_d3024 | Negative |
OAS30273_PIB_PUPTIMECOURSE_d0077 | Negative |
OAS30479_AV45_PUPTIMECOURSE_d2421 | Negative |
OAS30662_PIB_PUPTIMECOURSE_d1615 | Negative |
OAS30687_PIB_PUPTIMECOURSE_d0126 | Negative |
OAS30713_PIB_PUPTIMECOURSE_d0095 | Negative |
OAS30713_PIB_PUPTIMECOURSE_d1692 | Negative |
OAS30818_AV45_PUPTIMECOURSE_d1720 | Negative |
OAS30818_AV45_PUPTIMECOURSE_d2089 | Negative |
OAS30818_PIB_PUPTIMECOURSE_d0097 | Negative |
OAS30818_PIB_PUPTIMECOURSE_d1214 | Negative |
OAS30863_PIB_PUPTIMECOURSE_d1531 | Negative |
OAS30867_AV45_PUPTIMECOURSE_d4407 | Negative |
OAS30867_PIB_PUPTIMECOURSE_d0480 | Negative |
OAS30869_PIB_PUPTIMECOURSE_d0152 | Negative |
OAS30899_PIB_PUPTIMECOURSE_d0070 | Negative |
OAS30964_PIB_PUPTIMECOURSE_d1142 | Negative |
OAS30964_PIB_PUPTIMECOURSE_d1533 | Negative |
OAS30024_AV45_PUPTIMECOURSE_d0084 | Positive |
OAS30027_PIB_PUPTIMECOURSE_d2394 | Positive |
OAS30031_PIB_PUPTIMECOURSE_d0236 | Positive |
OAS30035_PIB_PUPTIMECOURSE_d3893 | Positive |
OAS30040_PIB_PUPTIMECOURSE_d4424 | Positive |
OAS30051_PIB_PUPTIMECOURSE_d0081 | Positive |
OAS30078_PIB_PUPTIMECOURSE_d0136 | Positive |
OAS30085_PIB_PUPTIMECOURSE_d1566 | Positive |
OAS30087_PIB_PUPTIMECOURSE_d0096 | Positive |
OAS30114_AV45_PUPTIMECOURSE_d0086 | Positive |
OAS30119_PIB_PUPTIMECOURSE_d1615 | Positive |
OAS30119_PIB_PUPTIMECOURSE_d2595 | Positive |
OAS30119_PIB_PUPTIMECOURSE_d3722 | Positive |
OAS30128_AV45_PUPTIMECOURSE_d0044 | Positive |
As we presented to the professors today (2021-11-04), we have successfully tracked the data-creation and data-processing pipeline. We should try to merge the DVC data pipeline with the MLflow model tracking, but we can think about that in a new ticket and, if needed, a separate PR.
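One simple way to connect the two sides (a sketch only; the helper and the `data/folds.dvc` path are hypothetical, not the actual repo layout) is to log a version tag derived from the DVC pointer file with each MLflow run, so every trained model records which version of the folds it saw:

```python
import hashlib
from pathlib import Path


def data_version(dvc_file: str) -> str:
    """Derive a short data-version tag from a DVC pointer file.

    The .dvc file already contains the content hash of the tracked data,
    so hashing the pointer file itself is enough to identify the data
    version without reading the (huge) data directory.
    """
    content = Path(dvc_file).read_bytes()
    return hashlib.sha256(content).hexdigest()[:12]


# In a training script this tag could then be attached to the run, roughly:
#   import mlflow
#   with mlflow.start_run():
#       mlflow.log_param("data_version", data_version("data/folds.dvc"))
#       ...  # train and log the model as usual
```

The tag changes whenever `dvc add`/`dvc repro` updates the pointer file, which is exactly when the underlying folds change.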
At the moment, we have only uploaded the final pre-processed folds using DVC. We should port the pipeline to Python scripts and provide the "raw" data that produced the final folds. Since the full dataset is huge (over 2 TB), we'll include only a very small subset as a proof of concept of the pipeline, and later apply the same pipeline to more data.
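Once the pipeline is ported to scripts, it could also be described as DVC stages so that `dvc repro` rebuilds only what changed. A rough sketch of what the `dvc.yaml` might look like (script names and paths here are placeholders, not the actual repo layout):

```yaml
stages:
  preprocess:
    cmd: python scripts/preprocess.py data/raw data/preprocessed
    deps:
      - scripts/preprocess.py
      - data/raw
    outs:
      - data/preprocessed
  make_folds:
    cmd: python scripts/make_folds.py data/preprocessed data/folds
    deps:
      - scripts/make_folds.py
      - data/preprocessed
    outs:
      - data/folds
```

A nice property of this setup is that the same `dvc.yaml` works unchanged once we swap the proof-of-concept subset in `data/raw` for the full 2 TB dataset.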