aalbino2 closed this 3 months ago
I think we should define some meaningful string(s) and refs so that the provenance of any data coming from an analysis can be fully understood. We had the idea of trying `source=Quantity(type=MEnum('simulation', 'measurement', 'analysis'))`, so maybe this could be used or repurposed for this? What do you have in mind?
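To make the idea concrete without depending on the metainfo framework, here is a minimal, library-independent sketch of what such a provenance record could look like, assuming that only the three values from the proposed `MEnum` are allowed and that a reference to the originating entry is stored as a plain string. `Source`, `Provenance` and the `ref` field are hypothetical names for illustration; this is not the `Quantity`/`MEnum` API itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class Source(Enum):
    """Allowed provenance values, mirroring the proposed MEnum options."""
    SIMULATION = "simulation"
    MEASUREMENT = "measurement"
    ANALYSIS = "analysis"


@dataclass(frozen=True)
class Provenance:
    """Hypothetical provenance record attached to a piece of data."""
    source: Union[Source, str]
    # Hypothetical reference to the entry the data was derived from.
    ref: Optional[str] = None

    def __post_init__(self) -> None:
        # Coerce plain strings to Source; Source("foo") raises ValueError,
        # so only the enumerated values are accepted.
        if not isinstance(self.source, Source):
            object.__setattr__(self, "source", Source(self.source))


p = Provenance("analysis", ref="entry-123")
print(p.source, p.ref)
```

Restricting the value to an enumeration (rather than a free-form string) is what makes the field useful for filtering: every record is guaranteed to carry one of a known set of labels.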
@JosePizarro3 I think this is related to the Jupyter notebook that @aalbino2 and I are developing for Ta-Shun's ML use case in Physical Vapor Deposition. This particular issue should not affect the code quality of this repo.
On the other hand, I am happy to explore the idea of the `source` quantity and have opened a separate issue for this.
@aalbino2 The generated DataFrame is already split into train and test DataFrames by the `sklearn.model_selection.train_test_split` function that we use in the notebook. Do you mean something different from this?
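For reference, a minimal sketch of that step, assuming a DataFrame with one target column. The column names and values below are placeholders, not the real PVD quantities from the notebook, and the 80/20 split and `random_state` are illustrative choices.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder DataFrame standing in for the one generated in the notebook.
df = pd.DataFrame({
    "feature_a": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    "feature_b": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    "label": [1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 11.0],
})

# Separate the feature columns (X) from the target column (y).
X = df.drop(columns=["label"])
y = df["label"]

# Shuffle the rows and split them: 20 % go to the test set.
# random_state fixes the shuffle so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), len(X_test))  # 8 2
```

Because `X` and `y` are split in a single call, the feature rows and their labels stay aligned (same index in `X_train` and `y_train`), which is the main reason to prefer this over splitting the two objects separately.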
As a next step, we need to randomly split our features and labels into training and test sets. I think this is possible after generating the DataFrame that is ingested by the ML code.