Closed allaffa closed 1 year ago
This PR aims at fixing a bug that was hidden inside the AbstractRawDataset
as a result of a transition from the old data management and the new data management methodologies of HydraGNN.
update_predicted_values
and update_atom_features
removed from AbstractRawDataset
class because these functions now are already included in the SimplePickleDataset
class.
Looks good to me. One comment about the feature update lines is that if we should keep them here and remove the ones in
SimplePickleDataset
since the data is loaded here?
@pzhanggit that's a good point. AbstractRawDataset is meant to be an "intermediate" layer between the AbstractBaseDataset class and any fully specified dataset class.
Therefore, for some specific datasets, as we already do for some examples such as for the ORNL_AISD-Ex dataset, we inherit directly from AbstractBaseDataset.
SimplePickleDataset
or AdiosDataset
are always used no matter what, because we will always need to convert the data from raw format into pickle or adios.
Therefore, to me it seems like keeping the feature update lines into SimplePickleDataset
and AdiosDataset
is more robust.
@pzhanggit @jychoi-hpc
I temporarily blocked this PR by converting it into draft.
I looked again at the implementation of AdiosDataset
.
I want to talk with Jong. I think that what I did is very practical for the SimplePickleDataset,
but not so sure for AdiosDataset
@jychoi-hpc @pzhanggit
I gave another thought to this PR.
Many examples not currently included in the main branch also use update_predicted_values
and update_atom_features
. Because of this, the most practical way to ensure that these two capabilities are always available consists maintaining them inside the SimplePickleDataset
class.
This PR addresses two issues:
self.__normalize_dataset()
is safeguarded by a boolean variable and used only when the user specifies it in the .JSON fileAbstractRawDataset
.update_predicted_values
andupdate_atom_features
are already included in the SimplePickleDataset class. Therefore, there is no need to used them now insideAbstractRawDataset