abstractrawdataset updated

ORNL / HydraGNN

Distributed PyTorch implementation of multi-headed graph convolutional neural networks

BSD 3-Clause "New" or "Revised" License

68 stars 29 forks source link

abstractrawdataset updated #202

Closed allaffa closed 1 year ago

allaffa commented 1 year ago

This PR addresses two issues:

self.__normalize_dataset() is safeguarded by a boolean variable and used only when the user specifies it in the .JSON file
fixes a bug that was hidden inside the AbstractRawDataset . update_predicted_values and update_atom_features are already included in the SimplePickleDataset class. Therefore, there is no need to used them now inside AbstractRawDataset

allaffa commented 1 year ago

This PR aims at fixing a bug that was hidden inside the AbstractRawDataset as a result of a transition from the old data management and the new data management methodologies of HydraGNN. update_predicted_values and update_atom_features removed from AbstractRawDataset class because these functions now are already included in the SimplePickleDataset class.

allaffa commented 1 year ago

Looks good to me. One comment about the feature update lines is that if we should keep them here and remove the ones in SimplePickleDataset since the data is loaded here?

@pzhanggit that's a good point. AbstractRawDataset is meant to be an "intermediate" layer between the AbstractBaseDataset class and any fully specified dataset class. Therefore, for some specific datasets, as we already do for some examples such as for the ORNL_AISD-Ex dataset, we inherit directly from AbstractBaseDataset. SimplePickleDatasetor AdiosDataset are always used no matter what, because we will always need to convert the data from raw format into pickle or adios.

Therefore, to me it seems like keeping the feature update lines into SimplePickleDataset and AdiosDataset is more robust.

allaffa commented 1 year ago

@pzhanggit @jychoi-hpc I temporarily blocked this PR by converting it into draft. I looked again at the implementation of AdiosDataset. I want to talk with Jong. I think that what I did is very practical for the SimplePickleDataset, but not so sure for AdiosDataset

allaffa commented 1 year ago

@jychoi-hpc @pzhanggit I gave another thought to this PR. Many examples not currently included in the main branch also use update_predicted_values and update_atom_features . Because of this, the most practical way to ensure that these two capabilities are always available consists maintaining them inside the SimplePickleDataset class.