Implementing PAS on independent dataset and own algorithm (K-FOLD CV, HPO integration)

LARS-research / PAS

The released code for the paper: Pooling Architecture Search for Graph Classification, in CIKM 2021.


Implementing PAS on independent dataset and own algorithm (K-FOLD CV, HPO integration) #5

Open shrutiOx opened 4 months ago

shrutiOx commented 4 months ago

Hello,

Thank you for this great work. I want to use PAS on a custom dataset in my own pipeline (K-fold CV, HPO integration). Could you please advise on how to do that?

Thanks!

shrutiOx commented 4 months ago

Hi,

I downloaded your code and tried to integrate it into my pipeline, but a few things are unclear. Where is the 'DARTS training' invoked? Is it in the model_search.py module? If so, why is 'SANE' listed under the 'model' param in the args list? It is also unclear how to use a dataset that is not included in your experiments (e.g. the ENZYMES dataset): if we write 'ENZYMES' under the 'data' param in the args list, will that dataset actually be loaded? And how would we do the same for a custom dataset? So the questions are:

1) How do we use the ENZYMES dataset with your model (search space and DARTS algorithm)?
2) How do we do the same for a custom dataset?
3) Why is 'SANE' given in the args list when the algorithm implemented is 'DARTS'?

Thanks a lot

wei-ln commented 4 months ago

Thank you for your attention.

The 'DARTS training' is implemented in train_search.py, and the mixed operations are provided in model_search.py. We need to search for an architecture within the supernet, which can be done by following Step 1 of the instructions in the README. The term 'SANE' in the 'model' argument is a bug (it is not used in the code), and we will fix it.
New datasets can be added by updating the load_data function in dataset.py accordingly. The ENZYMES dataset can be used by specifying the argument '--data ENZYMES', since it can be loaded with the torch_geometric.datasets.TUDataset class.

shrutiOx commented 4 months ago

Hi,

Thanks a lot for your kind reply. I will try to implement PAS as per your instructions and get back to you if I have any questions. Just to confirm: can we apply a custom processed dataset (not a standard torch geometric dataset), i.e. independent train and test sets, to your current PAS implementation (the code that you have shared)? In that case, do we need to change dataset.py?

wei-ln commented 4 months ago

New datasets (https://github.com/LARS-research/PAS/blob/main/dataset.py#L55) and splits (https://github.com/LARS-research/PAS/blob/main/dataset.py#L91) can be used in PAS by modifying the code correspondingly. The code is implemented based on PyG, so non-standard processed data can be re-constructed and then used in PAS by following the instructions at https://pytorch-geometric.readthedocs.io/en/latest/get_started/introduction.html#data-handling-of-graphs
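
For the custom-split part, here is a minimal sketch of how stratified K-fold index lists could be generated from graph labels using only the Python standard library. The function name stratified_kfold_indices and the toy labels are hypothetical and not part of PAS; how the resulting indices would be wired into the split code in dataset.py is an assumption left to the reader.

```python
import random
from collections import defaultdict


def stratified_kfold_indices(labels, k=5, seed=0):
    """Return k (train_idx, test_idx) pairs with roughly balanced classes per fold.

    Hypothetical helper for illustration only; it is not part of the PAS code.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle each class and deal its members round-robin across the folds.
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)

    # Fold i is the held-out part; all remaining folds form the training part.
    splits = []
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for f, fold in enumerate(folds) if f != i for idx in fold
        )
        splits.append((train_idx, test_idx))
    return splits


# Toy example: 20 graphs, two classes with 10 graphs each.
labels = [0] * 10 + [1] * 10
splits = stratified_kfold_indices(labels, k=5, seed=0)
for fold, (train_idx, test_idx) in enumerate(splits):
    print(fold, len(train_idx), len(test_idx))
```

Each (train_idx, test_idx) pair could then be used to select graphs for one fold, for example by indexing a PyG dataset with the index lists. An independent test set would simply be kept out of the labels passed to this helper.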

shrutiOx commented 4 months ago

Thanks so much for your prompt reply. I wanted to understand how to reproduce a model that has been trained on, say, a custom processed dataset, because reproducing DARTS models is normally not straightforward. I need to train on a custom dataset and then apply that model to an independent test dataset in a different process, so this is like transfer learning. Could you please let me know whether your current code allows this? How would the model produced by PAS be transferred to an independent dataset (suppose we save the model yielded by K-fold CV and later, in an independent process/code, load it and only test it on new data, without any K-fold CV or training)?