ArneBinder / pytorch-ie-hydra-template-1

PyTorch-IE Hydra Template

More possible improvements #177

Open tanikina opened 5 days ago

tanikina commented 5 days ago

Just wanted to suggest some potential improvements based on my experience with the current template:

  1. Microseconds can be added to the output directory names to avoid collisions. I once had an unfortunate situation where two jobs started at the same time on the cluster and both models were written to a single folder. I think it should be enough to use ${now:%H-%M-%S-%f} in configs/hydra/default.yaml: https://github.com/ArneBinder/pytorch-ie-hydra-template-1/blob/3b37839332173420cff8cc43507cb994931e8831/configs/hydra/default.yaml#L19-L22 and in configs/train.yaml: https://github.com/ArneBinder/pytorch-ie-hydra-template-1/blob/3b37839332173420cff8cc43507cb994931e8831/configs/train.yaml#L73-L74

  2. The column names in the output md files can be sorted. This would make it easier to compare the results from different runs and experiments in the log file. I typically copy the results from job_return_value.md or job_return_value.aggregated.md, and the columns in these files often appear in a different order from run to run. I think the sorting could be done in src/hydra_callbacks/save_job_return_value.py by adding something like this:

      if isinstance(result, pd.DataFrame):
          result = result.reindex(sorted(result.columns), axis=1)
      elif isinstance(result, pd.Series):
          result = result.sort_index()

    before the result is written to the file: https://github.com/ArneBinder/pytorch-ie-hydra-template-1/blob/3b37839332173420cff8cc43507cb994931e8831/src/hydra_callbacks/save_job_return_value.py#L257-L258
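
To illustrate the effect of suggestion 2, here is a minimal standalone sketch. The helper name `sort_result_labels` and the metric names and values are made up for the example and are not part of the template; only the two pandas calls come from the snippet above.

```python
import pandas as pd


def sort_result_labels(result):
    """Sort DataFrame columns or Series index labels; leave other types untouched.

    Hypothetical helper for illustration, not the template's actual code.
    """
    if isinstance(result, pd.DataFrame):
        # reindex returns a new frame with columns in sorted order
        return result.reindex(sorted(result.columns), axis=1)
    if isinstance(result, pd.Series):
        return result.sort_index()
    return result


# Two runs that report the same (made-up) metrics in a different column order.
run_a = pd.DataFrame({"f1": [0.81], "accuracy": [0.90], "loss": [0.35]})
run_b = pd.DataFrame({"loss": [0.40], "f1": [0.79], "accuracy": [0.88]})

sorted_a = sort_result_labels(run_a)
sorted_b = sort_result_labels(run_b)

# A Series is sorted by its index labels instead.
sorted_series = sort_result_labels(pd.Series({"recall": 0.7, "precision": 0.8}))

print(list(sorted_a.columns))
print(list(sorted_b.columns))
```

After sorting, both frames list their columns in the same order, so the rendered tables line up when pasted next to each other.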
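
For suggestion 1, a small sketch of why the extra `%f` helps. It assumes that Hydra's `now` resolver formats the current time with Python's `strftime`, which is my understanding but worth double-checking; the two start times below are invented.

```python
from datetime import datetime

# Format with seconds resolution vs. the proposed one with microseconds (%f).
SECONDS_FORMAT = "%H-%M-%S"
MICROSECONDS_FORMAT = "%H-%M-%S-%f"

# Two hypothetical jobs that start within the same second.
start_a = datetime(2024, 1, 1, 12, 30, 45, 123)
start_b = datetime(2024, 1, 1, 12, 30, 45, 987654)

dir_a_seconds = start_a.strftime(SECONDS_FORMAT)
dir_b_seconds = start_b.strftime(SECONDS_FORMAT)
dir_a_micro = start_a.strftime(MICROSECONDS_FORMAT)
dir_b_micro = start_b.strftime(MICROSECONDS_FORMAT)

print(dir_a_seconds, dir_b_seconds)  # identical names, so the jobs would share a folder
print(dir_a_micro, dir_b_micro)      # distinct names
```

This makes a collision much less likely, although it does not strictly rule one out if two jobs happen to start in the same microsecond.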