Open alaniwi opened 5 years ago
Major +1 for this. This would be a great help for the Docker implementation.
@alaniwi @sashakames
I think we should try to maintain backward compatibility with the existing options if possible.
The only one of your proposed options that I'm not keen on is --thredds --no-tds-reload. I also think we should stick to either thredds or tds.
What I think we should do is support the positive and negative invocations of your suggested flags, e.g.:
--[no-]root-catalog
--[no-]thredds-reload
Then --thredds-reinit is basically an alias for --root-catalog --thredds-reload.
Would that work?
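For illustration, paired positive and negative flags like these could be declared as in the minimal sketch below. It assumes Python's argparse (3.9 or later, for BooleanOptionalAction) and is not the publisher's actual option handling; the program name, the parse helper, and the use of None to mean "not specified" are all hypothetical.

```python
import argparse


def build_parser():
    # Hypothetical parser; the real publisher's option handling may differ.
    parser = argparse.ArgumentParser(prog="esgpublish-sketch")
    # BooleanOptionalAction registers both --root-catalog and --no-root-catalog.
    parser.add_argument("--root-catalog",
                        action=argparse.BooleanOptionalAction, default=None)
    parser.add_argument("--thredds-reload",
                        action=argparse.BooleanOptionalAction, default=None)
    parser.add_argument("--thredds-reinit", action="store_true")
    return parser


def parse(argv):
    args = build_parser().parse_args(argv)
    if args.thredds_reinit:
        # Treat --thredds-reinit as an alias for --root-catalog --thredds-reload,
        # without overriding anything the user specified explicitly (assumption).
        if args.root_catalog is None:
            args.root_catalog = True
        if args.thredds_reload is None:
            args.thredds_reload = True
    return args
```

With this sketch, parse(["--thredds-reinit"]) sets both root_catalog and thredds_reload to True, while parse(["--no-root-catalog", "--thredds-reload"]) requests only the reload.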
@mkjpryor-stfc . Thanks, I think it would be easier to see if you could please write down your complete set of proposed invocations (standalone, and when publishing datasets), so that it is all written down in one place. (Don't worry about formatting it nicely if you want to quickly copy and paste the bits you aren't changing, as I can't see how to preserve the markdown when doing this.)
But note that I am extremely hesitant about including --no-root-catalog. The only situation where I could imagine this might be valid (generate dataset catalog and then jump immediately to a reinit without generating a new root catalog) would be if regenerating a dataset catalog for an existing dataset and wishing to optimise by skipping generation of a root catalog that would be unchanged. This is a real edge case and I think it would be better just to say that in that situation people should do --no-thredds-reinit followed by a separate --thredds-reload, rather than support an option which would be problematic for users in most cases.
This being the case, what I would propose is either:
my original suggestion but with "thredds" instead of "tds",
my original suggestion but with "thredds" instead of "tds" and also the omission of the "--thredds --no-thredds-reload" so that if you want to do that (create dataset catalog and root catalog but not do the reload), then you have to do it in two steps using the available options, as I commented above. This would mean that my new options are both positive only; the only negative option would be the existing --no-thredds-reinit.
Thanks.
@alaniwi
My use case is basically identical to yours - I want to be able to generate all the catalogs before copying them somewhere where they are accessible to THREDDS. So if it works for you it will work for me. Your proposal sounds reasonable.
@mkjpryor-stfc Which would you favour? Including --thredds --no-thredds-reload or omitting it? I don't have a strong opinion about this, because it is not required for backward compatibility and the same functionality can be achieved in two steps (--thredds --no-thredds-reinit followed by --root-catalog).
@alaniwi I think it is fine to include it.
Okay so that leaves the following (my suggestion but with "thredds" instead of "tds"). Repeated here for convenience. @mkjpryor-stfc ok?
Standalone options that cannot be combined with other options:
--master-catalog - only generate the master catalog [edited: was --root-catalog]
--thredds-reload - only do the http reload call
--thredds-reinit - generate the root catalog and do the http call (existing behavior)
Options used when publishing datasets:
--thredds - create dataset catalogs and root catalog and do http call (existing behavior)
--thredds --no-thredds-reinit - only create dataset catalogs (existing behavior)
--thredds --no-thredds-reload - create dataset catalogs and root catalog but do not do http call
About to make a pull request. I have used --master-catalog instead of --root-catalog for consistency with the terminology elsewhere in the publisher (and to avoid confusion with the true server root catalog in the parent directory), so will edit the immediately preceding comment accordingly.
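The agreed option set can be summarised as a mapping from flags to steps. The sketch below is one hedged way to express that mapping with Python's argparse; it is not taken from the pull request. The step labels, the plan_steps helper, and the decision to reject combinations with a parser error are assumptions, and only the THREDDS-related options are modelled.

```python
import argparse

# Hypothetical labels for the three operations discussed in this thread.
DATASET_CATALOGS = "write dataset catalogs"
MASTER_CATALOG = "write master catalog"
RELOAD = "HTTP reload call to THREDDS"


def build_parser():
    # Hypothetical parser covering only the THREDDS-related flags.
    parser = argparse.ArgumentParser(prog="esgpublish-sketch", allow_abbrev=False)
    for flag in ("--master-catalog", "--thredds-reload", "--thredds-reinit",
                 "--thredds", "--no-thredds-reinit", "--no-thredds-reload"):
        parser.add_argument(flag, action="store_true")
    return parser


def plan_steps(argv):
    """Return the ordered list of steps implied by the given flags."""
    parser = build_parser()
    args = parser.parse_args(argv)
    standalone = [args.master_catalog, args.thredds_reload, args.thredds_reinit]
    publishing = args.thredds or args.no_thredds_reinit or args.no_thredds_reload
    # Standalone options cannot be combined with each other or with anything else.
    if sum(standalone) > 1 or (any(standalone) and publishing):
        parser.error("standalone options cannot be combined with other options")
    if args.master_catalog:
        return [MASTER_CATALOG]
    if args.thredds_reload:
        return [RELOAD]
    if args.thredds_reinit:
        return [MASTER_CATALOG, RELOAD]
    if not args.thredds:
        return []  # no THREDDS work requested
    steps = [DATASET_CATALOGS]
    if args.no_thredds_reinit:
        return steps  # existing behavior: dataset catalogs only
    steps.append(MASTER_CATALOG)
    if not args.no_thredds_reload:
        steps.append(RELOAD)
    return steps
```

For example, plan_steps(["--thredds", "--no-thredds-reload"]) yields the dataset catalogs and master catalog steps without the reload, which matches the new combination described above.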
If we can get some testing, then we should be able to schedule a release to include this. I am working on an announcement to Slack.
Currently the "THREDDS reinit" operation in the publisher (whether performed standalone using esgpublish --thredds-reinit or in conjunction with publishing dataset catalogs) does two things:
creation of root catalog
http call to THREDDS to tell it to reload catalogs
(NB I am using "root catalog" here to refer to the "Earth System Root catalog" which lists all the datasets, rather than the very top-level one on the THREDDS server.)
In a situation where the catalogs written by the publisher are not directly visible by the THREDDS server but have to be pushed across (e.g. with rsync), this push has to be done in between the above two steps, because the new root catalog needs to be installed on the server before the reload is done. The workaround is to do the above sequence twice -- once before and once after the push -- but the consequence is that there are redundant steps in the procedure (a server reload before any changes have been pushed, and later a second generation of the same root catalog).
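To make the redundancy concrete, the sketch below lists the steps of the workaround next to the sequence that independent flags would allow, and picks out the steps that do no useful work. It only models the ordering described above; the step labels and the redundant_steps helper are hypothetical and do not call the publisher or THREDDS.

```python
# Hypothetical labels for the operations involved.
DATASET_CATALOGS = "write dataset catalogs"
ROOT_CATALOG = "write root catalog"
RELOAD = "HTTP reload call to THREDDS"
PUSH = "push catalogs to the THREDDS server (e.g. rsync)"

# Workaround: the reinit (root catalog + reload) runs both before and after the push.
WORKAROUND = [DATASET_CATALOGS, ROOT_CATALOG, RELOAD, PUSH, ROOT_CATALOG, RELOAD]

# Desired: the push sits between root catalog creation and the reload.
DESIRED = [DATASET_CATALOGS, ROOT_CATALOG, PUSH, RELOAD]


def redundant_steps(sequence):
    """Return the indices of steps that have no useful effect.

    A reload before the push sees no new catalogs, and a root catalog
    written after the push repeats one that was already pushed
    (assuming nothing changed in between).
    """
    push_at = sequence.index(PUSH)
    redundant = []
    for i, step in enumerate(sequence):
        if step == RELOAD and i < push_at:
            redundant.append(i)
        elif step == ROOT_CATALOG and i > push_at:
            redundant.append(i)
    return redundant
```

Under these assumptions the workaround contains two redundant steps (the early reload and the repeated root catalog), while the desired sequence contains none.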
It would be good to support some command line flags which allow these operations to be invoked independently.
I would suggest the following backward-compatible options:
Standalone options that cannot be combined with other options:
--root-catalog - only generate the root catalog
--tds-reload - only do the http reload call
--thredds-reinit - generate the root catalog and do the http call (existing behavior)
Options used when publishing datasets:
--thredds - create dataset catalogs and root catalog and do http call (existing behavior)
--thredds --no-thredds-reinit - only create dataset catalogs (existing behavior)
--thredds --no-tds-reload - create dataset catalogs and root catalog but do not do http call
The last of these is the least important because the same effect could be achieved using --thredds --no-thredds-reinit followed by --root-catalog.
While doing this, it might be good to avoid a mandatory requirement for --project in combination with the standalone options (currently --thredds-reinit requires it -- it should not be required as there is no reason to load project-specific config for this operation, although it would still need to be permitted for backward compatibility). But again, this is not very important.