ESGF / esg-publisher

ESGF Publisher
http://esg-publisher.readthedocs.org/

support separate creation of top level catalog and THREDDS server reinit #125

Open alaniwi opened 5 years ago

alaniwi commented 5 years ago

Currently the "THREDDS reinit" operation in the publisher (whether performed standalone using esgpublish --thredds-reinit or in conjunction with publishing dataset catalogs) does two things: it regenerates the root catalog, and it then triggers a reload of the THREDDS server.

(NB I am using "root catalog" here to refer to the "Earth System Root catalog" which lists all the datasets, rather than the very top-level one on the THREDDS server.)

In a situation where the catalogs written by the publisher are not directly visible to the THREDDS server but have to be pushed across (e.g. with rsync), this push has to be done in between the above two steps, because the new root catalog needs to be installed on the server before the reload is done. The workaround is to run the above sequence twice, once before and once after the push, but this leaves redundant steps in the procedure: a server reload before any changes have been pushed, and a second regeneration of the same root catalog.
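To make the redundancy concrete, here is a minimal sketch that only builds the command sequence for the workaround without running anything. The project name and both directory paths are placeholders I made up; the combination of --project with --thredds-reinit reflects the current requirement mentioned later in this issue.

```python
from typing import List


def workaround_commands(project: str, local_dir: str, remote_dir: str) -> List[List[str]]:
    """Build (but do not run) the commands for the current workaround.

    All three arguments are hypothetical placeholders.
    """
    reinit = ["esgpublish", "--project", project, "--thredds-reinit"]
    push = ["rsync", "-a", local_dir, remote_dir]
    return [
        reinit,  # writes the root catalog; the server reload here is redundant
        push,    # copy the catalogs to where the THREDDS server can read them
        reinit,  # rewrites the same root catalog (redundant), then reloads
    ]


if __name__ == "__main__":
    for cmd in workaround_commands("my_project", "/tmp/catalogs/", "tds-host:/catalogs/"):
        print(" ".join(cmd))
```

The first and third commands are identical, which is exactly the duplication that separate flags would remove.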

It would be good to support some command line flags which allow these operations to be invoked independently.

I would suggest the following backward-compatible options:

The last of these is the least important because the same effect could be achieved using --thredds --no-thredds-reinit followed by --root-catalog.

While doing this, it might be good to avoid a mandatory requirement for --project in combination with the standalone options (currently --thredds-reinit requires it -- it should not be required as there is no reason to load project-specific config for this operation, although it would still need to be permitted for backward compatibility). But again, this is not very important.

mkjpryor-stfc commented 5 years ago

Major +1 for this. This would be a great help for the Docker implementation.

mkjpryor-stfc commented 4 years ago

@alaniwi @sashakames

I think we should try to maintain backward compatibility with the existing options if possible.

The only one of your proposed options that I'm not keen on is --thredds --no-tds-reload. I also think we should stick to either thredds or tds.

What I think we should do is support the positive and negative invocations of your suggested flags, e.g.:

Then --thredds-reinit is basically an alias for --root-catalog --thredds-reload.

Would that work?
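As an illustration of the alias idea, here is a rough argparse sketch. It is not the publisher's actual option handling: the parser, the `resolve` helper, and the rule that an explicit flag overrides the alias are all assumptions of mine, and only the standalone case is modelled.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser, for illustration only.
    p = argparse.ArgumentParser(prog="esgpublish-sketch")
    for name in ("root-catalog", "thredds-reload", "thredds-reinit"):
        dest = name.replace("-", "_")
        p.add_argument("--" + name, dest=dest, action="store_true")
        p.add_argument("--no-" + name, dest=dest, action="store_false")
        # None means "not given on the command line".
        p.set_defaults(**{dest: None})
    return p


def resolve(argv):
    """Return (generate_root_catalog, reload_server) for the given arguments."""
    ns = build_parser().parse_args(argv)
    root, reload_ = ns.root_catalog, ns.thredds_reload
    if ns.thredds_reinit is not None:
        # The alias sets both steps unless a step was given explicitly.
        if root is None:
            root = ns.thredds_reinit
        if reload_ is None:
            reload_ = ns.thredds_reinit
    return bool(root), bool(reload_)


if __name__ == "__main__":
    print(resolve(["--thredds-reinit"]))
    print(resolve(["--root-catalog"]))
```

With this resolution rule, --thredds-reinit on its own behaves the same as --root-catalog --thredds-reload, so existing invocations keep working.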

alaniwi commented 4 years ago

@mkjpryor-stfc Thanks. I think it would be easier to follow if you could please write down your complete set of proposed invocations (standalone, and when publishing datasets) in one place. (Don't worry about formatting it nicely if you want to quickly copy and paste the bits you aren't changing, as I can't see how to preserve the markdown when doing this.)

But note that I am extremely hesitant about including --no-root-catalog. The only situation where I could imagine this might be valid (generate dataset catalog and then jump immediately to a reinit without generating a new root catalog) would be if regenerating a dataset catalog for an existing dataset and wishing to optimise by skipping generation of a root catalog that would be unchanged. This is a real edge case and I think it would be better just to say that in that situation people should do --no-thredds-reinit followed by a separate --thredds-reload, rather than support an option which would be problematic for users in most cases.

This being the case, what I would propose is either:

Thanks.

mkjpryor-stfc commented 4 years ago

@alaniwi

My use case is basically identical to yours - I want to be able to generate all the catalogs before copying them somewhere where they are accessible to THREDDS. So if it works for you it will work for me. Your proposal sounds reasonable.

alaniwi commented 4 years ago

@mkjpryor-stfc Which would you favour? Including --thredds --no-thredds-reload or omitting it? I don't have a strong opinion about this, because it is not required for backward compatibility and the same functionality can be achieved in two steps (--thredds --no-thredds-reinit followed by --root-catalog).
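To spell out the equivalence, here is a toy model of which steps each invocation is assumed to perform, based only on how the options are described in this thread. The step names and the `steps_for` helper are illustrative and not part of the publisher.

```python
# Steps each invocation is assumed to perform, per the discussion in this issue.
STEPS = {
    ("--thredds", "--no-thredds-reinit"): ["dataset catalogs"],
    ("--root-catalog",): ["root catalog"],
    ("--thredds", "--no-thredds-reload"): ["dataset catalogs", "root catalog"],
}


def steps_for(*invocations):
    """Concatenate the steps performed by a sequence of invocations."""
    performed = []
    for invocation in invocations:
        performed.extend(STEPS[tuple(invocation)])
    return performed


if __name__ == "__main__":
    two_step = steps_for(("--thredds", "--no-thredds-reinit"), ("--root-catalog",))
    one_step = steps_for(("--thredds", "--no-thredds-reload"))
    print(two_step == one_step)
```

In neither case is the server reloaded, so the single combined invocation is purely a convenience.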

mkjpryor-stfc commented 4 years ago

@alaniwi I think it is fine to include it.

alaniwi commented 4 years ago

Okay so that leaves the following (my suggestion but with "thredds" instead of "tds"). Repeated here for convenience. @mkjpryor-stfc ok?

Standalone options that cannot be combined with other options:

Options used when publishing datasets:

alaniwi commented 4 years ago

About to make a pull request. I have used --master-catalog instead of --root-catalog for consistency with the terminology elsewhere in the publisher (and to avoid confusion with the true server root catalog in the parent directory), so will edit the immediately preceding comment accordingly.
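For reference, a sketch of how the push workflow from the original description might look with the split options. It only builds the command lists. The directory paths are placeholders, the flag names are the ones agreed in this thread and may differ in a released version, and omitting --project for the standalone options is an assumption based on the suggestion in the first comment.

```python
from typing import List


def split_workflow(local_dir: str, remote_dir: str) -> List[List[str]]:
    """Build (but do not run) the intended sequence with the split options.

    Both paths are hypothetical placeholders.
    """
    return [
        ["esgpublish", "--master-catalog"],      # generate the master catalog only
        ["rsync", "-a", local_dir, remote_dir],  # push catalogs to the THREDDS host
        ["esgpublish", "--thredds-reload"],      # reload once, after the push
    ]


if __name__ == "__main__":
    for cmd in split_workflow("/tmp/catalogs/", "tds-host:/catalogs/"):
        print(" ".join(cmd))
```

Each operation now runs exactly once, with the reload happening only after the new catalogs are in place.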

sashakames commented 4 years ago

If we can get some testing, then we should be able to schedule a release that includes this. Working on an announcement to Slack.