HakaiInstitute / hakai-datasets

Hakai Datasets that are going into https://catalogue.hakai.org/erddap/

Move seaspan datasets to datasets_development folder #95

Closed · JessyBarrette closed this 1 year ago

JessyBarrette commented 1 year ago

Based on some exchanges with @steviewanders, it will be tricky on the server side to start relying on the ERDDAP authorization system, at least in the near future.

Goose ERDDAP

Because of that, we will rely on the Goose development ERDDAP and its Google authorization to serve the protected data (for now). Using this feature has a few downsides:

- We are not able to retrieve the data through the ERDDAP API without manually retrieving an authorization. With ERDDAP's authorization system we could potentially avoid it with the custom passwords or https://coastwatch.pfeg.noaa.gov/erddap/download/AccessToPrivateDatasets.html. Since the Goose ERDDAP isn't using the ERDDAP protection right now, we can't really use any of the suggestions in that link.
- The development ERDDAP (Goose) uses the development branch, while the production ERDDAP serves the main branch. This brings the issue of either keeping some datasets only within the development branch or having datasets continuously failing on the production ERDDAP.

Suggestion: create a new datasets_development folder

We will keep the protected datasets available on Goose only and keep their associated dataset XMLs in a datasets_development folder. Goose will concatenate the XMLs from both the datasets and datasets_development folders, while production will only use the datasets folder.

That way, we can keep the two branches near each other and be sure that none of the protected datasets will ever make it to the public production server.
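To make the idea concrete, here is a minimal sketch of what that concatenation could look like; the folder names, the `datasets.xml` skeleton, and the function name are assumptions, not the actual Goose script:

```python
from pathlib import Path

# Folder names are assumptions: one for public datasets, one for protected ones.
PUBLIC_DIR = Path("datasets")
DEV_DIR = Path("datasets_development")

# Minimal datasets.xml wrapper; the real file also carries ERDDAP-wide settings.
HEADER = '<?xml version="1.0" encoding="ISO-8859-1"?>\n<erddapDatasets>\n'
FOOTER = "</erddapDatasets>\n"


def build_datasets_xml(output: Path, include_development: bool) -> None:
    """Concatenate per-dataset XML fragments into a single datasets.xml.

    Goose (development) would call this with include_development=True;
    production would call it with include_development=False.
    """
    folders = [PUBLIC_DIR] + ([DEV_DIR] if include_development else [])
    fragments = []
    for folder in folders:
        for xml_file in sorted(folder.glob("*.xml")):
            fragments.append(xml_file.read_text())
    output.write_text(HEADER + "\n".join(fragments) + FOOTER)


if __name__ == "__main__":
    build_datasets_xml(Path("datasets.xml"), include_development=True)
```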

TO DO on servers

- Change the Goose ERDDAP script that concatenates the different dataset XMLs so that it also considers the datasets_development folder.

n-a-t-e commented 1 year ago

> We are not able to retrieve the data through the ERDDAP API without manually retrieving an authorization. With ERDDAP's authorization system we could potentially avoid it with the custom passwords or https://coastwatch.pfeg.noaa.gov/erddap/download/AccessToPrivateDatasets.html. Since the Goose ERDDAP isn't using the ERDDAP protection right now, we can't really use any of the suggestions in that link.

We can also just switch Goose to basic authentication, which is what we use with CIOOS Pacific. This allows you to make API calls easily, using a username and password. We can also add IPs to an allow-list (like we do in CIOOS) so that you don't need to use authentication at all in your script.
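For reference, a script-side call against a basic-auth-protected ERDDAP could look roughly like this; the host, dataset ID, query, and credentials below are placeholders, not real values:

```python
import requests

# Placeholder values; swap in the real ERDDAP host, dataset ID, and credentials.
ERDDAP_URL = "https://goose.example.org/erddap"
DATASET_ID = "example_dataset_id"
QUERY = "time,latitude,longitude,temperature&time>=2023-01-01T00:00:00Z"

response = requests.get(
    f"{ERDDAP_URL}/tabledap/{DATASET_ID}.csv?{QUERY}",
    auth=("username", "password"),  # HTTP basic authentication
    timeout=60,
)
response.raise_for_status()
print(response.text[:500])
```

With the IP allow-list approach, the same request would work without the `auth` argument.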

> The development ERDDAP (Goose) uses the development branch, while the production ERDDAP serves the main branch. This brings the issue of either keeping some datasets only within the development branch or having datasets continuously failing on the production ERDDAP.

This is how we've done it so far, both with Hakai and CIOOS Pacific. E.g., currently there are 28 datasets on Goose and 20 on production.

> Change the Goose ERDDAP script that concatenates the different dataset XMLs so that it also considers the datasets_development folder.

This is now built into the docker image, see https://github.com/axiom-data-science/docker-erddap/pull/48.

JessyBarrette commented 1 year ago

> We can also just switch Goose to basic authentication, which is what we use with CIOOS Pacific. This allows you to make API calls easily, using a username and password. We can also add IPs to an allow-list (like we do in CIOOS) so that you don't need to use authentication at all in your script.

We could; I don't have a preference, though specific IP addresses could be annoying sometimes. I pull data from the CIOOS Pacific dev ERDDAP server with great success; this can be helpful when developing a dataset and extracting more information than ERDDAP can provide by itself.

> This is how we've done it so far, both with Hakai and CIOOS Pacific. E.g., currently there are 28 datasets on Goose and 20 on production.

Yes, but the ultimate goal is to have those 25 moved to production (I removed the non-public ones from the count).

> This is now built into the docker image, see https://github.com/axiom-data-science/docker-erddap/pull/48.

As far as I understand, this is reproducing what we already have. What I'm suggesting is having two /datasets.d folders.

Development harvests from both directories, while production harvests from only one.
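A minimal sketch of that split, assuming the choice is driven by a deployment-specific environment variable (the variable name and folder names are hypothetical):

```python
import os
from pathlib import Path

# Hypothetical switch: "development" on Goose, "production" on the public server.
DEPLOYMENT = os.environ.get("ERDDAP_DEPLOYMENT", "production")

# Both deployments harvest datasets/; only development also harvests
# datasets_development/, so protected datasets never reach production.
FRAGMENT_DIRS = [Path("datasets")]
if DEPLOYMENT == "development":
    FRAGMENT_DIRS.append(Path("datasets_development"))

print("Harvesting dataset XML fragments from:", [str(d) for d in FRAGMENT_DIRS])
```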

JessyBarrette commented 1 year ago

Closing this until we decide to add protected datasets.