NaturalHistoryMuseum / scratchpads2

Scratchpads 2.0
http://scratchpads.org
GNU General Public License v2.0

Routine task (not bug): update the Darwin Core Archive for Myriatrix (Aim for before end 26 Aug) #6583

Open therobyouknow opened 2 years ago

therobyouknow commented 2 years ago

Hi Rob, Would you be so kind as to update the Darwin Core Archive for Myriatrix before you leave for holidays next week? Please see the email thread below. Kind regards, Carlos

therobyouknow commented 2 years ago

Working on this...

Enabling "DarwinCore Archive (DwC-A) export" module

image

Archilegt commented 2 years ago

Related:

#6373

#6410

To check: The updated Myriatrix DwC-Archive should form at: https://myriatrix.myspecies.info/gbif-dwca.zip

@ManonGros may be able to run this archive through the GBIF IPT validator and give feedback to me and @therobyouknow

On Mon, Jun 7, 2021 at 3:53 PM Ben Scott [b.scott@nhm.ac.uk](mailto:b.scott@nhm.ac.uk) wrote:

I’ve also updated the permissions on your Scratchpad so should you want to rebuild the archive again you can visit http://myriatrix.myspecies.info/#overlay=admin/config/content/dwcarchiver and click the link “Rebuild” next to GBIF DwCA.

I have tried it on http and https and it is not working on my side. If it becomes functional, I could trigger DwC-A updates at will and it would be easier to send regular updates to GBIF.

Archilegt commented 2 years ago

Just to be clear: Fixing formation of DwC-A via admin/config/content/dwcarchiver is not a topic for today. It is an important thing to do but it can wait and it can be tracked on a separate issue.

therobyouknow commented 2 years ago

Thank you @Archilegt

The "DarwinCore Archive (DwC-A) export" module is now enabled

And there is now a facility to download, to your machine, 2 .zip files - see this URL when logged in as admin: https://myriatrix.myspecies.info/admin/config/content/dwcarchiver/list

image

Can you access this to download? If not I will attempt to perform the downloads by clicking on the download links. Do you know how to make the downloaded zip files available to DarwinCore Archive?

Later on I would be looking to update the training with information about how to do it. https://scratchpads.readthedocs.io/en/latest/index.html

benscott commented 2 years ago

Hi @Archilegt - I've just rebuilt the DwC-A. Can you download the latest version now?

Archilegt commented 2 years ago

It may be a temporary glitch but both https://myriatrix.myspecies.info/#overlay=admin/config/content/dwcarchiver and https://myriatrix.myspecies.info/admin/config/content/dwcarchiver/list give me "Access denied". I have two GBIF/DwC-A related options (DwC-Archive settings and GBIF registration settings) in Content but not DwC-Archiver, and consequently I cannot access "List" within it.

image

benscott commented 2 years ago

I'll talk to Rob now and we'll sort out the access permissions - in the meantime the latest DwC-A is available at https://myriatrix.myspecies.info/gbif-dwca.zip
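A quick way to sanity-check a freshly downloaded archive before passing it on is sketched below. This is only an illustration, not part of Scratchpads: the helper names `fetch` and `describe_dwca` are hypothetical, and the check assumes the export carries the usual `meta.xml` descriptor that Darwin Core Archives normally include.

```python
import io
import urllib.request
import zipfile


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def describe_dwca(zip_bytes: bytes) -> dict:
    """Summarise a Darwin Core Archive held in memory.

    Reports the download size, the member files, and whether a
    meta.xml descriptor is present (assumed, not guaranteed, for
    this export).
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = archive.namelist()
    return {
        "size_bytes": len(zip_bytes),
        "has_meta_xml": "meta.xml" in names,
        "files": sorted(names),
    }
```

Calling `describe_dwca(fetch("https://myriatrix.myspecies.info/gbif-dwca.zip"))` would then give the file size and member list to compare against the previous download.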

Archilegt commented 2 years ago

Many thanks, @benscott and @therobyouknow! I already see the file size change, from 2.0 to 2.1 MB.

Archilegt commented 2 years ago

@dimus, please let me know if you could work directly with the archive at https://myriatrix.myspecies.info/gbif-dwca.zip

dimus commented 2 years ago

@Archilegt the DwCA file looks fine, however I will know if it can be imported some time later. We had a network shutdown in July and the machine I used for imports seems to be dead. I can either try to revive it physically when I am back in Illinois, or figure out a new place for imports.

dimus commented 2 years ago

@Archilegt Myriatrix is updated: https://verifier.globalnames.org/data_sources/193

Archilegt commented 2 years ago

Follow up: @benscott, @therobyouknow, please let me know when I have access to the tool for triggering formation of the DwC-A. I have made the corrections to chained synonyms indicated by Marie Grosjean (GBIF) and she could import the new file. Also, the chained synonyms are preventing @dimus from recovering a tree hierarchy for them at the Global Names Architecture.

therobyouknow commented 2 years ago

> please let me know when I have access to the tool for triggering formation of the DwC-A

I think the following permissions are needed to access:

DarwinCore Archive (DwC-A) export

DarwinCore Archiver (DwC-A)

see also screenshot below:

@benscott are these the permissions @Archilegt needs?

admin/people/permissions

image

benscott commented 2 years ago

I don't know offhand - but @therobyouknow if you use the logins you can test whether it works for any particular user.

Alongside the permissions, the DwC-A file will need to be cleared from the Varnish cache, otherwise the cached file will continue to be served. Probably the easiest way to do that is to just clear the cache for the entire site when the download is rebuilt.
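One way to tell whether a download came from the cache rather than from a fresh rebuild is to look at the response headers. The sketch below is a heuristic only: it assumes the site exposes the standard `Age` header and Varnish's `X-Varnish` header (which typically carries two transaction IDs on a cache hit), and the function name `looks_cached` is hypothetical.

```python
def looks_cached(headers: dict) -> bool:
    """Guess whether a response was served from a cache.

    Heuristic: a positive Age header, or an X-Varnish header with
    two transaction IDs, suggests a cache hit. Either header may be
    absent or rewritten on a given deployment.
    """
    normalized = {key.lower(): value for key, value in headers.items()}

    age = normalized.get("age", "0").strip()
    age_seconds = int(age) if age.isdigit() else 0

    transaction_ids = normalized.get("x-varnish", "").split()

    return age_seconds > 0 or len(transaction_ids) >= 2
```

The headers of a real response can be passed in as `dict(response.headers)` after opening the archive URL with `urllib.request.urlopen`.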

Archilegt commented 2 years ago

Additional info: I don't have access to the path https://myriatrix.myspecies.info/admin/people/permissions That would have to be enabled. I only have access to subpermissions via https://myriatrix.myspecies.info/admin/people/subpermissions

benscott commented 2 years ago

Hi @Archilegt, yes the core permissions are maintained by the Scratchpad dev team, and @therobyouknow will adjust them to grant you access to the DwC-A controls.

therobyouknow commented 2 years ago

Good morning @Archilegt

I have enabled those permissions for maintainers. Your user has the maintainer role so these permissions will apply to your user.

image

Which now means that the following admin pages for DWC-A would be available to you: https://myriatrix.myspecies.info/admin/config/content/dwca https://myriatrix.myspecies.info/admin/config/content/dwcarchiver

Please let me know if these are what you need. I will be happy to help further in any case.

therobyouknow commented 1 year ago

@Archilegt to check.

Archilegt commented 1 year ago

Hi, @therobyouknow! I confirm that I now have access to both links and that I can trigger the operations Preview, Rebuild, and Download. I did not try to test Override because I don't know what the expected result is. What about the Varnish issue that @benscott mentioned? Is it still a problem?

therobyouknow commented 1 year ago

ongoing analysis, planning and estimation for solution to stop caching of older .zip downloads

Remaining work as per varnish issue.

Code needs to be implemented to clear the Varnish cache when the DwC-A file is created.

Tasks:

Tip: Clearing the Varnish cache can be tested with drush cc (varnish option) - use the same code that drush uses for Varnish.
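For illustration only, invalidating a single cached URL from outside Drupal could look like the sketch below. It is not the code from the eventual fix, and it rests on an assumption: Varnish only honours an HTTP `PURGE` request if its VCL has been configured to accept one, usually from a restricted list of clients. The helper names `build_purge_request` and `purge` are hypothetical.

```python
import urllib.request


def build_purge_request(url: str) -> urllib.request.Request:
    """Build an HTTP PURGE request for a single cached URL."""
    return urllib.request.Request(url, method="PURGE")


def purge(url: str, timeout: float = 10.0) -> int:
    """Send the PURGE request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    for example when the cache is not configured to allow PURGE
    from this client.
    """
    request = build_purge_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Purging only the archive URL, rather than the whole site, would keep the rest of the cache warm, at the cost of having to know every URL under which the zip is served.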

therobyouknow commented 1 year ago

Solution to ensure a previously cached copy of the zip is not served and the newly created zip is always downloaded

Solution provided as per commit shown above this comment

However, this will need testing in production to really know whether it works. In the dev environment, regression tests were performed to check that the site functions.

Test steps done on dev:

1 - Download the gbif dwca zip

2 - Download the gbif dwca zip again

3 - Delete a few pieces of content that the gbif dwca zip would contain, e.g. occurrences, biblio references.

4 - Download the gbif dwca zip again

5 - Compare the gbif dwca zip files from step 1 and step 4 and observe the difference due to the deletion of content; this proves the old file is not cached

6 - As a control for step 5, compare the gbif dwca zip files from step 1 and step 2. They should be the same, assuming no other user is editing content.
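The comparisons in steps 5 and 6 can be scripted rather than done by eye. The sketch below is a minimal example, not part of the fix: it hashes each file inside two archives and lists the members that differ, so zip-level metadata such as timestamps is ignored. The function names are hypothetical, and if the export embeds a build date in one of its files, that file will show as changed even when the content has not been edited.

```python
import hashlib
import io
import zipfile


def member_digests(zip_bytes: bytes) -> dict:
    """Map each file inside the archive to the SHA-256 of its content."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {
            info.filename: hashlib.sha256(archive.read(info)).hexdigest()
            for info in archive.infolist()
            if not info.is_dir()
        }


def changed_members(old_zip: bytes, new_zip: bytes) -> list:
    """List files that were added, removed, or modified between two archives."""
    before = member_digests(old_zip)
    after = member_digests(new_zip)
    return sorted(
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )
```

An empty list for the step 1 and step 2 downloads, together with a non-empty list for step 1 and step 4, is the outcome the test steps are looking for.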

therobyouknow commented 1 year ago

testing arrangements

In addition to the outline of steps above, I have also started seeing whether I can test my code on a git branch on staging.myspecies.info, using our Aegir CD/CI deployment tools to put the branch on this site. But the staging.myspecies.info site is not available at the moment:

image

Others are seeing the issue, but the site seems to be back now. Related: https://github.com/NaturalHistoryMuseum/scratchpads2/issues/6606

Archilegt commented 1 year ago

Part of testing: Verify that the "Rebuild frequency" setting triggers the needed actions, including clearing the varnish cache when the DwC file is created.

image

therobyouknow commented 1 year ago

Fix to ensure dwc-a export is not cached is now released (in release tag 2.11.1) on your site @Archilegt