gbif / ipt

GBIF Integrated Publishing Toolkit (IPT)
https://www.gbif.org/ipt
Apache License 2.0

Create Data Package metadata editor #1829

Closed (peterdesmet closed 1 year ago)

peterdesmet commented 2 years ago

This issue relates to using IPT for publishing Camtrap DP and is not applicable to production IPTs.

Create a metadata editor (cf. the XML metadata editor) for Data Package metadata in general and Camtrap DP metadata in particular. https://tdwg.github.io/camtrap-dp/metadata/ is a good place to start to see which metadata properties should be included for Camtrap DP. It also indicates which terms are borrowed from the general Data Package specs and which ones are specific to Camtrap DP.

The following features should be included:

We'll have to experiment a bit regarding how to separate the metadata in meaningful sections. Here's an attempt:

Basic metadata (all data package attributes)

Geographic scope

See https://tdwg.github.io/camtrap-dp/metadata/#spatial

Taxonomic scope

See https://tdwg.github.io/camtrap-dp/metadata/#taxonomic

Temporal scope

See https://tdwg.github.io/camtrap-dp/metadata/#temporal

Project

See https://tdwg.github.io/camtrap-dp/metadata/#project

Resembles EML project, but has specific properties for Camtrap DP

Other metadata

peterdesmet commented 2 years ago

@mike-podolskiy90 note that https://tdwg.github.io/camtrap-dp/metadata/ now provides a much better overview of all metadata in Frictionless Packages + Camtrap DP. The order there is also one I find logical (borrowed from Zenodo, etc.)

fmendezh commented 1 year ago

@mike-podolskiy90 To register Camtrap DP datasets, it seems we can use the same IPT endpoint, registry/ipt/resource. The EML endpoint can be ignored; if needed, we can generate the EML later in the process using the camtraptor R library. The only difference is that the IPT must use this CamtrapDP endpoint type for the archive endpoint type.

peterdesmet commented 1 year ago

@mike-podolskiy90 thanks for all the features in the metadata editor for Camtrap DP! Could you change the following:

Hidden properties

Basic metadata

Geographic scope

Taxonomic scope

Temporal scope

Project

Other metadata

mike-podolskiy90 commented 1 year ago

Thank you Peter, I'll be looking into it shortly

mike-podolskiy90 commented 1 year ago

@peterdesmet What do you mean by "values not saved", please?

mike-podolskiy90 commented 1 year ago

@peterdesmet There were issues with the vernacularName fields; I'm working on it. @MattBlissett told me that there are languages that do not have a two-letter code. Should we consider changing the validation pattern ^[a-z]{2}$ ?

peterdesmet commented 1 year ago

Thanks, didn’t know that. I’ll log the suggestion for 3 letter codes in Camtrap DP.

MattBlissett commented 1 year ago

As an example, this dataset: https://www.gbif.org/dataset/ded724e7-3fde-49c5-bfa3-03b4045c4c5f has names in the Lango and Achioli dialects of Southern Luo of Uganda. Each dialect and the language itself has a 3-letter ISO 639-3 code, but none has a two-letter code.

(There's also an outstanding issue in Checklistbank to handle these codes.)
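The validation change under discussion could be sketched as follows. This is a minimal illustration in Python, not the IPT's actual validation code: only the pattern `^[a-z]{2}$` comes from the thread, while the relaxed pattern `^[a-z]{2,3}$` and the helper name `is_valid_language_code` are assumptions made for the example.

```python
import re

# Pattern quoted in the thread: exactly two lowercase letters.
TWO_LETTER = re.compile(r"^[a-z]{2}$")

# Hypothetical relaxed pattern (an assumption, not taken from the IPT or
# Camtrap DP): accepts two-letter and three-letter lowercase codes.
TWO_OR_THREE_LETTER = re.compile(r"^[a-z]{2,3}$")


def is_valid_language_code(code: str, pattern: re.Pattern = TWO_OR_THREE_LETTER) -> bool:
    """Return True if the whole string matches the given pattern."""
    return pattern.fullmatch(code) is not None


if __name__ == "__main__":
    for code in ["en", "eng", "EN", "engl"]:
        print(code, is_valid_language_code(code, TWO_LETTER), is_valid_language_code(code))
```

A pattern like this only checks the shape of the code; it does not verify that the code is actually registered in ISO 639.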

mike-podolskiy90 commented 1 year ago

@peterdesmet I've deployed the most recent version, please give it a try

peterdesmet commented 1 year ago

Hi @mike-podolskiy90 I tested the new version. Great to see so many things are solved! I did have to revert a couple of checkboxes in https://github.com/gbif/ipt/issues/1829#issuecomment-1446033565 as they are not fixed yet. I will repeat them here (+ add some new ones I noticed):

Issues that can be fixed

Issues that might require discussion

mike-podolskiy90 commented 1 year ago

@peterdesmet Thanks for the comments. I'll be looking into it shortly. Regarding id - it's just a random UUID generated by the frictionless datapackage generator library

mike-podolskiy90 commented 1 year ago

@peterdesmet I'm concerned about the references to raw.githubusercontent.com. Currently we don't store that value anywhere, and GBIF schemas refer to rs.gbif.org instead. So I guess we have to align that somehow.

peterdesmet commented 1 year ago

I understand. If you want to replace the values in schema, then they should meet the following requirements:

Note that profile also has a raw.githubusercontent.com URL. If you want to refer to rs.gbif.org, then the camtrap-dp-profile.json should be hosted there as well, meet the requirements above and not be available for the users as a table schema (since it isn't a table schema).

mike-podolskiy90 commented 1 year ago

Thanks, Peter. So the URL for table schemas would look like this (currently not resolvable; the actual schema is in the sandbox): https://rs.gbif.org/camtrap-dp/0.6/deployments.json

And we also have to place the profile somewhere: https://raw.githubusercontent.com/tdwg/camtrap-dp/0.6/camtrap-dp-profile.json

peterdesmet commented 1 year ago

Correct, that would work. And I would place the profile at https://rs.gbif.org/camtrap-dp/0.6/camtrap-dp-profile.json or https://rs.gbif.org/camtrap-dp/0.6/profile.json
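The URL alignment discussed here could be sketched as a simple prefix rewrite. This is only an illustration in Python under stated assumptions: the mapping is inferred from the two example URLs in this thread, the function name `to_gbif_url` is hypothetical, and nothing here implies that the resulting rs.gbif.org URLs resolve.

```python
# Prefixes taken from the example URLs quoted in the thread.
RAW_PREFIX = "https://raw.githubusercontent.com/tdwg/camtrap-dp/"
GBIF_PREFIX = "https://rs.gbif.org/camtrap-dp/"


def to_gbif_url(raw_url: str) -> str:
    """Rewrite a raw.githubusercontent.com Camtrap DP URL to an rs.gbif.org URL.

    Hypothetical helper: keeps the "<version>/<file>" part of the path unchanged.
    """
    if not raw_url.startswith(RAW_PREFIX):
        raise ValueError(f"Not a Camtrap DP raw GitHub URL: {raw_url}")
    return GBIF_PREFIX + raw_url[len(RAW_PREFIX):]


if __name__ == "__main__":
    print(to_gbif_url(RAW_PREFIX + "0.6/camtrap-dp-profile.json"))
```

With the profile URL from the thread as input, this yields the first of the two locations suggested above.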

mike-podolskiy90 commented 1 year ago

@peterdesmet I'm struggling to create proper classes for Geojson and produce a valid output. spatial refers to https://github.com/tdwg/camtrap-dp/blob/main/camtrap-dp-profile.json#L253, and the JSON schema there does not seem to be up-to-date.

mike-podolskiy90 commented 1 year ago

I've compared with https://www.rfc-editor.org/rfc/rfc7946

I've also played with a validator tool a bit, and this JSON appears to be valid:

{
    "type": "Polygon",
    "coordinates": [
        [
            [
                100,
                0
            ],
            [
                101,
                0
            ],
            [
                101,
                1
            ],
            [
                100,
                1
            ],
            [
                100,
                0
            ]
        ]
    ]
}
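For a quick local check of a polygon like the one above, a rough sketch in Python follows. It covers only part of RFC 7946: each ring needs four or more positions and must be closed (first position equals last). Winding order and self-intersection are not checked, requiring at least one ring is a choice of this sketch, and the function names are hypothetical.

```python
import json


def is_linear_ring(ring) -> bool:
    """Check the basic RFC 7946 linear ring rules: >= 4 positions, closed."""
    if not isinstance(ring, list) or len(ring) < 4:
        return False
    for position in ring:
        if not isinstance(position, list) or len(position) < 2:
            return False
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in position):
            return False
    return ring[0] == ring[-1]


def is_basic_polygon(geometry) -> bool:
    """Return True if geometry is a Polygon whose rings pass is_linear_ring."""
    if not isinstance(geometry, dict) or geometry.get("type") != "Polygon":
        return False
    rings = geometry.get("coordinates")
    return isinstance(rings, list) and len(rings) > 0 and all(is_linear_ring(r) for r in rings)


if __name__ == "__main__":
    polygon = json.loads(
        '{"type": "Polygon", "coordinates": [[[100, 0], [101, 0], [101, 1], [100, 1], [100, 0]]]}'
    )
    print(is_basic_polygon(polygon))
```

This is not a substitute for validating against a JSON schema; it only catches the most common structural mistakes.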
peterdesmet commented 1 year ago

Hmm, I don't recall why I referred to http://json.schemastore.org/geojson.json specifically, but it does use "$schema": "http://json-schema.org/draft-04/schema#" which is the same version used by Frictionless and camtrap-dp-profile. So I would prefer to keep it that way, unless it's ok to mix versions. I'm not very experienced with JSON schemas.

The example package that comes with Camtrap DP has a valid spatial object: https://github.com/tdwg/camtrap-dp/blob/aace2ee526c2b5e6b55325dea6173406762a96f5/example/datapackage.json#L140-L176

peterdesmet commented 1 year ago

Should we use https://geojson.org/schema/GeoJSON.json (it relies on http://json-schema.org/draft-07/schema#), which is hosted from https://github.com/geojson/schema?

mike-podolskiy90 commented 1 year ago

Thanks for the quick reply. Let me have a look.

mike-podolskiy90 commented 1 year ago

I've generated an archive with the following spatial data:

  "spatial" : {
    "type" : "Polygon",
    "coordinates" : [ [ [ 1.0, 2.0 ], [ 3.0, 2.0 ], [ 3.0, 4.0 ], [ 1.0, 4.0 ], [ 1.0, 2.0 ] ] ],
    "bbox" : [ 1.0, 2.0, 3.0, 4.0 ]
  }

It looks like valid GeoJSON.
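As a sanity check on the `bbox` in the archive above, the bounding box can be recomputed from the polygon coordinates. The sketch below is in Python and is not how the IPT derives the value; the function name `bbox_of_polygon` is hypothetical, and the approach assumes the polygon does not cross the antimeridian.

```python
def bbox_of_polygon(coordinates):
    """Return [west, south, east, north] for a GeoJSON Polygon coordinates array.

    Assumes 2D positions ([longitude, latitude]) and no antimeridian crossing.
    """
    longitudes = [position[0] for ring in coordinates for position in ring]
    latitudes = [position[1] for ring in coordinates for position in ring]
    return [min(longitudes), min(latitudes), max(longitudes), max(latitudes)]


if __name__ == "__main__":
    coordinates = [[[1.0, 2.0], [3.0, 2.0], [3.0, 4.0], [1.0, 4.0], [1.0, 2.0]]]
    print(bbox_of_polygon(coordinates))  # prints [1.0, 2.0, 3.0, 4.0]
```

The result matches the `bbox` member shown in the generated archive.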

I've just replaced the crs field with coordinates in the Geojson Java class. I don't know if we have to change the schema reference, though: https://github.com/gbif/ipt/blob/master-3.0/src/main/java/org/gbif/ipt/model/datapackage/metadata/camtrap/Geojson.java#L58

peterdesmet commented 1 year ago

Great! That looks simpler. I have submitted a PR to Camtrap DP to change the example to a polygon (like you use) rather than feature: https://github.com/tdwg/camtrap-dp/pull/312

mike-podolskiy90 commented 1 year ago

@peterdesmet I think I applied all the changes except the discussion ones. Could you give it a try, please?

peterdesmet commented 1 year ago

@mike-podolskiy90 I have tested and noticed some more issues (below). I have closed the 2 discussion items mentioned above and created separate issues for those.

Creating a resource

Metadata editor

Publishing a resource

[Three screenshots attached: 2023-04-21 at 11:35:36, 2023-04-25 at 18:09:11, 2023-04-25 at 18:11:20]

Published data

  "licenses" : [ {
    "name" : "CC0-1.0",
    "scope" : "data"
  }, {
    "name" : "CC-BY-4.0",
    "scope" : "media"
  } ],

Which differs from typical pretty-printed JSON:

  "licenses": [
    {
      "name": "CC0-1.0",
      "scope": "data"
    },
    {
      "name": "CC-BY-4.0",
      "scope": "media"
    }
  ],
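The two layouts above differ only in whitespace. A small Python sketch can confirm that they parse to the same data, and it shows that `json.dumps` with `indent=2` produces the second layout. The variable names are illustrative only, and this says nothing about how the datapackage Java library formats its output.

```python
import json

# First layout from the published datapackage.json, wrapped in braces to make it a full document.
ipt_style = """{
  "licenses" : [ {
    "name" : "CC0-1.0",
    "scope" : "data"
  }, {
    "name" : "CC-BY-4.0",
    "scope" : "media"
  } ]
}"""

data = json.loads(ipt_style)

# Re-serialize with two-space indentation, the "typical" pretty-printed layout.
pretty = json.dumps(data, indent=2)

if __name__ == "__main__":
    print(pretty)
    # Both strings describe the same data; only the whitespace differs.
    print(json.loads(pretty) == data)
```

Because the difference is cosmetic, any JSON parser reading the published file gets the same licenses array either way.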
mike-podolskiy90 commented 1 year ago

Thank you very much for the thorough testing, Peter. I updated the frictionless data package Java library recently, and it seems to have caused quite a few issues in the published data.

mike-podolskiy90 commented 1 year ago

I've fixed those. I haven't managed to reproduce the FreeMarker issue, though; the preview works fine.

Regarding not selecting a type: this is something to think about. I would suggest forcing users to select a main type when they create a resource (DwC, datapackage/frictionless, or other) and then select a "subtype".

peterdesmet commented 1 year ago

Yes, making type required when creating a resource sounds good to me. Subtype could then perhaps be selected in a later step (especially Event/Occurrence is sometimes only decided later on).

peterdesmet commented 1 year ago

Freemarker error when previewing an unpublished resource:

```
FreeMarker template error (HTML_DEBUG mode; use RETHROW in production!)
The following has evaluated to null or missing:
==> dpMetadata.created [in template "WEB-INF/pages/portal/resource_dp.ftl" at line 132, column 146]
----
Tip: It's the step after the last dot that caused this error, not those before it.
----
Tip: If the failing expression is known to legally refer to something that's sometimes null or missing, either specify a default value like myOptionalVar!myDefault, or use <#if myOptionalVar??>when-present<#else>when-missing. (These only cover the last step of the expression; to cover the whole expression, use parenthesis: (myOptionalVar.foo)!myDefault, (myOptionalVar.foo)??
----
----
FTL stack trace ("~" means nesting-related):
- Failed at: ${dpMetadata.created?date?string.long... [in template "WEB-INF/pages/portal/resource_dp.ftl" at line 132, column 144]
- Reached through: #include "/WEB-INF/pages/portal/resou... [in template "WEB-INF/pages/portal/resource.ftl" at line 5, column 9]
----
Java stack trace (for programmers):
----
freemarker.core.InvalidReferenceException: [... Exception message was already printed; see it above ...]
at freemarker.core.InvalidReferenceException.getInstance(InvalidReferenceException.java:134) at freemarker.core.EvalUtil.coerceModelToTextualCommon(EvalUtil.java:481) at freemarker.core.EvalUtil.coerceModelToPlainText(EvalUtil.java:455) at freemarker.core.Expression.evalAndCoerceToPlainText(Expression.java:117) at freemarker.core.BuiltInsForMultipleTypes$dateBI._eval(BuiltInsForMultipleTypes.java:253) at freemarker.core.Expression.eval(Expression.java:101) at freemarker.core.BuiltInsForMultipleTypes$stringBI._eval(BuiltInsForMultipleTypes.java:765) at freemarker.core.Expression.eval(Expression.java:101) at freemarker.core.Dot._eval(Dot.java:41) at freemarker.core.Expression.eval(Expression.java:101) at freemarker.core.BuiltInForLegacyEscaping._eval(BuiltInForLegacyEscaping.java:33) at freemarker.core.Expression.eval(Expression.java:101) at freemarker.core.DollarVariable.calculateInterpolatedStringOrMarkup(DollarVariable.java:100) at freemarker.core.DollarVariable.accept(DollarVariable.java:63) at freemarker.core.Environment.visit(Environment.java:347) at freemarker.core.Environment.visit(Environment.java:353) at freemarker.core.Environment.visit(Environment.java:353) at freemarker.core.Environment.visit(Environment.java:353) at freemarker.core.Environment.visit(Environment.java:353) at freemarker.core.Environment.include(Environment.java:2955) at freemarker.core.Include.accept(Include.java:171) at freemarker.core.Environment.visit(Environment.java:347) at freemarker.core.Environment.visit(Environment.java:353) at freemarker.core.Environment.visit(Environment.java:353) at freemarker.core.Environment.process(Environment.java:326) at freemarker.template.Template.process(Template.java:383) at org.apache.struts2.views.freemarker.FreemarkerResult.doExecute(FreemarkerResult.java:184) at org.apache.struts2.result.StrutsResultSupport.execute(StrutsResultSupport.java:206) at com.opensymphony.xwork2.DefaultActionInvocation.executeResult(DefaultActionInvocation.java:375) at 
com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:279) at org.gbif.ipt.struts2.RequireManagerInterceptor.intercept(RequireManagerInterceptor.java:138) at com.google.inject.struts2.Struts2Factory$ProvidedInterceptor.intercept(Struts2Factory.java:236) at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:249) at com.opensymphony.xwork2.interceptor.DefaultWorkflowInterceptor.doIntercept(DefaultWorkflowInterceptor.java:179) at com.opensymphony.xwork2.interceptor.MethodFilterInterceptor.intercept(MethodFilterInterceptor.java:99) at com.google.inject.struts2.Struts2Factory$ProvidedInterceptor.intercept(Struts2Factory.java:236) at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:249) at org.gbif.ipt.struts2.CsrfLoginInterceptor.intercept(CsrfLoginInterceptor.java:91) at com.google.inject.struts2.Struts2Factory$ProvidedInterceptor.intercept(Struts2Factory.java:236) at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:249) at com.opensymphony.xwork2.validator.ValidationInterceptor.doIntercept(ValidationInterceptor.java:263) at org.apache.struts2.interceptor.validation.AnnotationValidationInterceptor.doIntercept(AnnotationValidationInterceptor.java:49) at com.opensymphony.xwork2.interceptor.MethodFilterInterceptor.intercept(MethodFilterInterceptor.java:99) at com.google.inject.struts2.Struts2Factory$ProvidedInterceptor.intercept(Struts2Factory.java:236) at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:249) at com.opensymphony.xwork2.interceptor.ConversionErrorInterceptor.doIntercept(ConversionErrorInterceptor.java:142) at com.opensymphony.xwork2.interceptor.MethodFilterInterceptor.intercept(MethodFilterInterceptor.java:99) at com.google.inject.struts2.Struts2Factory$ProvidedInterceptor.intercept(Struts2Factory.java:236) at 
com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:249) at com.opensymphony.xwork2.interceptor.ParametersInterceptor.doIntercept(ParametersInterceptor.java:140) at com.opensymphony.xwork2.interceptor.MethodFilterInterceptor.intercept(MethodFilterInterceptor.java:99) at com.google.inject.struts2.Struts2Factory$ProvidedInterceptor.intercept(Struts2Factory.java:236) at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:249) at com.opensymphony.xwork2.interceptor.ParametersInterceptor.doIntercept(ParametersInterceptor.java:140) at com.opensymphony.xwork2.interceptor.MethodFilterInterceptor.intercept(MethodFilterInterceptor.java:99) at com.google.inject.struts2.Struts2Factory$ProvidedInterceptor.intercept(Struts2Factory.java:236) at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:249) at org.apache.struts2.interceptor.MultiselectInterceptor.intercept(MultiselectInterceptor.java:67) at com.google.inject.struts2.Struts2Factory$ProvidedInterceptor.intercept(Struts2Factory.java:236) at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:249) at org.apache.struts2.interceptor.DateTextFieldInterceptor.intercept(DateTextFieldInterceptor.java:133) at com.google.inject.struts2.Struts2Factory$ProvidedInterceptor.intercept(Struts2Factory.java:236) at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:249) at org.apache.struts2.interceptor.CheckboxInterceptor.intercept(CheckboxInterceptor.java:89) at com.google.inject.struts2.Struts2Factory$ProvidedInterceptor.intercept(Struts2Factory.java:236) at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:249) at com.opensymphony.xwork2.interceptor.PrepareInterceptor.doIntercept(PrepareInterceptor.java:175) at com.opensymphony.xwork2.interceptor.MethodFilterInterceptor.intercept(MethodFilterInterceptor.java:99) at 
com.google.inject.struts2.Struts2Factory$ProvidedInterceptor.intercept(Struts2Factory.java:236) at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:249) at org.apache.struts2.interceptor.ServletConfigInterceptor.intercept(ServletConfigInterceptor.java:167) at com.google.inject.struts2.Struts2Factory$ProvidedInterceptor.intercept(Struts2Factory.java:236) at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:249) at com.opensymphony.xwork2.interceptor.ExceptionMappingInterceptor.intercept(ExceptionMappingInterceptor.java:196) at com.google.inject.struts2.Struts2Factory$ProvidedInterceptor.intercept(Struts2Factory.java:236) at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:249) at org.apache.struts2.interceptor.I18nInterceptor.intercept(I18nInterceptor.java:121) at com.google.inject.struts2.Struts2Factory$ProvidedInterceptor.intercept(Struts2Factory.java:236) at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:249) at org.gbif.ipt.struts2.RedirectMessageInterceptor.doIntercept(RedirectMessageInterceptor.java:134) at com.opensymphony.xwork2.interceptor.MethodFilterInterceptor.intercept(MethodFilterInterceptor.java:99) at com.google.inject.struts2.Struts2Factory$ProvidedInterceptor.intercept(Struts2Factory.java:236) at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:249) at org.gbif.ipt.struts2.SetupAndCancelInterceptor.intercept(SetupAndCancelInterceptor.java:106) at com.google.inject.struts2.Struts2Factory$ProvidedInterceptor.intercept(Struts2Factory.java:236) at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:249) at org.gbif.ipt.struts2.ResourceSessionInterceptor.intercept(ResourceSessionInterceptor.java:54) at com.google.inject.struts2.Struts2Factory$ProvidedInterceptor.intercept(Struts2Factory.java:236) at 
com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:249) at org.apache.struts2.factory.StrutsActionProxy.execute(StrutsActionProxy.java:48) at org.apache.struts2.dispatcher.Dispatcher.serviceAction(Dispatcher.java:574) at org.apache.struts2.dispatcher.ExecuteOperations.executeAction(ExecuteOperations.java:79) at org.apache.struts2.dispatcher.filter.StrutsPrepareAndExecuteFilter.doFilter(StrutsPrepareAndExecuteFilter.java:141) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82) at org.gbif.ipt.struts2.CorsFilter.doFilter(CorsFilter.java:37) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:120) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:133) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:197) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:97) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:540) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:135) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92) at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:687) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:78) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:359) at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:399) at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65) at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:889) 
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1735) at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191) at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659) at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) at java.base/java.lang.Thread.run(Thread.java:829) ```
peterdesmet commented 1 year ago

@mike-podolskiy90 I've now retested everything listed in https://github.com/gbif/ipt/issues/1829#issuecomment-1517583404 and checked off things that are not yet resolved + added some new. I think we're almost there. 😅

mike-podolskiy90 commented 1 year ago

Thanks for the comments

peterdesmet commented 1 year ago
mike-podolskiy90 commented 1 year ago

You can't properly assign a DOI if the IPT is not configured to do so. For DwC resources you can only specify it as an alternative identifier.

mike-podolskiy90 commented 1 year ago

Field order and formatting are a bit of a problem. Files stored in the IPT are formatted properly, but when we produce the archive, the datapackage library re-creates the datapackage descriptor file, and I can't fully control that process right now.

peterdesmet commented 1 year ago

So the only remaining one (probably also added by the datapackage java library) is the valid: false flag, which might be confusing to people.

mike-podolskiy90 commented 1 year ago

valid should not be present anymore.

peterdesmet commented 1 year ago

When creating a resource, can type be written as Camera Trap Data Package (Camtrap DP) (not lowercase trap)?

peterdesmet commented 1 year ago

The property valid is still included in the current version of IPT3.

mike-podolskiy90 commented 1 year ago

Sorry, I hadn't built a new version of datapackage-java; that should fix the issue.

I've also corrected the name of the package (it requires schema reinstallation, though).

peterdesmet commented 1 year ago

I notice valid is no longer present in datapackage.json 👍

With that I think we can close this massive issue. 😅 Well done implementing all this!