Closed peterdesmet closed 1 year ago
@mike-podolskiy90 note that https://tdwg.github.io/camtrap-dp/metadata/ now provides a much better overview of all metadata in Frictionless Packages + Camtrap DP. The order there is also one I find logical (borrowed from Zenodo, etc.)
@mike-podolskiy90 To register Camtrap DP datasets, it seems we can use the same IPT endpoint, registry/ipt/resource. The EML endpoint can be ignored; if needed, we can generate the EML later in the process using the camtraptor R library. The only difference is that the IPT must use the CamtrapDP endpoint type for the archive endpoint.
@mike-podolskiy90 thanks for all the features in the metadata editor for Camtrap DP! Could you change the following:
- id is assigned (e.g. with fb2fb39c-3a11-437c-8e60-aa50c5e5172b). How is this value assigned?
- "created" : 1677491741000 is currently a number, it should be an ISO 8601 string
- "version" : 1.0 is currently a number, it should be a string
- profile should now be https://raw.githubusercontent.com/tdwg/camtrap-dp/0.6/camtrap-dp-profile.json, it is currently camtrap-dp
- contributor.role is now an enum for Camtrap DP: contact, principalInvestigator, rightsHolder, publisher, contributor
- licenses: the delete button should say Remove this license (not source)
- license: either the name or the path is required (having both could conflict with each other and the IPT currently doesn't raise an error if they are not populated). I would implement one (name seems easiest). You could also just list both licenses (one for data and one for media), where the data one has to be one of the GBIF supported licenses.
- license.scope value is not saved
- West/East/South/North fields that are translated to a geojson property, I would remove the Type field and automatically set it to Feature
- taxonRank: sort enum values as provided (hierarchical)
- vernacularNames was not added yet. Is this still a todo or a deliberate choice?
- kingdom, phylum, class, order, family, genus. Would only add these if vernacularName is added too.
- description has been reworded slightly: "Description of the project. Preferably formatted as Markdown. Not to be confused with the description of the package (package.description)."
- samplingDesign has been reworded slightly: the link is now under the citation.
- samplingDesign values are now in lowerCamelCase: simpleRandom, systematicRandom, clusteredRandom, experimental, targeted, opportunistic
- captureMethod values are now in lowerCamelCase: motionDetection, timeLapse
- classificationLevel has been deprecated and can be removed from the editor.
- individualAnimals: new required field, right after captureMethod. The expected value is a boolean.
- sequenceInterval has been renamed to eventInterval with new definition. The expected value is an integer.
- relationType: sort enum values as provided
- resourceTypeGeneral: name as Resource type general and sort enum values as provided (alphabetical)
- relatedIdentifierType: sort enum values as provided (alphabetical)
- references. Below relatedIdentifiers. Is an array of strings (cf. keywords).
Thank you Peter, I'll be looking into it shortly
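The two type changes requested above for created and version can be illustrated with a minimal Python sketch. It assumes the number in created is an epoch timestamp in milliseconds, which the thread does not state explicitly. The function names are hypothetical and are not part of the IPT or the Frictionless library.

```python
from datetime import datetime, timezone


def epoch_millis_to_iso8601(millis: int) -> str:
    """Convert epoch milliseconds (assumed unit) to an ISO 8601 UTC string."""
    moment = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def normalise_descriptor(descriptor: dict) -> dict:
    """Return a copy where numeric created/version values become strings."""
    fixed = dict(descriptor)
    if isinstance(fixed.get("created"), (int, float)):
        fixed["created"] = epoch_millis_to_iso8601(int(fixed["created"]))
    if isinstance(fixed.get("version"), (int, float)):
        fixed["version"] = str(fixed["version"])
    return fixed


if __name__ == "__main__":
    print(normalise_descriptor({"created": 1677491741000, "version": 1.0}))
```

Values that are already strings pass through unchanged, so the sketch could be applied to a descriptor more than once.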
@peterdesmet What do you mean by "value is not saved", please?
@peterdesmet There were issues with vernacularName fields, I'm working on it.
@MattBlissett Matt told me that there are languages that do not have a two-letter code. Should we consider changing the validation pattern ^[a-z]{2}$?
Thanks, didn’t know that. I’ll log the suggestion for 3 letter codes in Camtrap DP.
As an example, this dataset: https://www.gbif.org/dataset/ded724e7-3fde-49c5-bfa3-03b4045c4c5f has names in Lango and Achioli dialects of Southern Luo of Uganda. Each dialect and the language have a 3-letter ISO 639-3 code, but none have two-letter codes.
(There's also an outstanding issue in Checklistbank to handle these codes.)
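A minimal sketch of what a relaxed validation pattern could look like, accepting both two-letter and three-letter lowercase codes. The pattern ^[a-z]{2,3}$ is an assumption for illustration; the thread only records that the suggestion would be logged for Camtrap DP, not which pattern was adopted.

```python
import re

# Current pattern quoted in the thread: exactly two lowercase letters.
TWO_LETTER = re.compile(r"^[a-z]{2}$")

# Hypothetical relaxed pattern: two or three lowercase letters.
TWO_OR_THREE_LETTER = re.compile(r"^[a-z]{2,3}$")


def is_valid_language_code(code: str, allow_three_letters: bool = True) -> bool:
    """Check a language code against the strict or the relaxed pattern."""
    pattern = TWO_OR_THREE_LETTER if allow_three_letters else TWO_LETTER
    return pattern.fullmatch(code) is not None


if __name__ == "__main__":
    for code in ["nl", "ach", "NL", "english"]:
        print(code, is_valid_language_code(code))
```

Note that a pattern like this only checks the shape of the code, not whether it is an assigned ISO 639 code.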
@peterdesmet I've deployed the most recent version, please give it a try
Hi @mike-podolskiy90 I tested the new version. Great to see so many things are solved! I did have to revert a couple of checkboxes in https://github.com/gbif/ipt/issues/1829#issuecomment-1446033565 as they are not fixed yet. I will repeat them here (+ add some new ones I noticed):
- profile should be https://raw.githubusercontent.com/tdwg/camtrap-dp/0.6/camtrap-dp-profile.json, it is currently camtrap-dp. See https://github.com/gbif/ipt/issues/1955#issuecomment-1446448527. Note: it might be good to inject the version as a variable in that URL, so it is automatically updated if the schemas get updated at rs.gbif.org
- schema of resources currently points to schema/http_rs_gbif_org_schemas_camtrap-dp_deployments.json, that should be e.g. https://raw.githubusercontent.com/tdwg/camtrap-dp/0.6/deployments-table-schema.json. See https://github.com/gbif/ipt/issues/1955#issuecomment-1446448527. Note: see above on using a variable for the version.
- datapackage.json under a schemaVerbose property. See https://github.com/gbif/ipt/issues/1955#issuecomment-1446448527
- "created" : 1677491741000 is currently a number, it should be an ISO 8601 string
- role: sort enum values as provided
"spatial" : {
"type" : "Polygon",
"bbox" : [ 4.013, 5.659, 50.699, 51.496 ]
},
- Language code and Vernacular name
- description: remove [ ] around Markdown from help text:
- samplingDesign: sort enum values as provided
- relationType: sort enum values as provided
- resourceTypeGeneral: label as Resource type general and sort enum values as provided (alphabetical)
- relatedIdentifierType: sort enum values as provided (alphabetical)
- id is assigned (e.g. with fb2fb39c-3a11-437c-8e60-aa50c5e5172b). How is this value assigned? see #1999
- image property as a resource logo? I'm not a big fan of resource logos, as the chosen image often refers to the publisher rather than the project or dataset. Ping @timrobertson100 see #2000
@peterdesmet Thanks for the comments. I'll be looking into it shortly.
Regarding id - it's just a random UUID generated by the frictionless datapackage generator library
@peterdesmet I'm concerned about references to raw.githubusercontent.com. Currently we don't store that value anywhere, and GBIF schemas refer to rs.gbif.org instead. So I guess we have to align that somehow
I understand. If you want to replace the values in schema, then they should meet the following requirements:
- schema/http_rs_gbif_org_schemas_camtrap-dp_deployments.json is not sufficient)
- 0.6) in their URL
Note that profile also has a raw.githubusercontent.com URL. If you want to refer to rs.gbif.org, then the camtrap-dp-profile.json should be hosted there as well, meet the requirements above and not be available for the users as a table schema (since it isn't a table schema).
Thanks Peter
So the table schemas would look like this (currently not resolvable, actual schema in sandbox):
https://rs.gbif.org/camtrap-dp/0.6/deployments.json
And we also have to place a profile somewhere https://raw.githubusercontent.com/tdwg/camtrap-dp/0.6/camtrap-dp-profile.json
Correct, that would work. And I would place the profile at https://rs.gbif.org/camtrap-dp/0.6/camtrap-dp-profile.json or https://rs.gbif.org/camtrap-dp/0.6/profile.json
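The earlier suggestion to inject the version as a variable into these URLs could be sketched as follows. The base URL and file names follow the locations proposed in this thread, which were described as not yet resolvable, so they are assumptions rather than confirmed endpoints. The function name is hypothetical.

```python
# Assumed base location, taken from the URLs proposed in the discussion.
BASE_URL = "https://rs.gbif.org/camtrap-dp"


def camtrap_dp_urls(version: str, tables=("deployments",)) -> dict:
    """Build profile and table schema URLs for one Camtrap DP version."""
    urls = {"profile": f"{BASE_URL}/{version}/profile.json"}
    for table in tables:
        # One table schema URL per table name, e.g. deployments.json
        urls[table] = f"{BASE_URL}/{version}/{table}.json"
    return urls


if __name__ == "__main__":
    for name, url in camtrap_dp_urls("0.6").items():
        print(name, url)
```

With the version held in one place, upgrading the schemas only requires changing the argument, not every stored URL.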
@peterdesmet I'm struggling to create proper classes for Geojson and produce a valid output.
spatial refers to https://github.com/tdwg/camtrap-dp/blob/main/camtrap-dp-profile.json#L253, and the JSON schema there does not seem to be up-to-date.
I've compared it with https://www.rfc-editor.org/rfc/rfc7946
I've also played with a validator tool a bit, and this JSON looks valid:
{
"type": "Polygon",
"coordinates": [
[
[
100,
0
],
[
101,
0
],
[
101,
1
],
[
100,
1
],
[
100,
0
]
]
]
}
Hmm, I don't recall why I referred to http://json.schemastore.org/geojson.json specifically, but it does use "$schema": "http://json-schema.org/draft-04/schema#" which is the same version used by Frictionless and camtrap-dp-profile. So I would prefer to keep it that way, unless it's ok to mix versions. I'm not very experienced with JSON schemas.
The example package that comes with Camtrap DP has a valid spatial object: https://github.com/tdwg/camtrap-dp/blob/aace2ee526c2b5e6b55325dea6173406762a96f5/example/datapackage.json#L140-L176
Should we use https://geojson.org/schema/GeoJSON.json (it relies on http://json-schema.org/draft-07/schema#), which is hosted from https://github.com/geojson/schema?
Thanks for quick reply. Let me have a look
I've generated an archive with the following spatial data:
"spatial" : {
"type" : "Polygon",
"coordinates" : [ [ [ 1.0, 2.0 ], [ 3.0, 2.0 ], [ 3.0, 4.0 ], [ 1.0, 4.0 ], [ 1.0, 2.0 ] ] ],
"bbox" : [ 1.0, 2.0, 3.0, 4.0 ]
}
looks like it's a valid geojson
I've just replaced the crs field with coordinates in the Geojson java class. I don't know if we have to change the schema reference though
https://github.com/gbif/ipt/blob/master-3.0/src/main/java/org/gbif/ipt/model/datapackage/metadata/camtrap/Geojson.java#L58
Great! That looks simpler. I have submitted a PR to Camtrap DP to change the example to a polygon (like you use) rather than feature: https://github.com/tdwg/camtrap-dp/pull/312
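The polygon form of spatial discussed above can be derived from the four bounding box values. The following is a minimal Python sketch of that mapping, not the IPT's Java implementation. The function name is hypothetical, and the ring order (counterclockwise, starting and ending at the south-west corner) follows the example posted in this thread.

```python
def bbox_to_polygon(west: float, south: float, east: float, north: float) -> dict:
    """Build a GeoJSON Polygon object from bounding box values."""
    if south > north:
        raise ValueError("south must not be greater than north")
    return {
        "type": "Polygon",
        # GeoJSON bbox order: west, south, east, north
        "bbox": [west, south, east, north],
        # One closed ring, walking counterclockwise around the box
        "coordinates": [
            [
                [west, south],
                [east, south],
                [east, north],
                [west, north],
                [west, south],
            ]
        ],
    }


if __name__ == "__main__":
    print(bbox_to_polygon(1.0, 2.0, 3.0, 4.0))
```

Called with 1.0, 2.0, 3.0, 4.0 this reproduces the spatial object from the generated archive quoted above.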
@peterdesmet I think I applied all changes except the discussion ones. Could you give it a try please?
@mike-podolskiy90 I have tested and noticed some more issues (below). I have closed the 2 discussion items mentioned above and created separate issues for those.
CamtrapDP. This should be Camtrap DP
[x] The help text for role is probably best taken from Camtrap DP, not Frictionless because the allowed values are different. Set to: Role of the contributor. Defaults to contributor.
[x] I'm getting Invalid field value when using https://example.com in project.path. This field is expected to be a URL.
No license set
Coordinate Precision:
[x] Properties are missing in the published datapackage.json, even when populated in metadata:
- id (not included)
- name = resource name (I believe this was included in previous versions)
- created
- contributors
- version
- keywords
- homepage (untested)
- sources
- licenses
[x] profile has the value data-package, rather than the expected https://rs.gbif.org/sandbox/experimental/camtrap-dp/0.6/profile/camtrap-dp-profile.json (see #1955)
[x] The geojson object in spatial has switched values in bbox and coordinates. It should be:
{
"type": "Polygon",
"bbox": [
west,
south,
east,
north
],
"coordinates": [ <- walking counterclockwise around the bounding box
[
[west, south],
[east, south],
[east, north],
[west, north],
[west, south]
]
]
}
[ ] What is the "valid" : false property?
[ ] Is it possible to preserve a certain order of attributes in the generated json or will this differ from file to file?
[ ] What styling are you using for the json output, I notice:
"licenses" : [ {
"name" : "CC0-1.0",
"scope" : "data"
}, {
"name" : "CC-BY-4.0",
"scope" : "media"
} ],
Which differs from the typical json pretty:
"licenses": [
{
"name": "CC0-1.0",
"scope": "data"
},
{
"name": "CC-BY-4.0",
"scope": "media"
}
],
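To illustrate the two open questions above (attribute order and output styling), here is a minimal Python sketch that writes a descriptor with a fixed key order and the conventional two-space pretty print. It is an illustration only: the IPT itself uses Java and Jackson, and the key order used here is an assumption borrowed from the order of attributes in the issue description.

```python
import json

# Assumed preferred order of top-level keys (taken from the issue description).
PREFERRED_ORDER = [
    "resources", "profile", "name", "id", "created", "title", "contributors",
    "description", "version", "keywords", "image", "homepage", "sources",
    "licenses",
]


def order_keys(descriptor: dict) -> dict:
    """Put known keys first in the preferred order, then any remaining keys."""
    known = {key: descriptor[key] for key in PREFERRED_ORDER if key in descriptor}
    rest = {key: value for key, value in descriptor.items() if key not in known}
    return {**known, **rest}


def to_pretty_json(descriptor: dict) -> str:
    """Serialise with two-space indentation and one array item per block."""
    return json.dumps(order_keys(descriptor), indent=2, ensure_ascii=False)


if __name__ == "__main__":
    example = {
        "licenses": [
            {"name": "CC0-1.0", "scope": "data"},
            {"name": "CC-BY-4.0", "scope": "media"},
        ],
        "name": "example-resource",
    }
    print(to_pretty_json(example))
```

Because Python dictionaries preserve insertion order, the serialised output is the same from file to file for the same input.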
Thank you very much for the thorough testing, Peter. I've updated the frictionless data package java library recently and it seems to have caused quite a few issues in the published data.
I've fixed those. I haven't managed to reproduce the Freemarker issue though, preview works fine.
Regarding not selecting type - this is something to think about. I would suggest forcing users to select a main type when they create a resource (DwC, datapackage/frictionless or other) and then select a "subtype"
Yes, making type required when creating a resource sounds good to me. Subtype could then perhaps be selected in a later step (especially Event/Occurrence is sometimes only decided later on).
Freemarker error when previewing an unpublished resource:
@mike-podolskiy90 I've now retested everything listed in https://github.com/gbif/ipt/issues/1829#issuecomment-1517583404 and checked off things that are not yet resolved + added some new. I think we're almost there. 😅
Thanks for the comments
valid is something new which I'm not aware of, I think we should get rid of that.
Looks like id disappeared? I guess it isn't a problem?
Field order right now is arbitrary, but of course we can choose the order and preserve it if we want.
JSON formatting is a standard pretty print of the jackson library. We can probably tweak it, I'll check
- valid: Maybe it's a property set by the java frictionless library? But best to remove if possible
- id: the best one to set here is the resource DOI. How are DOIs assigned to DwC-A (in case the IPT is not configured to issue DOIs)?
- field order: preferred order would be as they are listed on this page
- key<space>:<space>value was a standard pretty print.
You can't properly assign a DOI if the IPT is not configured to. For DwC resources you can only specify it as an alternative identifier
Field order and formatting are a bit of a problem. Files stored in the IPT are formatted properly, but when we produce the archive, the datapackage library re-creates the datapackage descriptor file and I can't fully control that process right now.
id field
So the only remaining one (probably also added by the datapackage java library) is the valid: false flag, which might be confusing to people.
valid should not be present anymore
When creating a resource, can type be written as Camera Trap Data Package (Camtrap DP) (not lowercase trap)?
The property valid is still included in the current version of IPT3
Sorry, I hadn't built a new version of datapackage-java; that should fix the issue.
I've also corrected the name of the package (it requires schema reinstallation though)
I notice valid is no longer present in datapackage.json 👍
With that I think we can close this massive issue. 😅 Well done implementing all this!
This issue relates to using IPT for publishing Camtrap DP and is not applicable to production IPTs.
Create a metadata editor (cf. the XML metadata editor) for Data Package metadata in general and Camtrap DP metadata in particular. https://tdwg.github.io/camtrap-dp/metadata/ is a good place to start to see what metadata properties should be included for Camtrap DP. It also indicates which terms are borrowed from the general Data Package specs and which ones are specific to Camtrap DP.
The following features should be included:
- datapackage.json file (cf. uploading an EML.xml file)
- datapackage.json. See here for example
We'll have to experiment a bit regarding how to separate the metadata in meaningful sections. Here's an attempt:
Basic metadata (all data package attributes)
- resources: set internally when publishing
- profile: set internally, https://raw.githubusercontent.com/tdwg/camtrap-dp/0.1.7/camtrap-dp-profile.json for camera trap data
- name: set internally = resource name
- id: set internally when publishing, either DOI, GBIF dataset UUID, or other
- created: set internally when publishing = publication date
- title: provided by user
- contributors: list provided by user, could be moved to separate page if need be.
  - title: name of person/org
  - path: website for person/org
  - email: email of person/org
  - role: author, publisher, maintainer, wrangler, and contributor. Normal default is contributor, but in the context of publishing data, author as default might make more sense. Currently, only one role per person is supported.
  - organization: affiliation
- description: provided by user, can contain multiple paragraphs (\n), ideally supports Markdown or HTML
- version: set internally when publishing = resource version (ideally all major)
- keywords: list provided by user, there are no thesauri
- image: provided by user, could be equivalent of resource logo. Dedicated issue: #2000
- homepage: provided by user
- sources: list provided by user
- licenses: provided by user via dropdown. For Camtrap DP there are 2 licenses: one for data, one for images (can be reassessed if too complex).
Geographic scope
See https://tdwg.github.io/camtrap-dp/metadata/#spatial
- spatial: Currently a geojson object (e.g. a bounding box) representing the scope. Could potentially be crafted from the data. For now, I think a field where one can write the geojson is sufficient. Alternatively, a bounding box tool is provided (cf. EML editor)
- coordinatePrecision: numeric value provided by user, see https://tdwg.github.io/camtrap-dp/metadata/#coordinateprecision
Taxonomic scope
See https://tdwg.github.io/camtrap-dp/metadata/#taxonomic
- taxonID
- taxonIDReference: often the same across all taxa, so potentially allow option to set this for all
- scientificName
- taxonRank
- vernacularNames: a list of language: name pairs, not entirely sure how to best represent that in a form
Temporal scope
See https://tdwg.github.io/camtrap-dp/metadata/#temporal
- start: could potentially be derived from data
- end: could potentially be derived from data
Project
See https://tdwg.github.io/camtrap-dp/metadata/#project
Resembles EML project, but has specific properties for Camtrap DP
- id
- title
- acronym
- description
- path
- samplingDesign: enum
- captureMethod: enum
- animalTypes: enum, might change~
- classificationLevel: enum
- sequenceInterval: enum
Other metadata
- bibliographicCitation: see https://tdwg.github.io/camtrap-dp/metadata/#bibliographiccitation
- references: might change~
- relatedIdentifiers: follows the DataCite format, see https://tdwg.github.io/camtrap-dp/metadata/#relatedidentifiers
- organizations: deprecated, organizations are part of contributors
- rightsHolder: deprecated, rightsHolder is part of contributors
- platform: deprecated, platform is a source