Closed by sadeghim 4 years ago
The final list of fields that can have DwC properties is:
(
"acceptedNameUsage",
"acceptedNameUsageID",
"accessRights",
"associatedMedia",
"associatedOccurrences",
"associatedReferences",
"associatedSequences",
"associatedTaxa",
"basisOfRecord",
"behavior",
"bibliographicCitation",
"catalogNumber",
"class",
"classs",
"collectionCode",
"collectionID",
"continent",
"coordinatePrecision",
"coordinateUncertaintyInMeters",
"country",
"countryCode",
"county",
"dataGeneralizations",
"dataResourceUid", // this is read from DB but will be ignored during export
"datasetID",
"datasetName",
"dateIdentified",
"day",
"decimalLatitude",
"decimalLongitude",
"disposition",
"dynamicProperties",
"endDayOfYear",
"establishmentMeans",
"eventAttributes",
"eventDate",
"eventID",
"eventRemarks",
"eventTime",
"family",
"fieldNotes",
"fieldNumber",
"footprintSpatialFit",
"footprintSRS",
"footprintWKT",
"genus",
"geodeticDatum",
"georeferencedBy",
"georeferencedDate",
"georeferenceProtocol",
"georeferenceRemarks",
"georeferenceSources",
"georeferenceVerificationStatus",
"habitat",
"higherClassification",
"higherGeography",
"higherGeographyID",
"identificationID",
"identificationQualifier",
"identificationReferences",
"identificationRemarks",
"identificationVerificationStatus",
"identifiedBy",
"individualCount",
"individualID",
"informationWithheld",
"infraspecificEpithet",
"institutionCode",
"institutionID",
"island",
"islandGroup",
"kingdom",
"language",
"license",
"lifeStage",
"locality",
"locationAccordingTo",
"locationAttributes",
"locationID",
"locationRemarks",
"maximumDepthInMeters",
"maximumDistanceAboveSurfaceInMeters",
"maximumElevationInMeters",
"measurementAccuracy",
"measurementDeterminedBy",
"measurementDeterminedDate",
"measurementID",
"measurementMethod",
"measurementRemarks",
"measurementType",
"measurementUnit",
"measurementValue",
"minimumDepthInMeters",
"minimumDistanceAboveSurfaceInMeters",
"minimumElevationInMeters",
"modified",
"month",
"municipality",
"nameAccordingTo",
"nameAccordingToID",
"namePublishedIn",
"namePublishedInID",
"namePublishedInYear",
"nomenclaturalCode",
"nomenclaturalStatus",
"occurrenceAttributes",
"occurrenceDetails",
"occurrenceID",
"occurrenceRemarks",
"occurrenceStatus",
"order",
"organismQuantity",
"organismQuantityType",
"originalNameUsage",
"originalNameUsageID",
"otherCatalogNumbers",
"ownerInstitutionCode",
"parentNameUsage",
"parentNameUsageID",
"phylum",
"pointRadiusSpatialFit",
"preparations",
"previousIdentifications",
"recordedBy",
"recordNumber",
"relatedResourceID",
"relationshipAccordingTo",
"relationshipEstablishedDate",
"relationshipOfResource",
"relationshipRemarks",
"reproductiveCondition",
"resourceID",
"resourceRelationshipID",
"rightsHolder",
"samplingEffort",
"samplingProtocol",
"scientificName",
"scientificNameAuthorship",
"scientificNameID",
"sex",
"specificEpithet",
"startDayOfYear",
"stateProvince",
"subgenus",
"taxonConceptID",
"taxonID",
"taxonomicStatus",
"taxonRank",
"taxonRemarks",
"type",
"typeStatus",
"verbatimCoordinates",
"verbatimCoordinateSystem",
"verbatimDepth",
"verbatimElevation",
"verbatimEventDate",
"verbatimLatitude",
"verbatimLocality",
"verbatimLongitude",
"verbatimSRS",
"verbatimTaxonRank",
"vernacularName",
"waterbody",
"year"
)
Apologies that this is a bit late, but I remembered there was a "Complete Scan" job in Jenkins that retrieves a list of fields populated in Cassandra. This might be useful for cross-checking outputs. I ran it yesterday to see if it works, and it completes in 1 hr 16 mins. Here's a chart and the output from the Jenkins job (the number is the count of records that have the field populated).
Field | Number of records with populated value |
---|---|
abcdIdentificationQualifier | 3 |
abcdIdentificationQualifierInsertionPoint | 3 |
abcdTypeStatus | 3674 |
acceptedNameUsage | 1743692 |
accessRights | 16227 |
associatedMedia | 1916338 |
associatedOccurrences | 118197 |
associatedReferences | 1839047 |
associatedSequences | 429359 |
associatedTaxa | 254071 |
australianHerbariumRegion | 34 |
basisOfRecord | 52360815 |
behavior | 144816 |
bibliographicCitation | 1276628 |
catalogNumber | 62876582 |
class | 27221929 |
collectionCode | 28551398 |
collectionID | 1356223 |
continent | 993388 |
coordinatePrecision | 19436948 |
coordinateUncertaintyInMeters | 49332101 |
country | 48443448 |
countryCode | 27742780 |
countryConservation | 6232467 |
county | 12077636 |
cultivarName | 1797 |
cultivated | 31 |
dataGeneralizations | 11067368 |
datasetID | 15817765 |
datasetName | 15758930 |
dateIdentified | 7725969 |
day | 2490064 |
decimalLatitude | 82317511 |
decimalLatitudelatitude | 1429 |
decimalLongitude | 82317884 |
defaultValuesUsed | 87197596 |
disposition | 67880 |
distanceOutsideExpertRange | 16850 |
duplicates | 13 |
duplicatesOriginalInstitutionID | 7 |
duplicatesOriginalUnitID | 7 |
dynamicProperties | 80346 |
easting | 10948064 |
endDayOfYear | 204399 |
establishmentMeans | 18538077 |
eventDate | 82978035 |
eventID | 57328233 |
eventRemarks | 13708159 |
eventTime | 18632342 |
family | 31405029 |
fieldNotes | 243342 |
fieldNumber | 729374 |
firstLoaded | 60611903 |
footprintSRS | 1121938 |
footprintWKT | 3821973 |
generalisationToApplyInMetres | 827043 |
generalisedLocality | 89 |
genus | 28613834 |
geodeticDatum | 50622422 |
georeferencedBy | 3323738 |
georeferencedDate | 167582 |
georeferenceProtocol | 19559121 |
georeferenceRemarks | 60118 |
georeferenceSources | 2137690 |
georeferenceVerificationStatus | 24267319 |
habitat | 6148397 |
higherClassification | 1098960 |
higherGeography | 1544549 |
identificationID | 2609726 |
identificationQualifier | 607372 |
identificationReferences | 98844 |
identificationRemarks | 9503405 |
identificationVerificationStatus | 24100582 |
identifiedBy | 8310807 |
identifierBy | 14370 |
identifierRole | 270986 |
individualCount | 30147972 |
individualID | 35582 |
informationWithheld | 11215747 |
infraspecificEpithet | 1979723 |
institutionCode | 31007560 |
institutionID | 2143416 |
institutionName | 9944 |
island | 332591 |
islandGroup | 299980 |
kingdom | 24957731 |
language | 2720836 |
license | 1906419 |
lifeStage | 1653304 |
loanDate | 1 |
loanDestination | 2 |
loanForBotanist | 1 |
loanIdentifier | 2 |
loanSequenceNumber | 1 |
locality | 71102228 |
locationAccordingTo | 372941 |
locationDetermined | 87271244 |
locationID | 43136286 |
locationRemarks | 10818221 |
maximumDepthInMeters | 3764897 |
maximumElevationInMeters | 5543904 |
measurementAccuracy | 85506 |
measurementDeterminedBy | 104227 |
measurementDeterminedDate | 223961 |
measurementID | 223961 |
measurementMethod | 223961 |
measurementRemarks | 223961 |
measurementType | 129274 |
measurementUnit | 304638 |
measurementValue | 219962 |
minimumDepthInMeters | 4053540 |
minimumElevationInMeters | 6310101 |
miscProperties | 61677257 |
modified | 16210736 |
month | 9581921 |
municipality | 258663 |
nameAccordingTo | 4931254 |
namePublishedIn | 725742 |
naturalOccurrence | 52 |
nearNamedPlaceRelationTo | 17 |
nomenclaturalCode | 12619127 |
nomenclaturalStatus | 41896 |
northing | 10948060 |
occurrenceDetails | 1126948 |
occurrenceID | 29973652 |
occurrenceRemarks | 19417393 |
occurrenceStatus | 45918466 |
order | 23940058 |
organismQuantity | 1982358 |
organismQuantityType | 2914176 |
originalNameUsage | 145552 |
originalSensitiveValues | 828548 |
otherCatalogNumbers | 4463842 |
ownerInstitutionCode | 14310451 |
parentNameUsage | 218264 |
phenology | 32 |
photographer | 33137 |
photoPageUrl | 6789 |
phylum | 13642866 |
preparations | 7009343 |
previousIdentifications | 1408333 |
provenance | 1 |
recordedBy | 59172070 |
recordNumber | 10125718 |
relatedResourceID | 684 |
relationshipOfResource | 141195 |
relationshipRemarks | 4353 |
reproductiveCondition | 4795186 |
rights | 2407238 |
rightsholder | 2527906 |
samplingEffort | 16457311 |
samplingProtocol | 35403628 |
scientificName | 85763847 |
scientificNameAuthorship | 21362672 |
scientificNameID | 14913235 |
scientificNameWithoutAuthor | 54 |
secondaryCollectors | 177695 |
sex | 6300144 |
source | 223961 |
species | 1223809 |
specificEpithet | 24199077 |
startDayOfYear | 213337 |
stateProvince | 50756535 |
subfamily | 134102 |
subgenus | 164319 |
subspecies | 2254 |
superfamily | 124027 |
taxonConceptID | 250330 |
taxonID | 56597 |
taxonomicStatus | 47583 |
taxonRank | 25861795 |
taxonRemarks | 3916684 |
type | 2 |
typeStatus | 405360 |
typifiedName | 3700 |
userAssertionStatus | 11968008 |
userId | 1163369 |
verbatimCoordinates | 7499432 |
verbatimCoordinateSystem | 18283385 |
verbatimDateIdentified | 239425 |
verbatimDepth | 28283 |
verbatimElevation | 3948103 |
verbatimEventDate | 17049995 |
verbatimLatitude | 19911119 |
verbatimLocality | 17895022 |
verbatimLongitude | 19911101 |
verbatimSRS | 14617062 |
verbatimTaxonRank | 516822 |
verificationDate | 1 |
verifier | 1 |
vernacularName | 66248766 |
waterBody | 754324 |
year | 9752558 |
zone | 11044802 |
This is handy to compare with what @nickdos ran. The fields are a superset of Darwin Core terms, with some oddities such as "decimalLatitudelatitude".
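For anyone who wants to repeat the cross-check, here is a minimal sketch in Python. It assumes the Jenkins output has been saved as pipe-delimited text like the table above and that the DwC field list is available as a set of strings. The function names `parse_counts` and `cross_check` are made up for illustration and are not part of any existing tooling.

```python
def parse_counts(table_text):
    """Parse 'field | count |' rows into a {field: count} dict."""
    counts = {}
    for line in table_text.splitlines():
        cells = [c.strip() for c in line.split("|")]
        # Skip the header, the '---' separator, and blank lines.
        if len(cells) < 2 or not cells[1].isdigit():
            continue
        counts[cells[0]] = int(cells[1])
    return counts


def cross_check(populated, allowed):
    """Split populated field names into exact, case-only, and unmatched groups."""
    allowed_by_lower = {name.lower(): name for name in allowed}
    exact, case_only, unmatched = [], {}, []
    for field in sorted(populated):
        if field in allowed:
            exact.append(field)
        elif field.lower() in allowed_by_lower:
            # Same term, different capitalisation (e.g. waterBody vs waterbody).
            case_only[field] = allowed_by_lower[field.lower()]
        else:
            unmatched.append(field)
    return exact, case_only, unmatched


# A few rows copied from the scan output above.
SAMPLE_SCAN = """\
Field | Number of records with populated value |
---|---|
decimalLatitude | 82317511 |
decimalLatitudelatitude | 1429 |
rightsholder | 2527906 |
waterBody | 754324 |
year | 9752558 |
"""

# A few entries copied from the DwC field list above.
SAMPLE_ALLOWED = {"decimalLatitude", "rightsHolder", "waterbody", "year"}

exact, case_only, unmatched = cross_check(parse_counts(SAMPLE_SCAN), SAMPLE_ALLOWED)
print(exact)
print(case_only)
print(unmatched)
```

The case-insensitive pass is there because the two lists disagree on capitalisation for at least "waterbody"/"waterBody" and "rightsHolder"/"rightsholder", which an exact comparison would report as missing fields.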
Here is the spreadsheet for getting the list of fields common to our Cassandra schema and the TDWG/DwC standard: https://docs.google.com/spreadsheets/d/1DkYFLzt9377fXFBROvZbb4JK5rPinvYhcfisMU4K3jE/edit?usp=sharing
@M-Nicholls could you please have a look at the spreadsheet and let me know if there is an issue with it? It shows all Cassandra columns, then raw columns (excluding processed and qa) and then the matched DwC term for them.
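The three spreadsheet columns could be reproduced roughly as follows. This is only a sketch: the `_p` and `_qa` suffixes used to recognise processed and QA columns are assumptions and should be checked against the real schema, the sample column names are hypothetical, and `raw_columns` and `match_dwc` are illustrative names.

```python
# Assumed suffixes marking processed and QA columns; verify against the schema.
PROCESSED_SUFFIX = "_p"
QA_SUFFIX = "_qa"


def raw_columns(all_columns):
    """Keep only raw columns by dropping processed and QA ones."""
    return [c for c in all_columns
            if not c.endswith((PROCESSED_SUFFIX, QA_SUFFIX))]


def match_dwc(columns, dwc_terms):
    """Pair each column with its DwC term (case-insensitive), or None."""
    by_lower = {term.lower(): term for term in dwc_terms}
    return [(c, by_lower.get(c.lower())) for c in columns]


# Hypothetical column names, for illustration only.
all_columns = ["scientificName", "scientificName_p", "waterBody", "zone", "loc_qa"]
# Two entries copied from the DwC field list in this issue.
dwc_terms = {"scientificName", "waterbody"}

rows = match_dwc(raw_columns(all_columns), dwc_terms)
for column, term in rows:
    print(column, "->", term)
```

Columns that come back paired with `None` (here "zone") are the ones with no DwC equivalent, which are the rows worth a manual look in the spreadsheet.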
From sprint 6 planning session: fields are documented and we can refer to them at a later point if necessary. This can be marked done.
occ.occ table from Cassandra