DavidRoy opened 3 months ago
Can I please clarify how I should work out the list of files to generate (i.e. the list of regions)? On EBMS there is a list of schemes, which are held as Drupal content. Each one points to a location selected from the Countries 2016 list on Indicia. For countries in the European NUTS area, the country can be used to find its child NUTS level 1 regions, then their child NUTS level 2 regions; it is these NUTS level 2 regions that are available for choosing as a region on a user's profile. For countries outside the NUTS area, however, there are no regions available to select.
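The country → NUTS level 1 → NUTS level 2 lookup described above amounts to walking two levels down a parent/child location hierarchy. A minimal Python sketch, assuming each location record carries a parent ID and a type label; the record layout, IDs and type names are hypothetical illustrations, not Indicia's actual schema:

```python
from collections import defaultdict

# Hypothetical location records: (location_id, parent_id, location_type).
LOCATIONS = [
    (1, None, "Country"),   # e.g. Germany (inside the NUTS area)
    (10, 1, "NUTS1"),
    (100, 10, "NUTS2"),
    (101, 10, "NUTS2"),
    (11, 1, "NUTS1"),
    (110, 11, "NUTS2"),
    (2, None, "Country"),   # e.g. New Zealand (outside NUTS, no children)
]

def build_children(locations):
    """Map each parent_id to the list of its child location IDs."""
    children = defaultdict(list)
    for loc_id, parent_id, _ in locations:
        if parent_id is not None:
            children[parent_id].append(loc_id)
    return children

def nuts2_regions(country_id, locations):
    """Return the NUTS2 region IDs two levels below a country, in input order."""
    children = build_children(locations)
    types = {loc_id: loc_type for loc_id, _, loc_type in locations}
    nuts1 = [c for c in children[country_id] if types[c] == "NUTS1"]
    return [r for n1 in nuts1 for r in children[n1] if types[r] == "NUTS2"]
```

For a non-NUTS country the result is simply an empty list, which is exactly the case the questions below are about.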
So, for a country inside the NUTS area (e.g. Germany BMS), there are 38 regions. Would you like 38 datasets generated for Germany, or just one dataset covering the entire scheme?
For a country outside the NUTS area (e.g. New Zealand BMS), would you like a single dataset, or is there another way the data can be broken down?
It obviously makes things simpler if it can be handled the same way globally rather than with separate logic for NUTS and non-NUTS areas, though separate logic would be possible.
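One way to keep a single code path is to treat a country with no selectable regions as its own single region. A minimal sketch, assuming the region lists have already been looked up per country; `ExportSpec` and `plan_exports` are hypothetical names, not part of the existing extractor:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExportSpec:
    """One dataset to generate: which scheme, filtered to which location."""
    scheme: str
    location_id: int

def plan_exports(scheme, country_id, regions_by_country):
    """Return one export per region, or a single country-wide export.

    regions_by_country maps a country location ID to its selectable
    region IDs (empty or missing for countries outside the NUTS area).
    Falling back to the country itself lets NUTS and non-NUTS schemes
    share the same logic.
    """
    regions = regions_by_country.get(country_id) or [country_id]
    return [ExportSpec(scheme, loc_id) for loc_id in regions]
```

Whether per-region datasets are wanted at all is the open question here; the sketch only shows that the fallback avoids separate NUTS and non-NUTS branches.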
Countries are the first place to start, but could we work towards something that uses any spatial boundary to filter the dataset? There are some scenarios where we'd want to create a dataset for a NUTS1 region, or even a bespoke boundary (accepting that we'd need to load it and index records against it).
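Indexing records against a boundary means working out once which records fall inside which polygon, so that later exports can filter on a boundary ID instead of repeating spatial tests. A minimal sketch using planar point coordinates and the even-odd (ray-casting) test; a production system would more likely do this in the spatial database and would need to handle grid squares as well as points, so the function names and data layout here are hypothetical:

```python
def point_in_polygon(x, y, polygon):
    """Even-odd (ray casting) test: is (x, y) inside the polygon?

    polygon is a list of (x, y) vertices; the ring is closed implicitly.
    Points lying exactly on an edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross it.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def index_records(records, boundaries):
    """Map each boundary ID to the IDs of records whose point falls inside it."""
    index = {boundary_id: [] for boundary_id in boundaries}
    for record_id, x, y in records:
        for boundary_id, polygon in boundaries.items():
            if point_in_polygon(x, y, polygon):
                index[boundary_id].append(record_id)
    return index
```

Once the index exists, filtering an export by boundary is a simple ID lookup, which is one reason an indexed location ID is cheaper to filter on than an arbitrary polygon.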
Could there be a filter to set an area from those layers we index?
The first use case is Denmark as defined by the country layer.
The existing Darwin Core extraction script allows you to define a filter, so yes, it could take any polygon or (preferably) an indexed location ID as a filter. This works well as it is for setting up individual exports one at a time, but I was wondering whether we needed to generate exports automatically for a whole set of regions in one go. It sounds like I should get single exports set up first; we can then consider batch exports afterwards.
I've spent today working on updating the DwC extractor to support event data properly. Still a bit to do but all the principles are in place.
@DavidRoy the code is ready to extract event and occurrence data in Darwin Core Archive format, so I can set up a Denmark example. Would you like all EBMS data for Denmark, or should I limit it to certain surveys or apply any other filter to the extracted data?
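For reference, a Darwin Core Archive holding event data is usually laid out with Event as the core file and Occurrence as an extension linked to it by event ID, both described in a `meta.xml` file following the Darwin Core text guidelines. The sketch below builds such a descriptor with Python's standard library; the file names and the choice of terms are illustrative assumptions, not what the EBMS extractor actually writes:

```python
import xml.etree.ElementTree as ET

DWC = "http://rs.tdwg.org/dwc/terms/"

def _table(tag, row_type, filename, id_tag, terms):
    """Describe one tab-separated data file inside the archive."""
    el = ET.Element(tag, {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",   # the literal two characters \t
        "linesTerminatedBy": "\\n",
        "fieldsEnclosedBy": "",
        "ignoreHeaderLines": "1",
        "rowType": DWC + row_type,
    })
    files = ET.SubElement(el, "files")
    ET.SubElement(files, "location").text = filename
    # Column 0 holds the eventID: the core's <id>, the extension's <coreid>.
    ET.SubElement(el, id_tag, {"index": "0"})
    for i, term in enumerate(terms, start=1):
        ET.SubElement(el, "field", {"index": str(i), "term": DWC + term})
    return el

def build_meta_xml(event_terms, occurrence_terms):
    """Return meta.xml for an Event-core archive with an Occurrence extension."""
    # Writing xmlns as a plain attribute places every unprefixed child
    # element in the Darwin Core text namespace when the file is parsed.
    root = ET.Element("archive", {
        "xmlns": "http://rs.tdwg.org/dwc/text/",
        "metadata": "eml.xml",
    })
    root.append(_table("core", "Event", "event.txt", "id", event_terms))
    root.append(_table("extension", "Occurrence", "occurrence.txt",
                       "coreid", occurrence_terms))
    return ET.tostring(root, encoding="unicode")
```

Usage: `build_meta_xml(["eventDate", "locationID"], ["occurrenceID", "scientificName"])` returns the XML text to write alongside `event.txt`, `occurrence.txt` and `eml.xml` in the zipped archive.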
@johnvanbreda thanks. Four datasets, please: 118.562, 118.565, 118.646 and 118.681.
The latter two might not have any data for Denmark.
Perhaps we need a standard way of naming the species datasets to reflect website, survey, geographic filter?
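As a starting point for discussion, a naming convention could join slugified website, survey and geographic-filter parts. This is a hypothetical convention sketched in Python, not one agreed in this thread; note that characters without an ASCII decomposition (such as "æ") are simply dropped by this approach:

```python
import re
import unicodedata

def slugify(text):
    """Lower-case ASCII slug: accents stripped, runs of other chars -> '-'."""
    ascii_text = (unicodedata.normalize("NFKD", text)
                  .encode("ascii", "ignore").decode("ascii"))
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")

def dataset_name(website, survey, area):
    """Hypothetical convention: <website>-<survey>-<geographic filter>."""
    return "-".join(slugify(part) for part in (website, survey, area))
```

For example, `dataset_name("EBMS", "Transects", "Denmark")` gives `"ebms-transects-denmark"`.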
@DavidRoy the code is now in place to create DwC-archive files. A couple of questions:
@johnvanbreda thanks for progressing this. Answer below:
@DavidRoy I've added the Denmark transects dataset to the IPT (not published) so you can check the processes and see how the metadata editing works. Another option is to provide an EML file (an XML document); I could provide a template that can be edited in a text editor. I can add the 2 timed datasets to the IPT when you are ready.
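To give an idea of what such a template contains, the sketch below generates a very small EML 2.1.1 document with a title, creator, abstract and contact. It is a hypothetical minimal example: the `system` value and all field contents are placeholder assumptions, and a real IPT-ready file would normally carry more metadata (coverage, methods, rights):

```python
import xml.etree.ElementTree as ET

EML_NS = "eml://ecoinformatics.org/eml-2.1.1"
ET.register_namespace("eml", EML_NS)  # serialize the root as <eml:eml>

def build_eml(package_id, title, abstract, organisation, contact_email):
    """Return a minimal EML document as a string (illustrative only)."""
    root = ET.Element(f"{{{EML_NS}}}eml",
                      {"packageId": package_id, "system": "http://gbif.org"})
    # EML child elements are unqualified; order follows the EML schema.
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    creator = ET.SubElement(dataset, "creator")
    ET.SubElement(creator, "organizationName").text = organisation
    abstract_el = ET.SubElement(dataset, "abstract")
    ET.SubElement(abstract_el, "para").text = abstract
    contact = ET.SubElement(dataset, "contact")
    ET.SubElement(contact, "organizationName").text = organisation
    ET.SubElement(contact, "electronicMailAddress").text = contact_email
    return ET.tostring(root, encoding="unicode")
```

Keeping the generated text as a template means each dataset only needs its title, abstract and contact details edited by hand.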
Requirement to produce separate datasets per BMS region
Export in Darwin event format
Follow-on from #6.
@johnvanbreda can you define what is required here