LivingNorway / TheDataPackage

Template for data archive structure and suggestive workflow
Creative Commons Zero v1.0 Universal

functionality wishlist #9

Open andersfi opened 4 years ago

andersfi commented 4 years ago

wishlist

A reminder, written quickly from memory on the basis of a real test case where I tried to dig out and document some old data for re-use. Please feel free to split this issue into several sub-issues etc.

  1. Create a basic folder structure
  2. Create "empty" template files for the metadata, the EML metadata (and possibly also the DMP). In the future, I think there should be tools available to edit these in a graphical user interface - this could probably be done by a function launching a Shiny app window (a rough sketch of such a function follows below)? For now, I suggest that creation and editing are treated separately (see also point 3).
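
As a rough illustration of the kind of GUI editing function I have in mind, something along these lines might work. This is only a sketch - the function name, the fields and the output file are placeholders, not part of the package:

```r
library(shiny)

# Hypothetical helper: opens a small Shiny window for editing a couple of
# metadata fields and writes the result to a plain text file when saved.
edit_metadata_gui <- function(outfile = "minimum_metadata.txt") {
  ui <- fluidPage(
    textInput("title", "Dataset title"),
    textAreaInput("abstract", "Abstract"),
    actionButton("save", "Save and close")
  )
  server <- function(input, output, session) {
    observeEvent(input$save, {
      writeLines(c(paste("Title:", input$title),
                   paste("Abstract:", input$abstract)),
                 con = outfile)
      stopApp()
    })
  }
  shinyApp(ui, server)
}

# edit_metadata_gui()  # launches the editor in the viewer/browser
```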

  3. Then, what I do is dump a load of old, messy files into the "raw_data" folder. Ideally, I would now describe what kind of files these are, their content, who created them, etc. I think a function for semi-automatically expanding the metadata from the folder content would be useful - in the future maybe with editing through a Shiny app window as well. I can always copy-paste from "list.files()", but some tweaking is needed to get a nice output in terms of sub-folder structure etc.

A.

DrMattG commented 4 years ago

Adding notes to your list to help me develop the function(s) [please add anything vital I have missed, or add comments]

  1. Create a basic folder structure

This is sorted by TheDataPackage::build_folder_structure() (there is some weird behaviour on Macs that I need to work out).
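
For reference, the core of this is just nested dir.create() calls - a stripped-down sketch (the folder names here are only placeholders, not necessarily what build_folder_structure() actually creates):

```r
# Minimal sketch of a folder-structure builder; the real
# TheDataPackage::build_folder_structure() may use different folder names.
build_folders <- function(root = ".") {
  folders <- c("raw_data", "data", "metadata", "R", "dmp")
  for (f in folders) {
    dir.create(file.path(root, f), recursive = TRUE, showWarnings = FALSE)
  }
  invisible(file.path(root, folders))
}

# build_folders("my_project")
```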

  2. Create "empty" template files for the metadata, the EML metadata (and possibly also the DMP).

Already in place for the minimum metadata, but still needed for EML and DMP. I will focus on EML first, as I think this will be the tougher one to achieve! I see this as being coded in R / Shiny, with minimal direct user contact with the XML, because it is a very annoying language to play with...
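
One possible route is the rOpenSci EML package, which lets the document be built as nested R lists so the XML is only touched at the very end - a minimal sketch with placeholder values:

```r
library(EML)

# Build a bare-bones EML document as nested lists; all values are placeholders.
creator <- list(individualName = list(givenName = "First", surName = "Last"))
my_eml <- list(
  dataset = list(
    title   = "Placeholder dataset title",
    creator = creator,
    contact = creator
  )
)

write_eml(my_eml, "eml.xml")   # serialise to XML
eml_validate("eml.xml")        # check against the EML schema
```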

2a. As a first step, I think there should in the future be tools available to edit these in a graphical user interface - this could probably be done by a function launching a Shiny app window? For now, I suggest that creation and editing are treated separately (see also point 3).

Agreed - I will add this as a goal further down the development process.

  3. Then, what I do is dump a load of old, messy files into the "raw_data" folder. Ideally, I would now describe what kind of files these are, their content, who created them, etc. I think a function for semi-automatically expanding the metadata from the folder content would be useful - in the future maybe with editing through a Shiny app window as well. I can always copy-paste from "list.files()", but some tweaking is needed to get a nice output in terms of sub-folder structure etc.

I saw some code that does part of this the other day - I will locate it. I like the idea of semi-automatic metadata. I am trying to develop this at the moment, but might make a specific branch for it. I want this as a high(ish) priority.

ErlendNilsen commented 4 years ago

Comment on point 3 above: this should be included as a new function in the package. Typically it will be called from within the metadata.rmd (and potentially the dmp-template). The function should have an argument that controls whether only the files should be listed, or the files plus their field names as well. I suggest a function along the lines of "get_data_files(folder=)" (a sketch follows below).
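
A rough sketch of what such a function could look like (the field_names argument and the assumption that the data files are comma-separated text are only suggestions, not settled API):

```r
# Sketch of the suggested get_data_files(): lists files in a folder and,
# optionally, the field (column) names of each delimited text file.
get_data_files <- function(folder = "raw_data", field_names = FALSE) {
  files <- list.files(folder, recursive = TRUE, full.names = TRUE)
  if (!field_names) {
    return(files)
  }
  # Read only the header row of each file to extract its field names
  lapply(setNames(files, basename(files)), function(f) {
    tryCatch(names(utils::read.csv(f, nrows = 1)),
             error = function(e) NA_character_)
  })
}

# get_data_files("raw_data")                      # file listing only
# get_data_files("raw_data", field_names = TRUE)  # files plus their column names
```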

ErlendNilsen commented 4 years ago

There are some additional functions that could be initiated for this package:

  1. Determine the geographic coverage from the lat-long fields in the selected data file (get_geo_cov(file=...)).
  2. Determine the taxonomic scope from the selected data file (get_taxa_scope(file=...)).
  3. Get the temporal coverage from the selected file (get_temp_coverage(file=...)).

The taxa coverage function could also include some graphic display, showing the frequency of the different taxa in the data set.

They can typically be called from within the metadata.rmd for semi-automated metadata creation, but do not have to be included in the template file.

DrMattG commented 4 years ago

Agree - I was trying to automate too much with this and failing... thanks for the clarity.

1)

```r
library(rnaturalearth)
library(sf)
library(ggplot2)

# Plot the point locations on a world map to show the geographic extent
get_geographic_extent <- function(lat, lon) {
  lat  <- as.numeric(lat)
  long <- as.numeric(lon)
  coords <- as.data.frame(cbind(lat, long))
  world <- ne_countries(scale = "medium", returnclass = "sf")
  ggplot(data = world) +
    geom_sf() +
    geom_point(data = coords, aes(long, lat))
}

get_geographic_extent(lat = raw_data$Latitude, lon = raw_data$Longitude)
```

[image: world map with the plotted point locations]

We (I) need to think about how to deal with different coordinate systems - I think I can add that as a function argument and then handle it with if/else statements.
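
One possible way to handle this, rather than if/else on the raw numbers, would be an epsg argument and a reprojection step via sf. A sketch (the argument name and the example columns are made up):

```r
library(sf)

# Sketch: accept coordinates in any EPSG code and reproject to WGS84 (EPSG:4326)
# before plotting. The 'epsg' argument is a suggestion, not existing API.
to_wgs84 <- function(lat, lon, epsg = 4326) {
  pts <- st_as_sf(data.frame(lon = as.numeric(lon), lat = as.numeric(lat)),
                  coords = c("lon", "lat"), crs = epsg)
  st_transform(pts, crs = 4326)
}

# coords_wgs84 <- to_wgs84(raw_data$X, raw_data$Y, epsg = 32633)  # e.g. UTM 33N
```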

2) The function will take as input the taxonomic name column of interest to the user and just return the unique values - would that work? Then a frequency plot in ggplot. I will give this a go.
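
A quick sketch of that idea (the function name follows Erlend's suggestion; the column passed in is whatever the user points at):

```r
library(ggplot2)

# Sketch of get_taxa_scope(): return the unique taxon names in a column
# and show how often each occurs in the data set.
get_taxa_scope <- function(taxon) {
  taxon <- as.character(taxon)
  freq <- as.data.frame(table(taxon), stringsAsFactors = FALSE)
  print(
    ggplot(freq, aes(x = reorder(taxon, Freq), y = Freq)) +
      geom_col() +
      coord_flip() +
      labs(x = "Taxon", y = "Number of records")
  )
  unique(taxon)
}

# get_taxa_scope(raw_data$Species)
```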

3) Temporal coverage - min and max of the date/time column assigned by the user.
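
And a corresponding sketch for the temporal coverage (the date format is an assumption and should probably be a function argument):

```r
# Sketch of get_temp_coverage(): min and max of a user-chosen date/time column.
# Assumes dates parse as ISO "YYYY-MM-DD" unless another format is supplied.
get_temp_coverage <- function(dates, format = "%Y-%m-%d") {
  dates <- as.Date(as.character(dates), format = format)
  c(start = min(dates, na.rm = TRUE), end = max(dates, na.rm = TRUE))
}

# get_temp_coverage(raw_data$EventDate)
```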