gbif-norway / helpdesk

Please submit your helpdesk request here (or send an email to helpdesk@gbif.no). We will also use this repo for documentation of node helpdesk cases.
GNU General Public License v3.0

20 - 22 September: Nansen Legacy - Marine data publishing workshop #73

Closed rukayaj closed 1 year ago

rukayaj commented 2 years ago

Agenda (for planning meeting - 2021-12-17):

Contents of workshop

Luke asked for this workshop in order to discuss how to publish a wide range of biodiversity data.

Target attendees

Luke has discussed this in the data management group in the Nansen Legacy project: Some of the staff from several of the Norwegian data centres would also be interested in joining, as would Tove Gabrielsen, who is a biologist who co-leads the data management group and provides a researcher's perspective on things (amongst other things!). We could find more people to join based on what we think would be useful for the workshop.

Date and location of workshop

?

dagendresen commented 2 years ago

Data flow: http://www.eurobis.org/gbif http://www.eurobis.org/data_flow

Overview of (all?) OBIS datasets: https://www.gbif.org/network/2b7c7b4f-4d4f-40d3-94de-c28b6fa054a6

vidarbakken commented 2 years ago

Agenda:

Contents of workshop

It seems that the focus must be marine data. UNIS also works in the terrestrial environment, but the Nansen project deals only with marine climate and ecosystems. It will be important to discuss data entry and data flow from the project to GBIF/OBIS, and to make the process as easy as possible for the data contributors, who are the researchers or technicians. Is it important that the project reports to GBIF and not OBIS?

Target attendees

It is tempting to invite more research groups working mainly with the marine environment. Some obvious candidates are HI, University of Tromsø, Akvaplan-Niva and Polarinstituttet. There are also other potential candidates. People could also attend online, in combination with a physical meeting. This will also depend on the corona situation in that period.

Date and location of workshop

I think the best places would be Bergen or Tromsø. It would be good if the meeting could be hosted by HI to involve them in the project. Longyearbyen is also a possible location (UNIS). Sometime in the first part of next year.

rukayaj commented 2 years ago

We could also think about inviting our marine data publishers from the private sector I suppose? I'm thinking of DNV in particular...

dagendresen commented 2 years ago

See also

rukayaj commented 2 years ago

Rukaya's meeting notes from 2021-12-17:

Attended: @rukayaj @dagendresen @vidarbakken @MichalTorma @lhmarsden @pieterprovoost & Andreas Altenburger from UiT

Possible topics for workshop:

What data do you have which doesn't fit in with the DwC data model?
How to structure your data so it fits DwC
Hands on "bring your own data" session
Hands on session on data workflows for each participant (maybe possible to get groupings here?)
Top down - overview of the data models
Bottom up - look at your own data and see how to fit it back up into the models

Logistics:

There is a NL cruise going out Feb next year, and there is an expectation that this data will get published.
Note - people haven't been really sharing their data even internally up till now.
Multi day workshop?
In person ideally, with digital participation optional.
Ideally involve marine institute + polar institute?
Tromsø is a good option? Andreas says museum can host maybe 30 people.

Funding - GBIF Norway can fund the venue + Nansen legacy project can help with funding too. Participants to pay for their own travel + hotel. Possibly we pay for PhD students flights + accom etc, as needed?

Dates - April 26 - 28? Maybe first week of May?

Participants - Perhaps this should be a "Nansen Legacy" workshop? I.e. only Nansen Legacy researchers + e.g. OBIS staff + GBIF staff who can help run the workshop, then funding is strictly through the Nansen Legacy Project.

rukayaj commented 2 years ago

Meeting minutes from Luke:

Output of discussion

A workshop will be held to teach researchers how to create Darwin Core Archives and publish their data. Researchers will bring their own data along and learn how to create a Darwin Core Archive from them (and publish it?). This will also help GBIF/OBIS identify data structures that currently are not easy to fit into a Darwin Core Archive, and therefore where further development of the standard is required.

The Nansen Legacy data policy is that data must be published to a data centre that contributes to SIOS (Data submission | sios-svalbard.org), but could also be published to GBIF/OBIS using the same DOI.

Date: Week commencing Monday 25th April, or the first week of May. Not starting on Monday to allow for travel.

Duration: 2/3 days

Location: Tromso. Luke will contact the Nansen Legacy team about the logistics of this. The University of Tromso, the Norwegian Polar Institute or the Institute of Marine Research were suggested as potential hosts. Digital participation could also be an option.

Funding: Luke will see what Nansen Legacy can contribute. GBIF may also be able to contribute.

Target audience: Nansen Legacy biologists with data

Run by: Luke Marsden, GBIF and OBIS?

AoB: Luke will have a separate discussion with GBIF and OBIS staff on the data workflow.

rukayaj commented 2 years ago

Note from Andreas: The Arctic University Museum of Norway currently has capacity for 42 workshop attendees including instructors. The technical equipment for digital participation is currently not in place, so the museum is a suboptimal venue for the workshop.

lhmarsden commented 2 years ago

Nansen Legacy workshops have to be approved by the project leader team. They are meeting next on 21st January.

NPI have suggested that they may be able to host.

What sort of technical equipment should a suitable host have? A computer classroom?

dagendresen commented 2 years ago

I think that a personal laptop with internet access is sufficient for a demo and sandbox-testing of GBIF components. The GBIF data publishing software is server-side, so there is no need to install desktop software.

rukayaj commented 2 years ago

If we are planning to have a digital attendance option then some kind of streaming set up would be good...

lhmarsden commented 2 years ago

Good news, the project leadership team have discussed this and decided Nansen Legacy can fund the whole workshop.

The Polar Institute in Tromsø will be able to host.

The bad news: they have questioned whether a 3 day workshop is too long. Could someone please outline what we could achieve in 1 day vs 2 days vs 3 days?

I will push for 3 days if necessary.

lhmarsden commented 2 years ago

We will get 3 days if we need it, they just want to make sure it is necessary. I think it will take them a lot longer to do it by themselves!

dagendresen commented 2 years ago

Day 1 - project meeting to agree on data flow methods

Day 2 (& day 3 ?) - hands-on workshop to

Both GBIF and OBIS already have training curriculum packages prepared for 2-3-4 days, but probably too much? https://www.gbif.org/article/2IE7tH4dlcik1BnmniIPAc/training-and-e-learning

Maybe a larger one-day meeting, and a smaller hands-on workshop for completing the GBIF-publication for the first datasets? - maybe at the UiT museum?

lhmarsden commented 2 years ago

Thanks for the quick reply Dag.

Agree that they won't get into working with their own data until at least start of Day 2. The workshop will have to be very introductory. Maybe even including quick dummy examples that they can try quickly converting themselves.

The question is then whether publishing their data will take an additional 1 or 2 days. They will often come unorganised in several Excel sheets. In some cases the data will need to be divided into multiple DwCAs. In your experience, can you teach beginners to do this in 1 day following the introductory day? The goal should be that their data are published when they leave.

Since it will be a Nansen Legacy data workshop, there will need to be some specifics. They should all have used the templates to log their metadata. Some people disregarded these templates when they incorporated their data, which is a shame because the templates should have made publishing easier. Data will also need to be published first with a data centre that contributes to SIOS, then published to GBIF with the same DOI.

Of course they may have to complete the workflow afterwards but they should be 90% there and be able to complete the rest with minimal supervision. That's the dream anyway.

dagendresen commented 2 years ago

I believe that we can bring some of the datasets very close to ready for publication in a one-day hands-on workshop, and some more datasets ready for publication in two days. I believe that researchers with their own data will learn a great deal from the questions other participants have about how to map their data.

We also have more programmed curriculum - but maybe too much?

Here is a similar data publication workshop at Fram-huset in Tromsø in 2016: https://www.gbif.no/events/2016/gbif-data-publishing-tromso.html

See also some of the other data publication workshops in 2016 here: https://www.gbif.no/events/2016/

lhmarsden commented 2 years ago

Do you mean on top of the introductory day 1?

I have looked at the previous workshops. I like the idea of day 2 (and possibly 3) being quite unstructured with instructors being there just to answer individual questions. I think this will be effective.

The introductory day 1 sounds good. We can finalise the content but something like that would be good.

Would it be an option for people to sign up to 2 or 3 days depending on how much time they want to spend or feel that they will need? Or is this a bad idea?

dagendresen commented 2 years ago

Yes, I was also thinking of planning for more participants on day 1 - to get an introduction. And to offer a day 2 (and 3?) for people who want to start publishing their own datasets - or want to take part in watching colleagues publishing their datasets...

vidarbakken commented 2 years ago

Do you know how many participants with data you expect from the Nansen Legacy project?

lhmarsden commented 2 years ago

There are a lot who have data. Whether they will attend is another question.

We have budgeted for 30.

lhmarsden commented 2 years ago

It may also be relevant for me to mention that many people have measurements or facts that they should publish. In my experience, it can take a lot of time to reformat these into the desired format if there are lots of columns that need converting. But maybe I am slow and you have ideas on how to speed this up! :)

dagendresen commented 2 years ago

I seem to recall that the R package dplyr can be used to script such a crosstab transformation, which I think we explored at a Nordic Oikos workshop in 2018: https://www.gbif.no/events/2018/Nordic-Oikos-2018-R-workshop.html https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
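For anyone who prefers a script, the crosstab-to-long reshaping discussed above might look roughly like the sketch below. It is written in Python with only the standard library, not with the dplyr approach mentioned here. The sample sheet, the column names `length_mm` and `wet_weight_g`, and the helper `wide_to_long` are all hypothetical; only `occurrenceID`, `measurementType`, `measurementValue` and `measurementUnit` are Darwin Core terms.

```python
import csv
import io


def wide_to_long(rows, id_column, measurement_columns):
    """Turn one wide row per occurrence into one long row per measurement.

    measurement_columns maps a column name in the wide sheet to a
    (measurementType, measurementUnit) pair chosen by the data owner.
    """
    long_rows = []
    for row in rows:
        for column, (mtype, unit) in measurement_columns.items():
            value = (row.get(column) or "").strip()
            if value == "":
                continue  # skip empty cells instead of emitting blank measurements
            long_rows.append({
                "occurrenceID": row[id_column],
                "measurementType": mtype,
                "measurementValue": value,
                "measurementUnit": unit,
            })
    return long_rows


# Hypothetical wide sheet: one column per measured property (made-up values).
wide_csv = """occurrenceID,scientificName,length_mm,wet_weight_g
occ-1,Calanus glacialis,3.2,0.0011
occ-2,Calanus finmarchicus,2.7,
"""

# Assumed mapping from sheet columns to measurement type and unit.
columns = {
    "length_mm": ("length", "mm"),
    "wet_weight_g": ("wet weight", "g"),
}

rows = list(csv.DictReader(io.StringIO(wide_csv)))
long_rows = wide_to_long(rows, "occurrenceID", columns)

for r in long_rows:
    print(r)
```

The only manual step is filling in the column mapping, so a sheet with many measurement columns needs one mapping entry per column rather than hand-editing every row.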

vidarbakken commented 2 years ago

If Dag, Rukaya, Michal and I participate, we are four teachers to help the participants format their data. Including you we are five. I have some ideas about how Measurement or Facts and dynamic properties data can be converted in Excel.

rukayaj commented 2 years ago

There's also openrefine (http://openrefine.gbif.no/) which I think can do that as well.

lhmarsden commented 2 years ago

Thanks for the comments all. I will suggest that we go ahead with a 3 day workshop, but leave participants with the freedom to sign up to which days they want to.

pieterprovoost commented 2 years ago

Yes, two extra days after the introductory session sounds good. For some datasets processing will take less than a day, but we'll need a bit more time if we want to discuss issues that come up during the hands on work with the entire group.

We have a number of courses on data publishing at https://classroom.oceanteacher.org/. There's too much there to cover it in a single day, but we can borrow a few topics like taxon matching (if necessary) and using vocabularies with MeasurementOrFact.

We should also decide if we want to have a single opinionated workflow, or if we want to offer several options (Excel only, OpenRefine, R, etc).

rukayaj commented 2 years ago

I guess it would depend on the audience, if they have differing levels of technical skills it might be useful to offer several options?

lhmarsden commented 2 years ago

I think we can expect a mixed bag in terms of technical expertise. Most people I work with at UNIS have never published a DwCA before. Some have been working with templates using Darwin Core terms however, so some will be able to do this quickly. As long as we clearly communicate who would benefit from 2 vs 3 days, I think we are good.

I think many people should be using scientific names for their taxa, but probably not author and year. Of course reusing topics is a good idea. I think using controlled vocabularies with MeasurementOrFact will be important and will take some time.
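To give a feel for the controlled vocabulary step, here is a minimal Python sketch, assuming a small lookup table from free-text measurement types to vocabulary identifiers. The URIs under `example.org` and the helper `attach_vocabulary` are hypothetical placeholders, not real vocabulary terms; `measurementTypeID` is the Darwin Core term that would hold the identifier.

```python
def attach_vocabulary(records, vocabulary):
    """Add a measurementTypeID to each record and report unmatched types.

    vocabulary maps a lower-cased measurementType to an identifier.
    Records without a match get an empty measurementTypeID so they can
    be reviewed by hand.
    """
    enriched_records = []
    unmatched = set()
    for record in records:
        key = record["measurementType"].strip().lower()
        enriched = dict(record)  # copy so the input is left untouched
        if key in vocabulary:
            enriched["measurementTypeID"] = vocabulary[key]
        else:
            enriched["measurementTypeID"] = ""
            unmatched.add(record["measurementType"])
        enriched_records.append(enriched)
    return enriched_records, sorted(unmatched)


# Hypothetical vocabulary: placeholder URIs, not real vocabulary entries.
vocabulary = {
    "length": "https://example.org/vocab/length",
    "wet weight": "https://example.org/vocab/wet-weight",
}

# Hypothetical measurement records with inconsistent spelling and case.
records = [
    {"measurementType": "Length", "measurementValue": "3.2", "measurementUnit": "mm"},
    {"measurementType": "wet weight ", "measurementValue": "0.0011", "measurementUnit": "g"},
    {"measurementType": "gut fullness", "measurementValue": "0.5", "measurementUnit": ""},
]

enriched_records, unmatched = attach_vocabulary(records, vocabulary)
print(unmatched)
```

The list of unmatched types is the part that tends to take time in practice, since each one needs a person to decide which vocabulary term, if any, applies.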

lhmarsden commented 2 years ago

Would any of these dates work for you?

31st May - 2nd June --- (ok for: Dag, Vidar, ...)
7th June - 9th June --- (ok for: Dag, Vidar, ...)

Sorry that these are later than we previously discussed. Tove (who co-leads the data management part of the project and is a Biologist) is available on these dates but not the earlier ones. It would be good if she could attend given her position in the project. Also it would give people more notice.

vidarbakken commented 2 years ago

I'm possibly available in those periods.

lhmarsden commented 2 years ago

I will think about a possible agenda for this meeting early next week. Sorry for the delay, it has been busy.

lhmarsden commented 2 years ago

Hi all, provisional agenda here. What do you think? I am awaiting news about the venue, then we can confirm the dates. https://docs.google.com/document/d/1ybv7gheu1inX3wVT-TwFLcIBXp2asP3tNWvFLco_iww/edit?usp=sharing

Please amend what you feel needs to be amended.

rukayaj commented 2 years ago

Looks great to me, Luke!

lhmarsden commented 2 years ago

A new large seminar room at the University of Tromsø has been booked for the seminar, 31st May to 2nd June. Save the date!

There is a maximum capacity of 40.

If you are happy with the Agenda, I will neaten it up and circulate it.

lhmarsden commented 2 years ago

Unfortunately, fewer people than I had hoped have signed up for this workshop. This could be in part because some other things are happening at the same time.

I am considering rearranging this to September. What is your availability for the weeks commencing Monday 12th September or Monday 19th September?

Cheers

lhmarsden commented 2 years ago

Rearranging this would be coupled with a message from the project leadership team that more people need to be attending this workshop, so this will hopefully boost the numbers.

dagendresen commented 2 years ago

Many thanks for your efforts, Luke! I might be in Iceland for another project meeting - however, the current Doodle poll for this other meeting seems to favor late August above September - and if needs be I could prioritize the Nansen Legacy workshop!

--> my other meeting is now set to 24-26 August

vidarbakken commented 2 years ago

I can probably attend the workshop in September.

rukayaj commented 2 years ago

I think either Sept/late Aug would be fine for me too. @dagendresen I kind of remember something for BioDATA Advanced was planned for Sept, either ZA workshop or Barnaul... But I don't think any dates were finalised?

lhmarsden commented 2 years ago

I am reluctant to run it in August as I want to avoid the holiday season.

dagendresen commented 2 years ago

I kind of remember something for BioDATA Advanced was planned for Sept, either ZA workshop or Barnaul... But I don't think any dates were finalised?

Yes, there was both a BioDATA Advanced course in Barnaul and an open biodiversity science conference in Novosibirsk (5-10 Sep 2022) planned in September 2022 -- but now of course postponed because of the Russian attack on Ukraine.

dagendresen commented 2 years ago

Very good YouTube presentation of the NansenLegacy data management https://youtu.be/SbGiC5z554k https://youtu.be/muebWBUNPnA

lhmarsden commented 2 years ago

Thanks @dagendresen

ymgan commented 2 years ago

Thanks @dagendresen!! I will have a look on Tuesday when I am back to work :)

@lhmarsden - do you think it's feasible to schedule some time during the period of the workshop (maybe some time before dinner just between data managers) to look at the new GBIF data model together? Maybe after the first 2 days when we have seen what the other datasets look like?

lhmarsden commented 2 years ago

That should be fine @ymgan. Perhaps we can finalise the schedule in late August and I can use it to remind people to attend.

ymgan commented 2 years ago

Awesome!! thank you so much for organising @lhmarsden ! Appreciate it!!

rukayaj commented 2 years ago

https://www.eventbrite.com/e/bring-and-publish-your-data-workshop-darwin-core-archive-tickets-267411684547?internal_ref=social

ymgan commented 1 year ago

Hello, may I know how the logistics are organized? Do we book the hotel independently or is a group reservation available? Do you have hotel recommendations? I read that Nansen Legacy would cover the costs. Do we book first and request reimbursement later? Can you please let me know?

I need to do some paperwork from my end for every work related travel and it could take long. I will be really grateful if you could provide some of the information mentioned above. Thank you so much for organizing!

lhmarsden commented 1 year ago

Hi @ymgan, sorry for the slow reply. I have been on holiday. I will find out about this as soon as possible.

lhmarsden commented 1 year ago

Update on this, the member of our admin team who controls these things is on holiday until 1st August. I will chase this up when they return to work.

lhmarsden commented 1 year ago

Lena from our Admin team was very quick in replying first thing this morning!

"Of course the project can cover the expenses of the “teachers”. It might be wise to check with Mona if we should book flights and hotel rooms for these 3-4 people, as the new reimbursement system at UiT is extremely slow in refunding external (outside Nansen Legacy) people. Mona is back next week Monday. I suggest we wait with this decision until then."

So I will give a good answer next week I hope, but I guess it might be best if we book for you.