gbif-norway / helpdesk

Please submit your helpdesk request here (or send an email to helpdesk@gbif.no). We will also use this repo for documentation of node helpdesk cases.
GNU General Public License v3.0

20 - 22 September: Nansen Legacy - Marine data publishing workshop #73

Closed: rukayaj closed this issue 1 year ago

rukayaj commented 2 years ago

Agenda (for planning meeting - 2021-12-17):

Contents of workshop

Luke asked for this workshop in order to discuss how to publish a wide range of biodiversity data.

Target attendees

Luke has discussed this in the data management group of the Nansen Legacy project. Some of the staff from several of the Norwegian data centres would also be interested in joining, as would Tove Gabrielsen, a biologist who co-leads the data management group and provides a researcher's perspective on things (amongst other things!). We could invite more people based on what we think would be useful for the workshop.

Date and location of workshop

?

lhmarsden commented 1 year ago

It would be good to have a short meeting to discuss who will be attending, some logistics, and perhaps what needs to be done or prepared prior to the workshop. Is anyone available the week commencing 8th August?

Please fill in the poll below: https://doodle.com/meeting/participate/id/avgVAngb

lhmarsden commented 1 year ago

Let's go ahead with Wednesday 10th at 10am CEST.

Could someone else please send out a Zoom invite? I don't have Zoom Premium and try to avoid using Teams...

dagendresen commented 1 year ago

I made a calendar invitation with a UiO Zoom link for 10/8 at 10:00. Feel free to share it further if others want to join.

lhmarsden commented 1 year ago

Thanks @dagendresen !

lhmarsden commented 1 year ago

I have put together a provisional agenda for the meeting on Wednesday. If there is anything else you wish to discuss, feel free to add it. https://docs.google.com/document/d/1WfYtpzO8iXt0DrZVld8K9lA1h26IXqvxZhYFtCWbc48/edit?usp=sharing

lhmarsden commented 1 year ago

Hi, I have spoken to the Nansen Legacy admin team about funding. We will be able to book hotels and flights for you. Will you need other expenses covered, or will GBIF cover these? E.g. buses, taxis, food.

Email from admin team:


Hi Luke,

If they also will have reimbursement for other cost as food and buses it will be best if the ordered flight – and I can book hotels for everybody.

That’s since they have to use our new DFØ to get paid.

Kind regards,

Mona


I guess she means 'they' ordered the flights.

dagendresen commented 1 year ago

Will you need other expenses covered, or will GBIF cover these? E.g. buses, taxis, food.

It is perfectly fine for the GBIF project to cover our other expenses! And many sincere thanks for covering hotels and flights!!

lhmarsden commented 1 year ago

Great, I guess this will be the simplest way. Thank you for being part of this very important workshop for our project!

lhmarsden commented 1 year ago

Happy with these flights? [image]

There is an earlier return flight, at 17:20, but the workshop is provisionally scheduled to end at 16:30.

dagendresen commented 1 year ago

Dag: Flights are all fine! @rukayaj @vidarbakken @MichalTorma ?

vidarbakken commented 1 year ago

Hi, I would like to go home on Friday night if possible.

lhmarsden commented 1 year ago

Hi @vidarbakken, please communicate this in the email thread I copied you in on, so that Mona is up to date.

lhmarsden commented 1 year ago

Hi, I would like to go home on Friday night if possible.

In principle this shouldn't be an issue, but you will probably have to pay for the extra night in the hotel yourself, unless GBIF will fund it. In Nansen Legacy, usually only participants who can't fly home are granted an extra night in a hotel, which usually means those of us based on Svalbard.

lhmarsden commented 1 year ago

Scandic Grand is full for those dates apparently. Mona will pick something else suitable.

lhmarsden commented 1 year ago

Some of the questions in the survey I just circulated might not be relevant to you. Just provide any answer and I will pay attention only to the relevant questions.

vidarbakken commented 1 year ago

I will not need a hotel the last night.

lhmarsden commented 1 year ago

I will not need a hotel the last night.

Okay @vidarbakken please communicate this and your preferred return flight with Mona via email.

rukayaj commented 1 year ago

@lhmarsden we were just wondering, has anybody uploaded their data for the workshop yet?

dagendresen commented 1 year ago

Draft Agenda as Google Doc https://docs.google.com/document/d/1ybv7gheu1inX3wVT-TwFLcIBXp2asP3tNWvFLco_iww/

lhmarsden commented 1 year ago

@lhmarsden we were just wondering, has anybody uploaded their data for the workshop yet?

A few have, and a few have sent them to me directly. I need to chase this up and will share them with you later, but I have to leave promptly now.

lhmarsden commented 1 year ago

You can view the data shared here: https://drive.google.com/drive/folders/1DZQo-0l6Cv57wmzsDmKW4Rg2v4yNCYyh

Unfortunately, there is only one dataset shared so far, and we already discussed this one via email.

Additionally, there is a large dataset that 4-5 people will be working on together. These data already look quite a lot like an occurrence extension, but some columns will need to be mapped to MeasurementOrFact and the event core will need to be created. However, I have already created a draft of the event core for them. Whilst this is a large dataset, I don't think it will be complicated to convert to a DwC-A.
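A minimal sketch of the kind of conversion described above, assuming a hypothetical wide table with one row per occurrence plus one column per measurement. It splits the rows into Event core rows (one per eventID), Occurrence extension rows and MeasurementOrFact rows. The column names, example values and the function name `split_wide_table` are illustrative assumptions, not the structure of the actual dataset. Each MeasurementOrFact row keeps both eventID and occurrenceID, in the style of the OBIS extended MeasurementOrFact.

```python
"""Hypothetical sketch: split a wide occurrence table into DwC-A style tables."""

# Columns assumed to describe the sampling event (location + time).
EVENT_COLUMNS = ["eventID", "eventDate", "decimalLatitude", "decimalLongitude"]
# Columns assumed to describe the occurrence itself.
OCCURRENCE_COLUMNS = ["occurrenceID", "eventID", "scientificName"]


def split_wide_table(rows, measurement_columns):
    """Return (events, occurrences, measurements) as lists of dicts.

    rows: list of dicts, one per occurrence.
    measurement_columns: dict mapping a source column name to its unit.
    """
    events = {}
    occurrences = []
    measurements = []
    for row in rows:
        # One Event core row per distinct eventID; later duplicates are ignored.
        events.setdefault(row["eventID"], {c: row[c] for c in EVENT_COLUMNS})
        occurrences.append({c: row[c] for c in OCCURRENCE_COLUMNS})
        # Each non-empty measurement column becomes one MeasurementOrFact row.
        for column, unit in measurement_columns.items():
            value = row.get(column, "")
            if value == "":
                continue
            measurements.append({
                "eventID": row["eventID"],
                "occurrenceID": row["occurrenceID"],
                "measurementType": column,
                "measurementValue": value,
                "measurementUnit": unit,
            })
    return list(events.values()), occurrences, measurements


if __name__ == "__main__":
    wide = [
        {"eventID": "ev1", "eventDate": "2021-08-01", "decimalLatitude": "78.2",
         "decimalLongitude": "15.6", "occurrenceID": "occ1",
         "scientificName": "Calanus glacialis", "body length": "3.1", "count": "12"},
        {"eventID": "ev1", "eventDate": "2021-08-01", "decimalLatitude": "78.2",
         "decimalLongitude": "15.6", "occurrenceID": "occ2",
         "scientificName": "Calanus finmarchicus", "body length": "", "count": "40"},
    ]
    events, occs, mof = split_wide_table(wide, {"body length": "mm", "count": ""})
    print(len(events), len(occs), len(mof))  # 1 2 3
```

Empty measurement cells are skipped rather than written as empty MeasurementOrFact rows; whether that is right depends on whether an empty cell means "not measured" or "zero" in the source data.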

I will chase up everyone else since I haven't received many survey responses.

lhmarsden commented 1 year ago

I have tried to compile some information from participants concerning the workshop. See linked below:

https://docs.google.com/spreadsheets/d/1b4pRNs1TTp5NW8b7lNVWRVzeQWX9J2Nb/edit?usp=sharing&ouid=109954185656296467293&rtpof=true&sd=true

I have not received any more datasets and don't expect to.

Some stats

vidarbakken commented 1 year ago

Luke,

I have tried the new tool "Nansen Legacy Darwin Core Event Core and Extensions Generator". It looks good and will make it easier for researchers to convert their data. I have some questions.

Why do you use eventID and parentEventID in the input file when they are (correctly) named occurrenceID and eventID in the DwC file? Why can't you use the same names in the input file? I also tried renaming the terms in the input file, but then the DwC file could not be opened.

Do you expect that some researchers will have this format for their data at the workshop?

lhmarsden commented 1 year ago

Hi @vidarbakken thanks for checking this out and providing me with feedback.

The tool pulls data from the metadata catalogue (stored as a CSV file). This table only has an eventID column, not an occurrenceID column. So it reads this column and only this column and pulls relevant metadata from the CSV for the matched record.

The metadata catalogue is a compilation of log sheets created during each cruise. In this case, the user has created an occurrenceID on their own after the cruise.

I hope a lot of the data will be in this format. However, some people have discarded the eventIDs from their data flow and worked on other sheets.
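Based only on the description above, the lookup step could work roughly as sketched below: each eventID in the user's input is matched against the metadata catalogue, missing metadata is copied from the matched record, and the columns are then renamed for the DwC output (input eventID becomes occurrenceID, input parentEventID becomes eventID), which is the renaming asked about. This is not the actual code of the Nansen Legacy generator; the function names, the catalogue columns used in the example and the "user value wins over catalogue value" rule are assumptions.

```python
import csv
import io

# Hypothetical mapping from input-file column names to DwC output names,
# following the renaming described in this thread.
RENAME = {"eventID": "occurrenceID", "parentEventID": "eventID"}


def load_catalogue(csv_text, delimiter=","):
    """Index metadata catalogue rows by their eventID column."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    return {row["eventID"]: row for row in reader}


def build_records(input_rows, catalogue, fields):
    """Merge catalogue metadata into each input row, then rename columns."""
    records = []
    for row in input_rows:
        # Only the input eventID is used to look up the catalogue record.
        match = catalogue.get(row["eventID"], {})
        merged = dict(row)
        for field in fields:
            # Assumption: keep a value the user supplied, else take the catalogue's.
            if not merged.get(field):
                merged[field] = match.get(field, "")
        # Rename input columns to their DwC output names.
        records.append({RENAME.get(k, k): v for k, v in merged.items()})
    return records
```

For a catalogue as large as the full one, holding everything in an in-memory dict would not be practical; the sketch only illustrates the matching and renaming logic.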

dagendresen commented 1 year ago

The SIOS metadata catalogue is this one? Are there spreadsheet files here in addition to the netCDF files?

lhmarsden commented 1 year ago

The SIOS metadata catalogue is this one? Are there spreadsheet files here in addition to the netCDF files?

No, that is the data access portal/data catalogue.

The Nansen Legacy metadata catalogue is hosted here: https://sios-svalbard.org/aen/tools

And the tools including the spreadsheet generator are linked at the top right of the page.

vidarbakken commented 1 year ago

Is the metadata catalogue (CSV) available as a file?

lhmarsden commented 1 year ago

I have a copy on my PC, though it is >100 GB. Do you want me to send it to you?

vidarbakken commented 1 year ago

I have a copy on my PC, though it is >100 GB. Do you want me to send it to you?

Ok, it is so big. Is it possible to see an example of how it is formatted?

dagendresen commented 1 year ago

I wonder if the eventID in the metadata catalog is mapping correctly to the occurrenceID concept? A dwc:Event is the intersection of a location and a time, while a dwc:Occurrence is the intersection of a dwc:Event and a dwc:Organism. Will all occurrenceIDs actually identify some organismal presence at a time and location? Will all measurements linked to the occurrenceIDs always be measurements of the Organism present (or in other words, maybe some measurements linked to an occurrenceID might be measurements of the environment?). Maybe all fine - just checking.

lhmarsden commented 1 year ago

You are right, it won't map correctly in all cases. The user has to provide only ids that are representative of occurrences. It is not possible to implement a 'one size fits all' solution in this case, but this should help in a lot of cases. Same with the measurements.
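On the measurement side, one option, following the convention used for the OBIS extended MeasurementOrFact (eMoF) as I understand it (worth checking against the current OBIS manual), is to leave occurrenceID empty for measurements of the environment, so they attach to the event only, and to fill it in only for measurements of the organism. The sketch below assumes each measurement has already been labelled as organismal or environmental by the user; the `about` label, the `Measurement` class and `to_emof_row` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    event_id: str
    occurrence_id: str  # may be "" if the source row had no occurrence
    measurement_type: str
    value: str
    unit: str
    about: str  # hypothetical label: "organism" or "environment"


def to_emof_row(m):
    """Return an eMoF-style dict; only organism measurements keep an occurrenceID."""
    if m.about not in ("organism", "environment"):
        raise ValueError(f"unknown target {m.about!r} for {m.measurement_type}")
    if m.about == "organism" and not m.occurrence_id:
        # An organism measurement without an occurrence cannot be linked correctly.
        raise ValueError(f"{m.measurement_type} needs an occurrenceID")
    return {
        "eventID": m.event_id,
        # Environmental measurements describe the event (place + time), not an organism.
        "occurrenceID": m.occurrence_id if m.about == "organism" else "",
        "measurementType": m.measurement_type,
        "measurementValue": m.value,
        "measurementUnit": m.unit,
    }
```

Raising an error instead of guessing reflects the point above that there is no "one size fits all" rule: someone has to decide, column by column, what each measurement is actually about.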

lhmarsden commented 1 year ago

Ok, it is so big. Is it possible to see an example of how it is formatted?

There is a pipe delimited CSV file here for 1 day of 1 cruise: https://drive.google.com/file/d/1dzWZG1HSQucLLlMfcW9Uzne8Gnz2PvVz/view?usp=sharing
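For anyone who wants to inspect a file like this programmatically, Python's standard `csv` module can read a pipe-delimited file by passing `delimiter="|"`. The sketch below only returns the header and counts the data rows, streaming through the file rather than loading it all at once. It assumes the first line is a header row and that the file is UTF-8 encoded; `summarise_pipe_csv` is a hypothetical helper name.

```python
import csv


def summarise_pipe_csv(path):
    """Return (header, row_count) for a pipe-delimited file with a header row."""
    # Assumption: UTF-8 encoding (matters for Norwegian characters such as æ, ø, å).
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="|")
        header = next(reader, [])
        # Count the remaining rows without keeping them in memory.
        row_count = sum(1 for _ in reader)
    return header, row_count
```

Calling `summarise_pipe_csv("downloaded_file.csv")` on the downloaded file (placeholder name) would give its column names and the number of records for that day of the cruise.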

lhmarsden commented 1 year ago

Hi all, I have been trying to get in contact with @pieterprovoost to ask whether he or someone else from OBIS could give a 30-minute presentation (remotely) introducing OBIS as part of the workshop, next Tuesday at 3pm.

Unfortunately, I haven't been able to get in touch with him. Does anyone know anyone else from OBIS who might be able to give a presentation at short notice?

dagendresen commented 1 year ago

Did Yi-Ming Gan of AntaBIF (SCAR) register for the workshop? I noticed she asked for a side discussion on your GBIF data model use case. AntaBIF and SCAR are sort of linked to OBIS.

We could also ask Abby Benson to present OBIS...? I believe she is the GBIF Node Manager for OBIS or at least for OBIS USA.

lhmarsden commented 1 year ago

Yes, @ymgan will be joining us :)

ymgan commented 1 year ago

Hi, I am the node staff for the Antarctic node of GBIF and OBIS. If the presentation is about OBIS data structure, I can do it but I can't "represent OBIS" in decision making etc.

pieterprovoost commented 1 year ago

@ymgan If you could cover this that would be great, I will not be available at the time of the workshop.

ymgan commented 1 year ago

Okie! I will do it! Please let me know what I should cover :D

lhmarsden commented 1 year ago

I think a general presentation introducing OBIS (who they are, what they do, why they're important) would be good, and, as you say, the data structure.

Assume that most people have either never heard of OBIS, or never published data with OBIS.

Thanks @ymgan :)

ymgan commented 1 year ago

Got it! You're most welcome Luke, Pieter!

I will also briefly introduce the OBIS approach to rich datasets with Event core, extended MeasurementOrFact, and BODC vocabularies. I will copy in @lhmarsden when I send my slides to @pieterprovoost by Friday morning at the latest, so that you can provide feedback before the presentation :)

rukayaj commented 1 year ago

At TDWG, Abby just presented an OBIS data mobilisation workshop that was run 100% virtually:

https://ioos.github.io/bio_mobilization_workshop/

They used some interesting techniques because they had many participants: several "themed" breakout rooms (e.g. one for help with formatting dates, some one-on-one breakouts, and a quiet room participants could go to while still connected on Zoom), floating instructors moving between rooms, and Slack so that the instructors could ask each other for help, e.g. if they needed a Python expert to step in.

rukayaj commented 1 year ago

Anyway I guess we can close this now :)

rukayaj commented 1 year ago

Interesting: this workshop was unfunded, and they won't run another one unless they get funding, as it took up a lot of people's free time.

lhmarsden commented 1 year ago

I'd be interested to know from Abby how successful that was. I received good feedback from a few people who attended our workshop online.

Themed break-out rooms sound like a really good way to structure this.

rukayaj commented 1 year ago

Her talk was generally really interesting; I'll post it here when we get the recording. She was a big fan of these practical, hands-on workshops. She said they didn't publish any datasets during the workshop, but datasets are coming in now, a bit like for us.

rukayaj commented 1 year ago

I see that it is perhaps supposed to be private for conference attendees, so I will email it to you, @lhmarsden, rather than posting it on GitHub.

dagendresen commented 1 year ago

I believe the TDWG 2022 talks will eventually become openly available on the TDWG YouTube channel ... when the TDWG secretariat finds time to process and upload them to YouTube. Meanwhile, they have shared the direct Zoom link with the conference attendees :-)