It would be good to have a short meeting to discuss who will be attending, some logistics, and perhaps what needs to be done or prepared prior to the workshop. Is anyone available the week commencing 8th August?
Please fill in the poll below: https://doodle.com/meeting/participate/id/avgVAngb
Let's go ahead with Wednesday 10th at 10am CEST.
Could someone else please send out a zoom invite? I don't have zoom premium and try to avoid using Teams...
I made a calendar invitation with a UiO Zoom link for 10/8 at 10:00. Feel free to share further if others want to join.
Thanks @dagendresen !
I have put together a provisional agenda for the meeting on Wednesday. If there is anything else you wish to discuss, feel free to add it. https://docs.google.com/document/d/1WfYtpzO8iXt0DrZVld8K9lA1h26IXqvxZhYFtCWbc48/edit?usp=sharing
Hi, I have spoken to the Nansen Legacy admin team about funding. We will be able to book hotels and flights for you. Will you need other expenses covered, or will GBIF cover these? E.g. buses, taxis, food.
Email from admin team:
Hi Luke,
If they also will have reimbursement for other cost as food and buses it will be best if the ordered flight – and I can book hotels for everybody.
That’s since they have to use our new DFØ to get paid.
Kind regards,
Mona
I guess she means 'they' ordered the flights.
It is perfectly fine for the GBIF project to cover our other expenses! And sincerely many thanks for covering hotels and flights!!
Great, I guess this will be the simplest way. Thank you for being part of this very important workshop for our project!
Happy with these flights?
There is an earlier return flight, at 17:20, but the workshop is provisionally scheduled to end at 16:30.
Dag: Flights are all fine! @rukayaj @vidarbakken @MichalTorma ?
Hi, I would like to go home on Friday night if possible.
Hi @vidarbakken please communicate this in the email, I copied you in, so Mona can be up to date.
In principle this shouldn't be an issue, but you will probably have to pay for the extra night in the hotel yourself, unless GBIF will fund this. In Nansen Legacy, usually only participants who can't fly home the same day are granted an extra night at a hotel, which is usually those of us on Svalbard.
Scandic Grand is full for those dates apparently. Mona will pick something else suitable.
Some of the questions in the survey I just circulated might not be relevant to you. Just provide any answer and I will pay attention only to the relevant questions.
I will not need a hotel the last night.
Okay @vidarbakken please communicate this and your preferred return flight with Mona via email.
@lhmarsden we were just wondering, has anybody uploaded their data for the workshop yet?
Draft Agenda as Google Doc https://docs.google.com/document/d/1ybv7gheu1inX3wVT-TwFLcIBXp2asP3tNWvFLco_iww/
A few have, and a few sent them to me directly. I need to chase this up and will share with you later, but have to leave promptly now.
You can view the data shared here: https://drive.google.com/drive/folders/1DZQo-0l6Cv57wmzsDmKW4Rg2v4yNCYyh
Unfortunately, there is only one dataset shared so far, and we already discussed this one via email.
Additionally, there is a large dataset that 4-5 people will be working with together. These data already look quite a lot like an occurrence extension, but some columns will need to be mapped to MeasurementOrFact and the event core will need to be created. I have, however, created a draft of the event core for them. Whilst this is a large dataset, I don't think it will be complicated to convert to DwCA.
I will chase up everyone else since I haven't received many survey responses.
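For anyone preparing ahead of the workshop, the mapping described above (measurement columns into MeasurementOrFact, plus an event core built from the eventIDs) can be sketched roughly as follows. This is only a minimal illustration, not the workflow that will be used: the input column names (`length_mm`, `wet_weight_g`), the sample values and the helper functions are all hypothetical, and only the Darwin Core term names are taken from the standard.

```python
import csv
import io

# Hypothetical wide-format data: one row per occurrence, measurements in columns.
WIDE_CSV = """occurrenceID,eventID,scientificName,length_mm,wet_weight_g
occ-1,evt-1,Calanus glacialis,3.2,0.004
occ-2,evt-1,Calanus finmarchicus,2.7,
"""

# Hypothetical mapping from input column to (measurementType, measurementUnit).
MEASUREMENT_COLUMNS = {
    "length_mm": ("length", "mm"),
    "wet_weight_g": ("wet weight", "g"),
}


def to_measurement_or_fact(rows, measurement_columns):
    """Turn wide measurement columns into long MeasurementOrFact records."""
    records = []
    for row in rows:
        for column, (mtype, unit) in measurement_columns.items():
            value = (row.get(column) or "").strip()
            if not value:
                continue  # skip empty cells rather than emit blank measurements
            records.append({
                "eventID": row["eventID"],
                "occurrenceID": row["occurrenceID"],
                "measurementType": mtype,
                "measurementValue": value,
                "measurementUnit": unit,
            })
    return records


def to_event_core(rows):
    """Collect one event record per distinct eventID, in first-seen order."""
    seen = {}
    for row in rows:
        seen.setdefault(row["eventID"], {"eventID": row["eventID"]})
    return list(seen.values())


rows = list(csv.DictReader(io.StringIO(WIDE_CSV)))
mof = to_measurement_or_fact(rows, MEASUREMENT_COLUMNS)
events = to_event_core(rows)
```

A real event core would of course carry more than the eventID (date, coordinates, and so on); the sketch only shows the reshaping step.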
I have tried to compile some information from participants concerning the workshop. See linked below:
I have not received any more datasets and don't expect to.
Some stats
Luke,
I have tried the new tool "Nansen Legacy Darwin Core Event Core and Extensions Generator". It looks good and will make it easier for researchers to convert their data. I have some questions.
Why do you use eventID and parentEventID in the input file when they (correctly) have the names occurrenceID and eventID in the DwC file? Why can't you use the same names in the input file? I also tried to rename the terms in the input file, but then the DwC file was not possible to open.
Do you expect that some researchers will have this format for their data at the workshop?
Hi @vidarbakken thanks for checking this out and providing me with feedback.
The tool pulls data from the metadata catalogue (stored as a CSV file). This table only has an eventID column, not an occurrenceID column. So it reads this column and only this column and pulls relevant metadata from the CSV for the matched record.
The metadata catalogue is a compilation of log sheets created during each cruise. In this case, the user has created an occurrenceID on their own after the cruise.
I hope a lot of the data will be in this format. However, some people have discarded the eventIDs from their data flow and worked on other sheets.
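To make the lookup described above a bit more concrete, here is a minimal sketch of matching user-supplied IDs against a catalogue table keyed on eventID. It is not the tool's actual code: the column names other than eventID, the sample rows and the function names are assumptions made for illustration.

```python
import csv
import io

# Hypothetical catalogue extract; the real column names may differ.
CATALOGUE_CSV = """eventID,eventDate,decimalLatitude,decimalLongitude
evt-1,2021-03-05T10:00:00Z,78.1,15.6
evt-2,2021-03-05T12:30:00Z,78.4,16.0
"""


def load_catalogue(text):
    """Index catalogue rows by eventID, the only ID column in the table."""
    return {row["eventID"]: row for row in csv.DictReader(io.StringIO(text))}


def enrich(user_ids, catalogue):
    """Return (matched records, unmatched IDs) for the IDs a user supplies."""
    matched, missing = [], []
    for event_id in user_ids:
        record = catalogue.get(event_id)
        if record is None:
            missing.append(event_id)  # e.g. an ID the user minted after the cruise
        else:
            matched.append(dict(record))
    return matched, missing


catalogue = load_catalogue(CATALOGUE_CSV)
matched, missing = enrich(["evt-2", "evt-9"], catalogue)
```

In this sketch, an ID created after the cruise simply ends up in `missing`, because it has no row in the catalogue to pull metadata from.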
The SIOS metadata catalogue is this one? Are there spreadsheet files here in addition to the netCDF files?
No, that is the data access portal/data catalogue.
The Nansen Legacy metadata catalogue is hosted here: https://sios-svalbard.org/aen/tools
And the tools including the spreadsheet generator are linked at the top right of the page.
Is the metadata catalogue (CSV) available as a file?
I have a copy on my PC, though it is >100Gb. Do you want me to send you it?
Ok, it is so big. Is it possible to see an example of how it is formatted?
I wonder if the eventID in the metadata catalog is mapping correctly to the occurrenceID concept? A dwc:Event is the intersection of a location and a time, while a dwc:Occurrence is the intersection of a dwc:Event and a dwc:Organism. Will all occurrenceIDs actually identify some organismal presence at a time and location? Will all measurements linked to the occurrenceIDs always be measurements of the Organism present (or in other words, maybe some measurements linked to an occurrenceID might be measurements of the environment?). Maybe all fine - just checking.
You are right, it won't map correctly in all cases. The user has to provide only ids that are representative of occurrences. It is not possible to implement a 'one size fits all' solution in this case, but this should help in a lot of cases. Same with the measurements.
There is a pipe delimited CSV file here for 1 day of 1 cruise: https://drive.google.com/file/d/1dzWZG1HSQucLLlMfcW9Uzne8Gnz2PvVz/view?usp=sharing
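Since the full catalogue is far too large to open in a spreadsheet, one way to inspect the format of a pipe-delimited file is to read only the header and the first few rows. A minimal sketch using Python's standard `csv` module; the column names and values in the sample are hypothetical and stand in for the real file.

```python
import csv
import io
from itertools import islice


def peek(handle, n=5, delimiter="|"):
    """Return the header and the first n data rows without reading the rest."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, [])
    return header, list(islice(reader, n))


# Hypothetical sample standing in for the real pipe-delimited file.
sample = io.StringIO(
    "eventID|gearType|eventDate\n"
    "evt-1|CTD|2021-03-05\n"
    "evt-2|Multinet|2021-03-05\n"
)
header, rows = peek(sample, n=1)
```

With a downloaded file, the same function can be called on `open(path, newline="", encoding="utf-8")`, assuming the file is UTF-8 encoded.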
Hi all, I have been trying to get in contact with @pieterprovoost to ask whether he or someone from OBIS can give a 30 minute presentation (remotely) introducing OBIS as part of the workshop, next Tuesday at 3pm.
Unfortunately, I haven't been able to get in touch with him. Does anyone know anyone else from OBIS who might be able to give a presentation at short notice?
Did Yi-Ming Gan of AntaBIF SCAR register for the workshop? I noticed she asked for a side-discussion on your GBIF data model use case. AntaBIF and SCAR are sort of linked to OBIS.
We could also ask Abby Benson to present OBIS...? I believe she is the GBIF Node Manager for OBIS or at least for OBIS USA.
Yes, @ymgan will be joining us :)
Hi, I am the node staff for the Antarctic node of GBIF and OBIS. If the presentation is about OBIS data structure, I can do it but I can't "represent OBIS" in decision making etc.
@ymgan If you could cover this that would be great, I will not be available at the time of the workshop.
Okie! I will do it! Please let me know what should I cover :D
I think a general presentation introducing OBIS, who they are, what they do, why they're important would be good, and as you say, the data structure..
Assume that most people have either never heard of OBIS, or never published data with OBIS.
Thanks @ymgan :)
Got it! You're most welcome Luke, Pieter!
Will also briefly introduce OBIS approach to rich datasets with Event core, extended MeasurementOrFact, and BODC vocabs. Will copy @lhmarsden when I send in my slides to @pieterprovoost latest by Friday AM so that you can provide feedback before the presentation :)
Abby at TDWG just presented an OBIS data mobilisation workshop which was 100% virtual
https://ioos.github.io/bio_mobilization_workshop/
Some interesting techniques as they had many participants: several "themed" breakout rooms (e.g. one for help formatting dates), some one-on-one breakouts, and a quiet room they can go to but still be connected on Zoom. Floating instructors bouncing between rooms, and Slack so the instructors could ask each other for help, e.g. if they need some Python expert to step in or whatever
Anyway I guess we can close this now :)
Interesting - this workshop was unfunded and they won't run another again unless they get funding as it was a lot of people's free time
I'd be interested to know from Abby how successful that was. I received good feedback from a few people who attended our workshop online.
Themed break-out rooms sounds like a really good way to structure this.
Her talk generally was really interesting, I'll post it here when we get the recording later. She was a big fan of these practical hands on kind of workshops, and she said they didn't publish any datasets during the workshop but they are getting them coming in now, a bit like we are too.
I see that perhaps it's supposed to be private for conference attendees, so I will email it to you @lhmarsden rather than posting on github.
I believe the TDWG 2022 talks will eventually become openly available on the TDWG YouTube channel ... when the TDWG secretariat finds time to process and upload them to YouTube. Meanwhile, they have shared the direct Zoom link with the conference attendees :-)
Agenda (for planning meeting - 2021-12-17):
Contents of workshop
Luke asked for this workshop in order to discuss how to publish a wide range of biodiversity data.
Target attendees
Luke has discussed this in the data management group in the Nansen Legacy project: Some of the staff from several of the Norwegian data centres would also be interested in joining, as would Tove Gabrielsen, who is a biologist who co-leads the data management group and provides a researcher's perspective on things (amongst other things!). We could find more people to join based on what we think would be useful for the workshop.
Date and location of workshop
?