Closed rukayaj closed 1 year ago
If there is a session on GRSciColl or the new Latimer Core, we could consider contributing a talk on how we work with GRSciColl for Norwegian collections, including the collection identifiers.
Maybe a session on the relevance of standards for data interoperability, now that we have seen quite a snowball effect this year with these LLMs working like magic on unstructured data?
I just asked ChatGPT to combine two of our Zenodo uploads of raw data files into one dataset using the Darwin Core standard. It didn't do it perfectly (it didn't generate unique occurrenceIDs, for a start), but it tried and the result wasn't too bad.
Then I asked it to use UUIDs in the occurrenceIDs, and it wrote me some (valid!) Python code to do it myself. I asked it if it could do it for me, and it started populating the table with random data 😅
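For anyone curious, here is a minimal sketch of what that kind of script might look like. It is not the code ChatGPT actually produced: the function name `merge_with_occurrence_ids` and the example rows are made up for illustration, and it assumes the two raw files have already been read into lists of dicts keyed by Darwin Core terms.

```python
import uuid


def merge_with_occurrence_ids(*tables):
    """Concatenate lists of row dicts and give every row a fresh UUID occurrenceID."""
    merged = []
    for rows in tables:
        for row in rows:
            record = dict(row)  # copy so the input rows are not mutated
            # uuid4() is a random UUID; the "urn:uuid:" prefix is a common convention
            record["occurrenceID"] = f"urn:uuid:{uuid.uuid4()}"
            merged.append(record)
    return merged


# Made-up rows standing in for the two raw data files (assumption, not real data)
dataset_a = [{"scientificName": "Parus major", "eventDate": "2022-05-01"}]
dataset_b = [
    {"scientificName": "Pica pica", "eventDate": "2022-05-02"},
    {"scientificName": "Parus major", "eventDate": "2022-05-03"},
]

combined = merge_with_occurrence_ids(dataset_a, dataset_b)
for record in combined:
    print(record["occurrenceID"], record["scientificName"])
```

One caveat with this approach: occurrenceIDs are meant to be persistent, so in practice you would generate them once and store them with the data rather than regenerate them on every run.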
But perhaps the idea of standards and standardisation can be a bit more fluid and human-friendly now. Or perhaps not; perhaps there is too much scope for mistakes to be made. Or maybe there need to be some models trained just for biodiversity data. Either way, I think it would make for some interesting talks and discussion.
Question: How many scientific collections of natural history do we have in Norway?
ChatGPT: There are numerous scientific collections of natural history in Norway, with many located in museums, universities, and research institutions throughout the country. It is difficult to provide an exact number without more specific criteria for what qualifies as a "scientific collection," but here are a few examples:
There's an interesting article that I haven't read completely yet on the emergent properties of large language models: https://www.assemblyai.com/blog/emergent-abilities-of-large-language-models
Here is my idea written up into the format they require. Thoughts/suggestions?
GPT-3.5's suggestion: Emergent AI Contributions to Data Standardisation: Opportunities and Challenges / GPT-4's suggestion: Harnessing AI for Data Standardisation: Shaping the Future of Interoperability
Biodiversity data standardisation is an ongoing process with many challenges. Recently launched AI Large Language Models (LLMs) such as GPT-4 have new and unprecedented potential to contribute to the development and improvement of biodiversity data standards and the processes of standardisation. This is because, as LLMs are scaled, they reach critical levels where new abilities emerge suddenly and unpredictably. It is this emergent nature of LLMs which has been fuelling the exponential jumps in their usage that we have seen over the past few months. By leveraging the new power of these AI language models, we can, in ways not previously possible, analyse and understand large amounts of biodiversity data, identify patterns and relationships, and derive insights which can improve our data standards and standardisation processes.
This session aims to explore the ways in which AI technology can contribute to the development and improvement of biodiversity data standards and the processes of standardisation.
Topics of interest include:
- The future role of AI in curating and developing data standards
- AI-based biodiversity data extraction and normalisation techniques
- AI-assisted biodiversity data mapping and standards alignment
- AI-driven metadata management and ontology development for biodiversity data
- Ethical and legal considerations in AI-assisted biodiversity data standardisation
Interestingly, ChatGPT wrote most of the abstract for this session after a two-sentence prompt from the organisers :) Maybe it would be interesting and fun to have a small panel discussion as well and include GPT-4 (with text-to-speech) as a participant?
Maybe keep the ChatGPT idea as a possible poster/lightning-talk submission for when the abstract call for TDWG 2023 is eventually launched?
The TDWG chairs just emailed us: "Every year we get one symposium that is far and away more popular in terms of numbers of abstracts submitted. This year the congratulations go to you and your symposium on Artificial Intelligence! You have 17 abstracts submitted for your session – the next most popular session has only 10." 😃
Submit a proposal by 15 April 2023
We invite you to submit a proposal (use Google form https://forms.gle/kPZXYQwfspvEKsbo8) for an organized session at TDWG 2023 reflecting the work that TDWG does and standards that contribute towards understanding and documenting biodiversity. Organized sessions can take the form of a symposium, panel discussion, or another format in which the primary purpose is to share information and engage the audience.
Each proposal requires a session title (<100 characters), a short (<200 word) abstract, and the name(s) and contact details of the session organizer(s).
Sessions may be open or closed to presentation submissions; we encourage diversity and inclusivity.
Session proposers will be responsible for soliciting and coordinating presentations, reviewing and approving abstracts for the session, and moderating the proposed session(s).
We recommend at least one session organizer should be present at the in-person meeting, if possible. Every session must have an abstract that will be published on the TDWG 2023 website.
Please note: all session organizers, along with presenters in the session, must be registered for the conference.
If your proposal is accepted, your session title and abstract will be posted on the TDWG 2023 conference website.