Investigate integrating AI for Parcels documentation

erikvansebille commented 1 year ago

Today we had a discussion with @VeckoTheGecko, @modhurita and @RoelBrouwer about integration of AI LLMs into Parcels documentation. This Issue is to follow up on that discussion.

With the rapid emergence of new services for GPT and other LLMs, it could be interesting to explore using AI for documentation and user support for Parcels, e.g. through a chatbot on the . As @VeckoTheGecko suggested, we could look into docsgpt.ai, which seems to have a free plan for open source projects.

An alternative is to wait for a GitHub Copilot-like service for documentation, which would then also take into account Issues, Discussions and PRs. I can imagine such a service is in the pipeline, although I haven't heard about it yet.

VeckoTheGecko commented 1 year ago

Just clarifying, I think https://docsgpt.ai/ and https://github.com/arc53/DocsGPT (website) are two different projects. The first being a commercial product, with the other being an open source project. A bit of a common name 😂.

VeckoTheGecko commented 1 year ago

OpenAI's Assistants API was just released, and looks to be a great option for this. I'm going to see if I can do some experimentation :)

modhurita commented 1 year ago

I have been experimenting with the Assistants API too (for a different project), and it does seem like OpenAI's knowledge retrieval tool is on its way to replacing Retrieval Augmented Generation (RAG, which I used in the demo I showed during our meeting).
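For readers unfamiliar with RAG, here is a minimal sketch of the idea: pick the documentation chunk most similar to the question and paste it into the prompt. This is not the demo mentioned above. Real setups typically use embedding models for the similarity step, while this sketch substitutes plain word-count cosine similarity so it runs with the standard library only. The function names and the two example chunks are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, chunks: list[str], k: int = 1) -> str:
    """Assemble the prompt that would be sent to the LLM (the call itself is omitted)."""
    context = "\n\n".join(retrieve(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical documentation chunks, for illustration only
chunks = [
    "The FieldSet class holds the grid data.",
    "The ParticleSet class represents a collection of particles.",
]
print(build_prompt("What does a ParticleSet represent?", chunks))
```

The retrieval step is the part that a hosted knowledge retrieval tool would take over; the prompt assembly stays conceptually the same.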

VeckoTheGecko commented 1 year ago

Did some experimentation by creating a custom GPT (https://chat.openai.com/g/g-pf75s5Rug-oceanparcels). So far it looks to be a good resource for new users who aren't as familiar with Python, as it gives the GPT a surface knowledge of the API. I haven't stress tested it (I'll leave that to you @erikvansebille if you'd like to).

Model setup

**Instructions**

Your name is Guppy, and you're a GPT trained to help oceanographers use the OceanParcels Python package. This Python package allows for Lagrangian particle simulations, and is specifically created to work with large computations. In your replies, include references to xarray commands as needed (as working with NetCDF files is quite common in this domain).

Your custom knowledgebase consists of the latest tutorial tutorials for the project. These tutorials are in markdown format. When referencing information from the custom knowledge base, I want you to always provide the name of the markdown file you are providing the help from at the beginning of the response. This reference should be formatted as a URL in the following format `https://docs.oceanparcels.org/en/latest/examples/{tutorial_stem}.html`. For example, the knowledgebase file `parcels_tutorial.md` will result in the following link being mentioned `https://docs.oceanparcels.org/en/latest/examples/parcels_tutorial.html`.

**Knowledge**

I uploaded the following files:

```
parcels_tutorial.md
tutorial_jit_vs_scipy.md
tutorial_kernelloop.md
tutorial_interaction.md
tutorial_parcels_structure.md
tutorial_output.md
```

I converted the notebooks to `md` (omitting outputs) in the hope of making them more machine-parsable. I limited it to a few tutorials to keep to the essentials (and to avoid diluting the knowledge base).

**Capabilities**

- [ ] Web Browsing
- [ ] DALL·E Image Generation
- [ ] Code Interpreter

Disabled all capabilities (Code Interpreter isn't useful as parcels wouldn't be installed anyway).
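The notebook-to-markdown conversion with outputs omitted could look roughly like the sketch below. The thread does not say which tool was actually used (`jupyter nbconvert` is the usual choice), so this is an assumption-laden stand-in that reads the notebook JSON directly using only the standard library. The function name `notebook_to_markdown` is hypothetical.

```python
import json

FENCE = "`" * 3  # markdown code fence, built here to avoid nesting fences


def notebook_to_markdown(notebook_json: str) -> str:
    """Convert nbformat-4 notebook JSON text to markdown, dropping all cell outputs."""
    notebook = json.loads(notebook_json)
    parts = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        source = source.strip("\n")
        if not source.strip():
            continue  # skip empty cells
        if cell.get("cell_type") == "markdown":
            parts.append(source)
        elif cell.get("cell_type") == "code":
            # Keep the code itself; cell["outputs"] is never read
            parts.append(f"{FENCE}python\n{source}\n{FENCE}")
    return "\n\n".join(parts) + "\n"


# Tiny in-memory notebook for illustration (not one of the Parcels tutorials)
example = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Title\n", "Intro"]},
        {"cell_type": "code", "source": "print(1)", "outputs": [{"text": "1"}]},
    ]
})
print(notebook_to_markdown(example))
```

Raw cells and any cell type other than markdown or code are skipped in this sketch.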

Trial conversation (unedited, first try)

> Where can I get an overview of how OceanParcels is structured?

To gain an overview of how OceanParcels is structured, you should refer to the "OceanParcels Overview" tutorial. This tutorial is designed to give you a comprehensive look at the structure and functioning of the OceanParcels package.

In this tutorial, you'll learn about the key components of OceanParcels, including its classes and functions, and how they work together to facilitate Lagrangian particle simulations in oceanography. The tutorial covers essential elements such as the FieldSet class, which is used to hold the grid data; the ParticleSet class, which represents a collection of particles; and the different particle classes available in OceanParcels.

The tutorial also provides insights into how OceanParcels handles data, particularly NetCDF files, through the integration with xarray, which is critical for working with large datasets typically encountered in oceanographic studies.

For a detailed understanding, you can access the tutorial here: [OceanParcels Overview](https://docs.oceanparcels.org/en/latest/examples/oceanparcels_overview.html). This will give you a solid foundation to start using OceanParcels effectively for your oceanographic research.

> Provide me a code example that runs an OceanParcels simulation in JIT mode instantiating a line of 10 particles between lan,lot=20,30 and lan,lot=40,30, and running the simulation for 4 days in timesteps of 6 hours. Save the output in NetCDF format.

For running an OceanParcels simulation in JIT (Just-In-Time) mode with a line of 10 particles between the coordinates (lat, lon) = (20, 30) and (40, 30), and simulating for 4 days with timesteps of 6 hours, while saving the output in NetCDF format, you can use the following code example. This example assumes you have a suitable fieldset (e.g., ocean currents data) loaded for the simulation.

```py
from parcels import FieldSet, ParticleSet, JITParticle, AdvectionRK4
from datetime import timedelta
import numpy as np

# Load your fieldset here
# fieldset = FieldSet.from_netcdf(...)

# Define start and end points for the line of particles
start_lat, start_lon = 20, 30
end_lat, end_lon = 40, 30

# Create an array of lats and lons for particles
lats = np.linspace(start_lat, end_lat, 10)
lons = np.full(10, start_lon)

# Create a ParticleSet with a line of particles
pset = ParticleSet(fieldset=fieldset, pclass=JITParticle, lon=lons, lat=lats)

# Define the simulation runtime and timestep
runtime = timedelta(days=4)
dt = timedelta(hours=6)

# Set up a Kernel for advection
kernel = pset.Kernel(AdvectionRK4)

# Execute the simulation
pset.execute(kernel, runtime=runtime, dt=dt, output_file=pset.ParticleFile(name="output.nc", outputdt=dt))

# The output will be saved in 'output.nc' in NetCDF format
```

In this code:

1. You need to replace the fieldset loading part with your specific data source.
1. The linspace function from NumPy is used to evenly distribute 10 particles between the start and end latitudes along the same longitude.
1. ParticleSet is instantiated with these positions and the JITParticle class, which is suitable for JIT mode.
1. The simulation is run with the AdvectionRK4 kernel for 4 days, with a timestep of 6 hours.
1. The output is saved in NetCDF format in a file named "output.nc".

Make sure you have the necessary OceanParcels package and dependencies installed and your environmental data (like ocean current fields) ready to use in this simulation.

Note that in the conversation it gave a broken link to the documentation. That was the first time it did that; the two other times it gave valid links. Your mileage may vary.
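One way to catch this kind of broken link automatically is to check every documentation URL in a reply against the files that were actually uploaded. The sketch below is only an illustration: the stem list is copied from the knowledge-base listing in this thread, the function name `find_unknown_links` is hypothetical, and it only tests membership in that list rather than whether the page exists online.

```python
import re

# File stems taken from the knowledge-base listing in this thread
KNOWN_STEMS = {
    "parcels_tutorial",
    "tutorial_jit_vs_scipy",
    "tutorial_kernelloop",
    "tutorial_interaction",
    "tutorial_parcels_structure",
    "tutorial_output",
}

# Matches the URL format requested in the GPT instructions and captures the stem
LINK_PATTERN = re.compile(
    r"https://docs\.oceanparcels\.org/en/latest/examples/([A-Za-z0-9_\-]+)\.html"
)


def find_unknown_links(reply: str) -> list[str]:
    """Return tutorial stems linked in `reply` that are not in the knowledge base."""
    return [stem for stem in LINK_PATTERN.findall(reply) if stem not in KNOWN_STEMS]


# The link from the first answer in the trial conversation
reply = (
    "For a detailed understanding, you can access the tutorial here: "
    "[OceanParcels Overview]"
    "(https://docs.oceanparcels.org/en/latest/examples/oceanparcels_overview.html)."
)
print(find_unknown_links(reply))  # → ['oceanparcels_overview']
```

A reply that fails this check could be regenerated or shown with a warning, which would matter more once the API is used directly instead of the custom GPT.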

erikvansebille commented 1 year ago

> Did some experimentation by creating a custom GPT (https://chat.openai.com/g/g-pf75s5Rug-oceanparcels). So far it looks to be a good resource for new users who aren't as familiar with Python, as it gives the GPT a surface knowledge of the API. I haven't stress tested it (I'll leave that to you @erikvansebille if you'd like to).

Awesome, @VeckoTheGecko! I'd be keen to try, but following the link above seems to require a ChatGPT Plus account? I only have the free account at the moment... Or is there another way to access this bot without a Plus account?

VeckoTheGecko commented 1 year ago

No way to get around that unfortunately for this proof of concept. You're welcome to email me a list of prompts you'd like me to test.

If we take this further, we can use the API directly.

OceanParcels / Parcels

Investigate integrating AI for Parcels documentation #1432