Isaiahensley / Aquatic-RIG

Our Senior Capstone project focuses on developing a Streamlit website dedicated to the visualization of aquatic NetCDF datasets. Aquatic data is inherently complex because it is spatiotemporal: it captures information across both time and space. Our website will let users step through time and depth to produce comprehensive visuals of their data.
https://aquaticrig-develop.streamlit.app/

Compatibility for various NetCDF4 formats #69

Closed Isaiahensley closed 5 months ago

Isaiahensley commented 6 months ago

Description:

Currently, our dataset management page can seamlessly visualize the .nc files we have from Dr. Alam. However, we need to make sure that .nc files with different variable and dimension names are correctly identified. Right now our code makes several assumptions about the .nc files being uploaded:

1) The time dimension is named "time"
2) The depth dimension is named "depth"
3) The latitude dimension is named "lat"
4) The longitude dimension is named "lon"

As shown by the aquatic .nc files Richard found, some files use different names for these dimensions. In the .nc files we are currently testing with, latitude is called "latitude", and so on.
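
One way to soften the naming problem is to guess a sensible default for each dimension before asking the user. The sketch below is only an illustration: `ALIASES`, `guess_dimension`, and `default_index` are hypothetical names, and only the "lat"/"latitude" pair comes from the files mentioned above (the "longitude" alias is assumed by analogy). The returned index could be passed to a select box as its initial selection, with the user still free to change it.

```python
# Hypothetical alias table. Only "lat"/"latitude" is taken from this issue;
# "longitude" is an assumed counterpart, and more spellings may be needed.
ALIASES = {
    "time": ("time",),
    "depth": ("depth",),
    "lat": ("lat", "latitude"),
    "lon": ("lon", "longitude"),
}


def guess_dimension(role, names):
    """Return the first entry of `names` that matches an alias for `role`, else None."""
    wanted = ALIASES[role]
    for name in names:
        # Compare case-insensitively so "Latitude" and "latitude" both match.
        if name.lower() in wanted:
            return name
    return None


def default_index(role, names):
    """Position of the guessed name in `names`, or None when nothing matches."""
    guess = guess_dimension(role, names)
    return names.index(guess) if guess is not None else None
```

A `None` result simply means no guess could be made, so the user would pick the name by hand.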


Expected Outcome:

Our goal is to give the user options to find the dimension names in the uploaded files and manually set them to the correct dimension so they can be appropriately referenced in our Python script. This will make it so our website can be used for NetCDF4 files that we have not tested with and future-proof our product. We expect this will allow any aquatic .nc file to be visualized on our website.


Solution:

After a .nc file is uploaded, the user will be prompted with four select boxes: Time, Depth, Latitude, and Longitude. Each box will list all of the variable/dimension names found in the file, and the user will choose which name corresponds to each dimension.
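
As a rough sketch of how the options for those select boxes could be gathered, the helper below collects the dimension names that every uploaded file has in common. `common_dimension_names` is a hypothetical name, not a function from our code, and it assumes each item is an already opened dataset whose `.dimensions` attribute is a mapping keyed by dimension name (as on a `netCDF4.Dataset`). It covers dimension names only, not variable names.

```python
def common_dimension_names(datasets):
    """Return the dimension names present in every dataset, in first-file order."""
    datasets = list(datasets)
    if not datasets:
        return []

    # Start from the first file's names and keep only those seen in all others.
    shared = set(datasets[0].dimensions)
    for ds in datasets[1:]:
        shared &= set(ds.dimensions)

    # Preserve the order in which the first file declares its dimensions.
    return [name for name in datasets[0].dimensions if name in shared]
```

Restricting the options to shared names is a design choice for multi-file uploads; showing the union instead would let the user pick a name that is missing from some files.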


Code Snippet Example

Here we assume Latitude and Longitude are named "lat" and "lon" in the .nc file.

# Ensure lat and lon are 2D arrays for quiver plotting
lat = nc_file.variables['lat'][:]
lon = nc_file.variables['lon'][:]
Lon, Lat = np.meshgrid(lon, lat)
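
A minimal sketch of the same lookup without the hard-coded names: the names are passed in, and a missing name produces an error that lists what the file actually contains. `read_lat_lon` is a hypothetical helper, and `variables` is assumed to be a mapping like `nc_file.variables`.

```python
def read_lat_lon(variables, lat_name, lon_name):
    """Fetch latitude and longitude values by caller-supplied variable names."""
    missing = [name for name in (lat_name, lon_name) if name not in variables]
    if missing:
        # Tell the caller which names are wrong and what is available instead.
        raise KeyError(
            f"Variable(s) {missing} not found; available: {sorted(variables)}"
        )
    # Slicing with [:] reads the full contents, as in the snippet above.
    return variables[lat_name][:], variables[lon_name][:]
```

The two returned arrays can then go to `np.meshgrid(lon, lat)` exactly as before.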
Isaiahensley commented 5 months ago

Description:

I've been busy this week but went through the code a bit to elaborate further on the goal for this issue. There are several places in the code that directly reference 'time', 'depth', 'lat', and 'lon'. These names will not be present in every .nc file uploaded to our website, so I will make a new page that asks the user to manually select the dimensions so that we use the actual string names for each dimension. This page can also help users understand that uploaded .nc files need to have these dimensions in order for their data to be visualized correctly. We can implement checks here to make sure all dimensions are valid and provide helper tools that explain what is required from the files.

Code Snippets:

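# Excerpt 1: depth is looked up by the hard-coded name 'depth'
if 'depth' in nc_file.dimensions:
    depth_dim = nc_file.dimensions['depth']
    depth_levels = len(depth_dim)  # Store the number of depth levels

# Excerpt 2: time is looked up by the hard-coded name 'time'
# (nc is assumed to be the netCDF4 module, i.e. import netCDF4 as nc)
time_var = nc_file.variables['time']
time_units = time_var.units
datetimes = nc.num2date(time_var[:], units=time_units)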

These are direct examples from the code where we reference the dimensions by the string names we assume are in the .nc files. I'll instead set these names based on the user's selection in earlier steps of the dataset management page.
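
One possible shape for this change, as a sketch only: keep the four chosen names in a single small object and pass it to the code that currently uses string literals. `DimensionNames` and `depth_level_count` are hypothetical names, and the keys 'time', 'depth', 'lat', and 'lon' are assumed to be where the selections are stored in a mapping such as `st.session_state`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionNames:
    """The actual names a file uses for the four dimensions we need."""
    time: str
    depth: str
    lat: str
    lon: str

    @classmethod
    def from_state(cls, state):
        """Build from any mapping that holds the four selections."""
        return cls(**{key: state[key] for key in ("time", "depth", "lat", "lon")})


def depth_level_count(dimensions, names):
    """Number of depth levels, or None when the file lacks that dimension."""
    if names.depth in dimensions:
        return len(dimensions[names.depth])
    return None
```

With this in place, a call such as `depth_level_count(nc_file.dimensions, names)` would replace the hard-coded `'depth'` check.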

Isaiahensley commented 5 months ago

Description:

Added a page, shown when the user uploads files, for selecting the string names associated with the Time, Depth, Latitude, and Longitude dimensions. This ensures compatibility with NetCDF4 files that use different names for these dimensions.


Code Snippets:

Changes were also made to other parts of the code where the dimension names are used. In summary, if the user chooses to upload files instead of using the example dataset, they are taken to a screen that asks which dimension names in the uploaded NetCDF4 file represent Time, Depth, Latitude, and Longitude.

    # -----------------------------------
    # Step 1: Dimension Selection Page (Skip if using example dataset)
    # -----------------------------------

    if st.session_state['current_step'] == 1:

        # Decide which files to use: uploaded files or example dataset files
        # If files are uploaded and the example dataset is also checked, the uploaded files are ignored
        files_to_process = None
        if st.session_state['example_dataset']:
            # Load example dataset files
            files_to_process = load_example_dataset()

            # Save dimension names we know are used in the example dataset to session state
            st.session_state['time'] = "time"
            st.session_state['depth'] = "depth"
            st.session_state['lat'] = "lat"
            st.session_state['lon'] = "lon"

            # Skip Dimension Selection Page since example dataset was selected, and we already know the dimensions
            increment_step()

        elif st.session_state['files_upload']:
            # Use the files uploaded by the user
            files_to_process = st.session_state['files_upload']

            dimensions = extract_dimensions(files_to_process)

            time_option = st.selectbox(
                "Time",
                dimensions,
                index=None,
                placeholder="Select Time Dimension",
            )

            depth_option = st.selectbox(
                "Depth",
                dimensions,
                index=None,
                placeholder="Select Depth Dimension",
            )

            lat_option = st.selectbox(
                "Latitude",
                dimensions,
                index=None,
                placeholder="Select Latitude Dimension",
            )

            lon_option = st.selectbox(
                "Longitude",
                dimensions,
                index=None,
                placeholder="Select Longitude Dimension",
            )

            # Selections are valid when none is None and all four are unique
            selections = (time_option, depth_option, lat_option, lon_option)
            selections_valid = None not in selections and len(set(selections)) == 4

            if selections_valid:
                # Save the selections made to each dimension for later use
                st.session_state['time'] = time_option
                st.session_state['depth'] = depth_option
                st.session_state['lat'] = lat_option
                st.session_state['lon'] = lon_option

            # Create columns that let the next and back buttons display side by side.
            left, right, filler = st.columns([1, 1, 15])

            # Put Next button on the right (after condition is met)
            # This will take you to the Visualization Selection Page
            with right:
                next_button(selections_valid)

            # Put Back button on the left
            # This will take you to the File Upload Page
            with left:
                back_button()

        # Save files_to_process to session state (either uploaded files or example dataset)
        st.session_state['files_to_process'] = files_to_process
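
The page above only gates the Next button, so the user is not told why it is unavailable. A minimal sketch of a check that returns readable messages instead of a single boolean follows; `selection_problems` and `ROLES` are hypothetical names, and the selections are assumed to arrive in the order Time, Depth, Latitude, Longitude.

```python
ROLES = ("Time", "Depth", "Latitude", "Longitude")


def selection_problems(selections):
    """Return readable problems with the four selections (empty list if valid)."""
    problems = []

    # Report every role that still has no selection.
    for role, choice in zip(ROLES, selections):
        if choice is None:
            problems.append(f"{role}: no dimension selected")

    # Report every name that was chosen for more than one role.
    chosen = [choice for choice in selections if choice is not None]
    for name in sorted(set(chosen)):
        roles = [r for r, c in zip(ROLES, selections) if c == name]
        if len(roles) > 1:
            problems.append(f"'{name}' is selected for {', '.join(roles)}")

    return problems
```

Each message could be shown with `st.warning`, and `not selection_problems(...)` gives the same condition the Next button uses now.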

Screenshot Snippets:

[Three screenshots attached to the original comment; images not preserved]