GMOD / jbrowse-components

Source code for JBrowse 2, a modern React-based genome browser
https://jbrowse.org/jb2
Apache License 2.0

features reference other features that do not exist in the file #4468

Open jasongallant opened 2 weeks ago

jasongallant commented 2 weeks ago

I'm writing in about an odd behavior: I'm using @jbrowse/react-linear-genome-view in a web app that I'm working on, loading GFF files hosted on S3.

When I load the feature track, I get the error "features reference other features that do not exist in the file".

here's the stack trace:

Error: some features reference other features that do not exist in the file (or in the same '###' scope).

    at Parser._emitAllUnderConstructionFeatures (/projectpath/node_modules/@gmod/gff/src/parse.ts:227:1)
    at Parser.finish (/projectpath/node_modules/@gmod/gff/src/parse.ts:165:1)
    at Object.parseStringSync (/projectpath/node_modules/@gmod/gff/src/api.ts:498:1)
    at Gff3TabixAdapter.getFeaturesHelper (/projectpath/node_modules/@jbrowse/plugin-gff3/esm/Gff3TabixAdapter/Gff3TabixAdapter.js:85:1)
    at async (/projectpath/node_modules/@jbrowse/plugin-gff3/esm/Gff3TabixAdapter/Gff3TabixAdapter.js:39:1)

I spent a long time examining the GFF file for issues between parent and child, but could find nothing.

I have verified this happens using both Chrome and Safari. Not sure what else to try.
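
For anyone wanting to automate the parent/child check described above, here is a minimal sketch. `findOrphanParents` is a hypothetical helper, not part of @gmod/gff, and it assumes the whole GFF3 file is available as a string. It reports every Parent reference that has no matching ID within the same '###' scope. Note that a whole-file check like this can pass while a tabix query still fails, because a tabix query only returns the lines overlapping the requested region.

```javascript
// Hypothetical helper (not part of @gmod/gff): lists Parent references that
// have no matching ID within the same '###' scope of a GFF3 string.
function findOrphanParents(gffText) {
  const orphans = [];
  let ids = new Set(); // IDs declared in the current scope
  let pending = []; // Parent references seen in the current scope

  // Close the current scope: any Parent without a matching ID is an orphan.
  const flush = () => {
    for (const ref of pending) {
      if (!ids.has(ref.parent)) {
        orphans.push(ref);
      }
    }
    ids = new Set();
    pending = [];
  };

  gffText.split(/\r?\n/).forEach((line, i) => {
    if (line.startsWith("###")) {
      flush();
      return;
    }
    if (line === "" || line.startsWith("#")) {
      return; // skip blank lines, directives, and comments
    }
    const attrs = line.split("\t")[8] || ""; // column 9 holds the attributes
    for (const pair of attrs.split(";")) {
      const eq = pair.indexOf("=");
      if (eq === -1) {
        continue;
      }
      const key = pair.slice(0, eq).trim();
      const values = pair.slice(eq + 1).split(",").map((v) => v.trim());
      if (key === "ID") {
        values.forEach((v) => ids.add(v));
      }
      if (key === "Parent") {
        values.forEach((v) => pending.push({ parent: v, lineNumber: i + 1 }));
      }
    }
  });
  flush(); // the end of the file closes the last scope
  return orphans;
}
```

Each returned entry has the shape `{ parent, lineNumber }`, so an empty array means every Parent resolved inside its own scope.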

cmdcolin commented 2 weeks ago

If you are able to, can you send the GFF file? Off the top of my head there are a couple of reasons why this could happen, but it would help to see the file.

can send to colin.diesh@gmail.com

jasongallant commented 2 weeks ago

Sent it along just now! Thanks for having a look.

cmdcolin commented 2 weeks ago

thanks for sending it. I believe that if you update your config to include the dontRedispatch line shown here, it should fix the issue you are seeing. The important part is that "contig" is in the list:

{
  "type": "FeatureTrack",
  "trackId": "genes",
  "name": "genes",
  "adapter": {
    "type": "Gff3TabixAdapter",
    "dontRedispatch": ["contig", "region", "chromosome"],
    "gffGzLocation": {
      "uri": "yourfile.gff.gz",
      "locationType": "UriLocation"
    },
    "index": {
      "location": {
        "uri": "yourfile.gff.gz.tbi",
        "locationType": "UriLocation"
      },
      "indexType": "TBI"
    }
  },
  "assemblyNames": ["yourasm"]
}

For full context on what this line means: with GFF3 tabix, when we request a genomic region of the file, e.g. chr1:1500-1600, the response is (pseudo-GFF):

chr1 gene 1000 2000 ID=mygene
chr1 exon 1550 1580 Parent=mygene
chr1 exon 1590 1650 Parent=mygene

So only the exons in that specific coordinate slice (chr1:1500-1600) are returned, but there may be other parts of that gene (e.g. more exons) outside of the range chr1:1500-1600. We therefore "redispatch" (make another request against the tabix file), widening the query to the extent of the largest feature in the returned results (chr1:1000-2000), which then returns the full feature.

This is a heuristic, though, so we tell the system "don't redispatch" for feature types that commonly cover the entire chromosome and never have child features, like contig, chromosome, or region. This is just a tricky thing with GFF3 tabix, but hope this helps!

I proposed a PR to add contig to the default "dontRedispatch" set here: https://github.com/GMOD/jbrowse-components/pull/4465, but applying the above config will fix it in your current version :)
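
To make the heuristic above concrete, here is a simplified sketch. It is an illustration only and the actual Gff3TabixAdapter code may differ. `expandedRange` and the plain `{ type, start, end }` feature objects are hypothetical names. The function widens the query to cover every returned feature except those whose type is in the dontRedispatch list.

```javascript
// Simplified illustration of the redispatch idea; NOT the real adapter code.
// query: { start, end }; features: array of { type, start, end }.
function expandedRange(query, features, dontRedispatch) {
  let start = query.start;
  let end = query.end;
  for (const feature of features) {
    // Chromosome-sized types are ignored so they cannot blow up the range.
    if (dontRedispatch.includes(feature.type)) {
      continue;
    }
    start = Math.min(start, feature.start);
    end = Math.max(end, feature.end);
  }
  // A second request is only needed if the range actually grew.
  const redispatch = start < query.start || end > query.end;
  return { start, end, redispatch };
}

// The pseudo-GFF response from the example, plus a chromosome-sized contig.
const response = [
  { type: "contig", start: 1, end: 1000000 },
  { type: "gene", start: 1000, end: 2000 },
  { type: "exon", start: 1550, end: 1580 },
  { type: "exon", start: 1590, end: 1650 },
];
const withContig = expandedRange(
  { start: 1500, end: 1600 },
  response,
  ["contig", "region", "chromosome"],
);
const withoutContig = expandedRange(
  { start: 1500, end: 1600 },
  response,
  ["region", "chromosome"],
);
```

With "contig" in the list, `withContig` covers only the gene (1000 to 2000). Without it, `withoutContig` grows to the whole contig (1 to 1000000), which is the situation the config change is meant to avoid.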

jasongallant commented 2 weeks ago

Hi Colin,

Thanks for the quick attention on this, and the detailed explanation. I did wonder if this was what was going on. I tried implementing your suggestion here:

return {
  type: "FeatureTrack",
  trackId: annotation.Description, // Use a unique identifier for the trackId, assuming id is unique
  name: annotation.Description, // Use the name from the annotation
  assemblyNames: [assembly.ShortName], // Assuming you want to use the assembly's short name
  category: ["Annotation"], // Static category for all
  adapter: {
    type: "Gff3TabixAdapter",
    dontRedispatch: ["contig", "region", "chromosome"], //<-- this is the important line, specifically incorporating "contig" into the list
    gffGzLocation: {
      uri: annotationURL,
      locationType: "UriLocation",
    },
    index: {
      location: {
        uri: indexURL,
        locationType: "UriLocation",
      },
    },
  },
};

But I am still getting the same issue on this and other assemblies. It seems to happen at particularly high zoom levels (many genes in view). For instance, in the files that I sent you, at 0 zoom on scaffold_3, the track initially shows "Zoom in to see features or force load", but when I click "force load", I get the same problem as before.

cmdcolin commented 2 weeks ago

can you confirm that the dontRedispatch setting is active by opening the "About track" dialog and checking that it is listed?

e.g. it is listed here: [screenshot]

if it is not listed, it may be using the default which doesn't include the "contig" entry
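
As a complementary check on the config side, a small sketch like the following could be run on the tracks array before it is passed to createViewState. `tracksMissingDontRedispatch` is a hypothetical helper, not a JBrowse API, and it only inspects the plain config objects. The "About track" dialog remains the way to confirm what the running app actually uses.

```javascript
// Hypothetical helper (not a JBrowse API): returns the trackIds of
// Gff3TabixAdapter tracks whose dontRedispatch list lacks a given type.
function tracksMissingDontRedispatch(tracks, requiredType = "contig") {
  return tracks
    .filter((t) => t.adapter && t.adapter.type === "Gff3TabixAdapter")
    .filter((t) => !(t.adapter.dontRedispatch || []).includes(requiredType))
    .map((t) => t.trackId);
}

// Example with two illustrative track configs.
const exampleTracks = [
  {
    trackId: "genes",
    adapter: {
      type: "Gff3TabixAdapter",
      dontRedispatch: ["contig", "region", "chromosome"],
    },
  },
  {
    trackId: "genes_default",
    adapter: { type: "Gff3TabixAdapter" }, // no explicit dontRedispatch
  },
];
const missing = tracksMissingDontRedispatch(exampleTracks);
```

Here `missing` contains only "genes_default", the track that would fall back to the adapter's default list.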

I'm not able to reproduce it at high zoom levels, but I do think it's not out of the question that you are still seeing it, and I have a possible explanation. It's alluded to in the PR, but our GFF parser has a notion of a buffer size that should probably just be removed, and in that case it will require making a new release (which I can keep you posted on :)!


cmdcolin commented 2 weeks ago

Now I am a bit mystified... it would be quite weird if the "gff parser bufferSize" was actually an issue in this case, because the parseStringSync function used by our Gff3TabixAdapter sets bufferSize to Infinity, so there is no way it would be too small... (https://github.com/GMOD/gff-js/blob/18002e87a1d10990c463a4ee924901e9fc77e9e1/src/api.ts#L488 used by https://github.com/GMOD/jbrowse-components/blob/84736cc1c21092d735c7806b55d4214296806b36/plugins/gff3/src/Gff3TabixAdapter/Gff3TabixAdapter.ts#L139)

do you know what version of @jbrowse/react-linear-genome-view you are using? (can type yarn why @jbrowse/react-linear-genome-view to check perhaps or click the "icon in the top right" of the app)

jasongallant commented 2 weeks ago

Hi @cmdcolin, I can confirm that I'm using JBrowse v2.12.2 for @jbrowse/react-linear-genome-view, and that config.adapter.dontRedispatch is indeed set to contig, region, and chromosome (see screenshot).

cmdcolin commented 2 weeks ago

very interesting...I don't have a clue yet but I'll keep brainstorming. it is funny that I can't reproduce it (even tried the URLs that you posted directly in case it was something weird with that)

jasongallant commented 2 weeks ago

I should add that I'm creating the config in React, and I'm wondering if there might be an issue with that. Here's the full code:

import React, { useState, useEffect } from "react";
import { Box } from "@mui/material";
import { get } from "aws-amplify/api";
import "@fontsource/roboto";
import { getUrl } from "aws-amplify/storage";
import {
  createViewState,
  JBrowseLinearGenomeView,
} from "@jbrowse/react-linear-genome-view";

async function fetchAssemblyFile(key) {
  try {
    console.log("Fetching URL for:", key);
    const urlResponse = await getUrl({
      path: key,
      options: { validateObjectExistence: true },
    });
    console.log("URL Response:", urlResponse.url.href);
    if (!urlResponse || !urlResponse.url) {
      console.error("Failed to get a valid URL for:", key);
      return null;
    } else {
      return urlResponse.url;
    }
  } catch (error) {
    console.error("Error fetching URL for:", key, error);
    return null;
  }
}

const GenomeBrowserComponent = ({ assembly }) => {
  const [viewState, setViewState] = useState(null);

  console.log(assembly.CompS3Path);

  useEffect(() => {}, [viewState]);

  useEffect(() => {}, [assembly]);

  useEffect(() => {
    const fetchPresignedUrls = async () => {
      try {
        // Fetching the assembly data
        const assemblyData = await fetchAssemblyFile(assembly.CompS3Path);
        const assemblyIndex = await fetchAssemblyFile(
          assembly.AssembyIndexPath
        );
        const assemblyGZI = await fetchAssemblyFile(assembly.GZIPath);

        if (assemblyData) {
          console.log(assembly);
          // Create tracks dynamically based on the annotations array
          const trackPromises = assembly.annotations.items.map(
            async (annotation) => {
              const annotationURL = await fetchAssemblyFile(
                annotation.CompS3Path
              );
              const indexURL = await fetchAssemblyFile(annotation.IndexPath);

              return {
                type: "FeatureTrack",
                trackId: annotation.Description, // Use a unique identifier for the trackId, assuming `id` is unique
                name: annotation.Description, // Use the name from the annotation
                assemblyNames: [assembly.ShortName], // Assuming you want to use the assembly's short name
                category: ["Annotation"], // Static category for all
                adapter: {
                  type: "Gff3TabixAdapter",
                  dontRedispatch: ["contig", "region", "chromosome"], //<-- this is the important line, specifically incorporating "contig" into the list
                  gffGzLocation: {
                    uri: annotationURL,
                    locationType: "UriLocation",
                  },
                  index: {
                    location: {
                      uri: indexURL,
                      locationType: "UriLocation",
                    },
                  },
                },
              };
            }
          );

          const tracks = await Promise.all(trackPromises);

          const state = createViewState({
            assembly: {
              name: assembly.ShortName,
              sequence: {
                type: "ReferenceSequenceTrack",
                trackId: assembly.ShortName + "-ReferenceSequenceTrack",
                adapter: {
                  type: "BgzipFastaAdapter",
                  fastaLocation: { uri: assemblyData },
                  faiLocation: { uri: assemblyIndex },
                  gziLocation: { uri: assemblyGZI },
                },
              },
            },
            tracks, // Adding the dynamically created tracks array
          });
          setViewState(state);
        } else {
          throw new Error("Invalid URL data received");
        }
      } catch (error) {
        console.error("Error in fetchPresignedUrls:", error);
      }
    };

    fetchPresignedUrls(); // Don't forget to call the function
  }, [assembly]); // Add other dependencies to useEffect if needed

  if (!viewState) {
    return <div>Loading genome data...</div>;
  }

  return (
    <Box>
      <Box sx={{ height: "10px" }} />
      <JBrowseLinearGenomeView viewState={viewState} />
    </Box>
  );
};
export default GenomeBrowserComponent;

cmdcolin commented 1 week ago

I didn't mean to auto-close this issue. I just merged that one thing with dontRedispatch

I am trying to brainstorm, but I don't have too many concrete ideas.

As far as the code you posted above goes, though, that seems probably fine. I know it's not super productive, but if you want to do an office hours session, we might be able to live debug :) https://jbrowse.org/jb2/contact/