GMOD / jbrowse-components

Source code for JBrowse 2, a modern React-based genome browser
https://jbrowse.org/jb2
Apache License 2.0

Support text indexing of plain text gtf files #2486

Open teresam856 opened 2 years ago

teresam856 commented 2 years ago

Support indexing of plain text gtf files.

It would be helpful to be able to index plain text GTF files for text searching. This would involve integrating the gmod/gtf parser.

raj1701 commented 1 year ago

Hey @teresam856 @rbuels @cmdcolin can I work on this issue?

cmdcolin commented 1 year ago

if you're interested, feel free to try it out. the text indexing is (mostly) part of our CLI toolkit

you can run our CLI tool by going to products/jbrowse-cli, then in that folder try bin/run --help. this runs it in dev mode from the code in the products/jbrowse-cli/src/commands/ directory. products/jbrowse-cli/src/commands/text-index.ts is the text indexing command. GTF files are pretty similar to GFF files, so you can look at the GFF3 indexing code in products/jbrowse-cli/src/types/gff3Adapter.ts and would probably make a similar one for GTF
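To illustrate the "similar adapter" idea: the main difference between GTF and GFF3 is the ninth column, where GTF uses `key "value";` pairs instead of `key=value;`. The following is a minimal sketch of reading a GTF line and collecting the attribute values one might index. All names here (`GtfFeature`, `parseGtfAttributes`, `parseGtfLine`, `searchTerms`) are hypothetical and not taken from gff3Adapter.ts or the gmod/gtf parser; a real adapter would presumably stream the file and use the existing parser.

```typescript
// Hypothetical sketch: none of these names come from the JBrowse codebase.
interface GtfFeature {
  refName: string
  type: string
  start: number // 1-based, inclusive, as written in the GTF file
  end: number
  attributes: Record<string, string>
}

// Parse GTF column 9, e.g. `gene_id "G1"; gene_name "ABC"; level 2;`
// Values are usually quoted; unquoted values (often numbers) are also accepted.
function parseGtfAttributes(col9: string): Record<string, string> {
  const attributes: Record<string, string> = {}
  const re = /(\S+)\s+(?:"([^"]*)"|([^\s;]+))\s*;?/g
  let m: RegExpExecArray | null
  while ((m = re.exec(col9)) !== null) {
    const key = m[1]
    if (key === undefined) {
      continue
    }
    // m[2] is the quoted value, m[3] the unquoted one
    attributes[key] = m[2] ?? m[3] ?? ''
  }
  return attributes
}

// Parse one tab-separated GTF line; returns undefined for comments,
// blank lines, and lines with fewer than nine columns.
function parseGtfLine(line: string): GtfFeature | undefined {
  if (line.trim() === '' || line.charAt(0) === '#') {
    return undefined
  }
  const cols = line.split('\t')
  if (cols.length < 9) {
    return undefined
  }
  return {
    refName: cols[0] ?? '',
    type: cols[2] ?? '',
    start: Number(cols[3]),
    end: Number(cols[4]),
    attributes: parseGtfAttributes(cols[8] ?? ''),
  }
}

// Collect the de-duplicated values of the requested attributes, in order.
// Which attributes to index is an assumption left to the caller.
function searchTerms(feature: GtfFeature, attributesToIndex: string[]): string[] {
  const terms: string[] = []
  for (const name of attributesToIndex) {
    const value = feature.attributes[name]
    if (value !== undefined && terms.indexOf(value) === -1) {
      terms.push(value)
    }
  }
  return terms
}
```

For a gene line with `gene_id` and `gene_name` attributes, `searchTerms(feature, ['gene_name', 'gene_id'])` would yield the two values that a text index could map back to the feature's location.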

raj1701 commented 1 year ago

Sure I will check out the implementation of gff3 and then work on gtf.

raj1701 commented 1 year ago

Hey @cmdcolin, I have implemented the GTF indexing code along the same lines as the GFF3 indexing code. To test it locally, I used equivalent files in both formats. The .ix and .ixx files generated for the two are the same. Here are screenshots of the .ix files.

The command is visible at the bottom of each screenshot. The parsed contents of both files are the same.

The .ixx file has a single entry for both GFF3 and GTF. (Screenshot from 2023-03-16 06 50 34)
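A comparison like this can also be checked programmatically rather than by eye. Here is a minimal sketch, assuming only that the generated index files are line-oriented text; the helper name `firstDifferingLine` is hypothetical and not part of the JBrowse CLI.

```typescript
// Hypothetical helper: returns the 1-based number of the first line at which
// two texts differ, or 0 if they are identical line for line.
function firstDifferingLine(a: string, b: string): number {
  const linesA = a.split('\n')
  const linesB = b.split('\n')
  const n = Math.max(linesA.length, linesB.length)
  for (let i = 0; i < n; i++) {
    // A missing line (undefined) in the shorter text counts as a difference
    if (linesA[i] !== linesB[i]) {
      return i + 1
    }
  }
  return 0
}
```

The two arguments would be the contents of the index built from the GFF3 file and the index built from the GTF file, read for example with Node's `fs.readFileSync(path, 'utf8')`. A result of 0 means the two indexes match exactly.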

raj1701 commented 1 year ago

I have pushed only the code to my fork for now. Should I open a PR with just that so you can review the code, or should I write a test for it first? text-index.test.ts throws errors for me at the describe and expect statements.

cmdcolin commented 1 year ago

to run the tests, you can use

yarn test products/jbrowse-cli

from the root of the repo

you can certainly make a PR too, I am often in the habit of making "early" PRs to get feedback and iterate fast