databio / bedbase

Aggregate, analyze, and serve genomic regions.
http://bedbase.org/

Index all BED files into vector database #41

Closed nsheff closed 7 months ago

nsheff commented 1 year ago

This should be an addition to bedboss:

Each BED file that is added to bedbase will need to have its embedding computed through a model, and then stored in the database.
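
A minimal sketch of that flow (parse a BED file, compute an embedding, store it under the file's ID), only to illustrate the shape of the step. Everything here is an assumption for illustration: `parse_bed`, `embed_regions`, `InMemoryVectorStore`, and `index_bed_file` are hypothetical names, the hashed-bin embedding is a toy stand-in for a real model, and the in-memory store stands in for the vector database. It is not the geniml or bedboss API.

```python
import hashlib
import math
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]
DIM = 16  # hypothetical embedding size, chosen only for this sketch


def parse_bed(text: str) -> List[Region]:
    """Read (chrom, start, end) from BED text, skipping header and comment lines."""
    regions: List[Region] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def embed_regions(regions: List[Region], dim: int = DIM, bin_size: int = 1000) -> List[float]:
    """Toy stand-in for a model: hash fixed-size genome bins into a unit-length vector."""
    vec = [0.0] * dim
    for chrom, start, end in regions:
        last_base = max(start, end - 1)  # BED ends are exclusive
        for b in range(start // bin_size, last_base // bin_size + 1):
            digest = hashlib.md5(f"{chrom}:{b}".encode()).digest()
            vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class InMemoryVectorStore:
    """Hypothetical stand-in for the vector database."""

    def __init__(self) -> None:
        self._points: Dict[str, List[float]] = {}

    def upsert(self, bed_id: str, vector: List[float]) -> None:
        self._points[bed_id] = vector

    def search(self, query: List[float], limit: int = 3) -> List[Tuple[str, float]]:
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [
            (bed_id, sum(a * b for a, b in zip(query, vec)))
            for bed_id, vec in self._points.items()
        ]
        return sorted(scored, key=lambda item: item[1], reverse=True)[:limit]


def index_bed_file(store: InMemoryVectorStore, bed_id: str, bed_text: str) -> List[float]:
    """Compute the embedding for one BED file and store it under its ID."""
    vector = embed_regions(parse_bed(bed_text))
    store.upsert(bed_id, vector)
    return vector
```

Calling `index_bed_file(store, "some_id", bed_text)` once per file as it is added keeps the store in step with the set of files; in a real setup the embedding function and the store would be replaced by the actual model and database client.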

nsheff commented 1 year ago

I think it would be great if @ClaudeHu could write the code for computing the embedding and inserting it into the database, and then @khoroshevskyi can actually run it to index the BED files.

nsheff commented 1 year ago

@ClaudeHu can you give an update on this? Please give @khoroshevskyi the functions he needs to add to bedboss.

khoroshevskyi commented 1 year ago

Update: I have updated bbconf to accept qdrant and region2vec variables through the bbconf config. The changes are on the dev branch.

nsheff commented 1 year ago

I think the functions you'll need to call in bedboss are almost finished in geniml; we can try to merge these into dev there today and then you can play with that.

@ClaudeHu can you provide a few lines of code here on how to call the new geniml functionality to do this?

khoroshevskyi commented 1 year ago

This functionality is already available in bedboss. To index BED files, we first need to upload them to bedbase :D

khoroshevskyi commented 7 months ago

The indexing system is working now. You can reindex all BED files using bedboss.