databio / bedbase

Aggregate, analyze, and serve genomic regions.
http://bedbase.org/

Index all BED files into vector database #41

Closed by nsheff 5 months ago

nsheff commented 1 year ago

This should be added to bedboss:

Each BED file that is added to bedbase will need to have its embedding computed through a model, and then stored in the database.
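A rough sketch of that flow, for illustration only. It does not use the real geniml, bedboss, or qdrant APIs: `embed_regions` is a toy hash-based stand-in for a trained model, and `InMemoryVectorIndex` and `index_bed_file` are hypothetical names standing in for the vector database and the indexing hook.

```python
import hashlib
import math
from typing import Dict, List, Tuple

DIM = 8  # toy embedding size (assumption); a trained model would use many more dimensions


def parse_bed(text: str) -> List[Tuple[str, int, int]]:
    """Parse the first three BED columns (chrom, start, end), skipping header lines."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def embed_regions(regions: List[Tuple[str, int, int]], bin_size: int = 1000) -> List[float]:
    """Toy stand-in for a model: hash each region's genomic bin into a
    fixed-size count vector, then L2-normalize it."""
    vec = [0.0] * DIM
    for chrom, start, _end in regions:
        token = f"{chrom}:{start // bin_size}".encode()
        idx = int.from_bytes(hashlib.md5(token).digest()[:4], "big") % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero for empty files
    return [v / norm for v in vec]


class InMemoryVectorIndex:
    """Hypothetical stand-in for a vector database collection."""

    def __init__(self) -> None:
        self._points: Dict[str, List[float]] = {}

    def upsert(self, bed_id: str, vector: List[float]) -> None:
        # Re-inserting the same id overwrites the old vector, so reindexing is idempotent.
        self._points[bed_id] = vector

    def search(self, query: List[float], limit: int = 3) -> List[Tuple[str, float]]:
        # Stored vectors are unit-length (or all-zero), so the dot product is the cosine similarity.
        scored = [
            (bed_id, sum(a * b for a, b in zip(query, vec)))
            for bed_id, vec in self._points.items()
        ]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


def index_bed_file(index: InMemoryVectorIndex, bed_id: str, bed_text: str) -> List[float]:
    """Compute the embedding for one BED file and store it under its identifier."""
    vector = embed_regions(parse_bed(bed_text))
    index.upsert(bed_id, vector)
    return vector
```

In the real pipeline the embedding step would be a model call and the index would be the vector database; the point here is only the shape of the per-file step: parse, embed, upsert by BED identifier.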

nsheff commented 1 year ago

I think it would be great if @ClaudeHu could write the code for computing the embeddings and inserting them into the database, and then @khoroshevskyi can actually run it to index the BED files.

nsheff commented 1 year ago

@ClaudeHu can you give an update on this? Please give @khoroshevskyi the functions he needs to add to bedboss.

khoroshevskyi commented 1 year ago

Update: I have updated bbconf to accept qdrant and region2vec settings through the bbconf config. The changes are on the dev branch.

nsheff commented 1 year ago

I think the functions you'll need to call in bedboss are almost finished in geniml; we can try to merge these into dev there today and then you can play with that.

@ClaudeHu can you provide a few lines of code here on how to call the new geniml functionality to do this?

khoroshevskyi commented 10 months ago

This functionality is already available in bedboss. To index BED files, we first need to upload them to bedbase :D

khoroshevskyi commented 5 months ago

The indexing system is working now. You can reindex all BED files using bedboss.