Open nleroy917 opened 3 months ago
For python bindings, we could do an OOP approach:
from gtars.igd import Igd
igd = Igd.create_from_files(
    source_files="path/to/files",
    output_folder="path/to/output",
    database_name="mydb"
)
# way later
igd = Igd.load_db("path/to/database")
igd.search(...)
IGD create and search now work in PR #9 with some caveats.
An IGD database can be created from a folder of BED files. A search can then be performed using a single BED file as the query.
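To make the create-then-search flow concrete, here is a minimal pure-Python sketch of a tile-binned interval index. It is a toy under stated assumptions, not the gtars or C IGD implementation: the class name `ToyTileIndex`, its methods, and the tile width are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

TILE_SIZE = 16_384  # assumed tile width in bp, for illustration only


class ToyTileIndex:
    """Toy tile-binned interval index (hypothetical, not the gtars API)."""

    def __init__(self, tile_size=TILE_SIZE):
        self.tile_size = tile_size
        # (chrom, tile number) -> list of (start, end, file_id, region_id)
        self.tiles = defaultdict(list)
        self.n_regions = 0

    def add(self, chrom, start, end, file_id):
        """Store a half-open region [start, end) in every tile it touches."""
        region_id = self.n_regions
        self.n_regions += 1
        first = start // self.tile_size
        last = (end - 1) // self.tile_size
        for tile in range(first, last + 1):
            self.tiles[(chrom, tile)].append((start, end, file_id, region_id))

    def search(self, chrom, q_start, q_end):
        """Return {file_id: number of stored regions overlapping the query}."""
        seen = set()  # region ids already counted (a region can span tiles)
        counts = {}
        first = q_start // self.tile_size
        last = (q_end - 1) // self.tile_size
        for tile in range(first, last + 1):
            for start, end, file_id, region_id in self.tiles.get((chrom, tile), []):
                overlaps = start < q_end and q_start < end
                if overlaps and region_id not in seen:
                    seen.add(region_id)
                    counts[file_id] = counts.get(file_id, 0) + 1
        return counts


# "Create": index regions that would normally be parsed from BED files.
index = ToyTileIndex(tile_size=100)
index.add("chr1", 50, 250, "a.bed")
index.add("chr1", 90, 110, "b.bed")

# "Search": one query region, as one line of a query BED file would supply.
print(index.search("chr1", 95, 205))
```

Each region is stored once per tile it touches, so the search deduplicates by region id before counting hits per source file. A real database would also serialize the tiles to disk and read them back, which this sketch leaves out.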
Performance-wise, creation time appears similar for the C and Rust versions: about 2.1 seconds for 80 files (~280,000 regions).
There are some discrepancies between the C and Rust versions that should be investigated in the future, such as:
- Loading the `.igd` data back into memory for query. The gData tiles are not 1 to 1 (see attached picture). The creation step appears to be exactly the same based on the numbers (e.g. # of Ctgs, Regions, Tiles, etc.).

I just merged the PR that has been in progress since the beginning of the year. However, IGD still needs some work.
We need to re-implement IGD in this crate. This is being done by @donaldcampbelljr in #9.
Original code here: https://github.com/databio/IGD