ComparativeGenomicsToolkit / hal

Hierarchical Alignment Format

Speeding up halLiftover #273

Closed by tli71193 1 year ago

tli71193 commented 1 year ago

Hi there,

I was wondering if there is any way of speeding up halLiftover. I'm essentially lifting over whole genomes between species and would like to know how I might query the hal file faster.

Any advice would be much appreciated! Thanks!

glennhickey commented 1 year ago

--inMemory always helps if you have enough RAM. Otherwise, you'd have to manually cut up your bed region, lift each in parallel in a separate process, then combine the result. @diekhans may be aware of some tooling to help with this?
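The split, lift in parallel, then combine approach described above could be sketched roughly as follows. This is only an illustration, not tooling that ships with hal: the function names (`split_bed`, `lift_chunk`, `parallel_liftover`) are hypothetical, the positional argument order `halLiftover <halFile> <srcGenome> <srcBed> <tgtGenome> <tgtBed>` is assumed from the tool's usage text, and the input BED is assumed to have no `track` or `browser` header lines.

```python
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_bed(lines, n_chunks):
    """Split BED lines into at most n_chunks contiguous, non-empty chunks."""
    records = [ln for ln in lines if ln.strip()]  # drop blank lines
    if not records:
        return []
    size = -(-len(records) // n_chunks)  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def lift_chunk(hal_path, src_genome, tgt_genome, chunk_lines, work_dir, index):
    """Write one chunk to disk and run halLiftover on it as a separate process."""
    src_bed = Path(work_dir) / f"chunk_{index}.src.bed"
    tgt_bed = Path(work_dir) / f"chunk_{index}.tgt.bed"
    src_bed.write_text(
        "".join(ln if ln.endswith("\n") else ln + "\n" for ln in chunk_lines)
    )
    # Assumed argument order: halFile srcGenome srcBed tgtGenome tgtBed
    subprocess.run(
        ["halLiftover", str(hal_path), src_genome, str(src_bed),
         tgt_genome, str(tgt_bed)],
        check=True,  # raise if halLiftover exits with an error
    )
    return tgt_bed


def parallel_liftover(hal_path, src_genome, src_bed, tgt_genome, out_bed, n_jobs=4):
    """Lift src_bed in n_jobs pieces and concatenate the lifted pieces."""
    with open(src_bed) as fh:
        chunks = split_bed(fh.readlines(), n_jobs)
    with tempfile.TemporaryDirectory() as work_dir:
        # Each worker thread only waits on its own halLiftover process.
        with ThreadPoolExecutor(max_workers=n_jobs) as pool:
            futures = [
                pool.submit(lift_chunk, hal_path, src_genome, tgt_genome,
                            chunk, work_dir, i)
                for i, chunk in enumerate(chunks)
            ]
            lifted = [f.result() for f in futures]
        # Combine the per-chunk results in the original chunk order.
        with open(out_bed, "w") as out:
            for path in lifted:
                out.write(path.read_text())
```

A call might look like `parallel_liftover("aln.hal", "human", "regions.bed", "mouse", "lifted.bed", n_jobs=8)`, where the file and genome names are placeholders. Every process opens the same hal file, so the number of jobs is bounded by available memory and disk throughput as much as by core count.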

tli71193 commented 1 year ago

What is the unit for --inMemory: KB, MB, or GB? For example, what would I pass if I can allocate 16 GB of RAM?

glennhickey commented 1 year ago

Sorry, I should have said --hdf5InMemory

--hdf5InMemory:               load all data in memory (and disable hdf5 cache) 
                              [default = 0]

tli71193 commented 1 year ago

So can I assume it's a Boolean flag, rather than an option that takes a value?

tli71193 commented 1 year ago

I ended up having to manually cut up my bed regions, lift each piece in parallel in a separate process, then combine the results, like you said @glennhickey. It is definitely faster.

Loading everything into memory was too much for my current setup. Thanks for all your help!