daler / pybedtools

Python wrapper -- and more -- for BEDTools (bioinformatics tools for "genome arithmetic")
http://daler.github.io/pybedtools

Add genome arguments to BedTool.sort() #380

Closed mgperry closed 1 year ago

mgperry commented 1 year ago

Background: I recently ran into an issue when writing BigWig files from Python. By default, bedtools sort uses lexicographic chromosome order (chr1, chr11, ...), which conflicts with the chromosome order in the BigWig header, and BigWig files also require sorted input. This can be worked around, of course, but I thought I'd try to get the pybedtools genome-argument machinery to work for this case.
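
To make the mismatch concrete, here is a minimal sketch in plain Python. It does not use pybedtools, and the helper name sort_by_genome_order is hypothetical, introduced only for illustration. It contrasts a plain lexicographic sort with a sort that follows the chromosome order listed in a genome file:

```python
# Hypothetical helper for illustration only; not part of pybedtools or bedtools.
def sort_by_genome_order(intervals, chrom_order):
    """Sort (chrom, start, end) tuples by the position of chrom in
    chrom_order, then by start and end coordinates."""
    rank = {chrom: i for i, chrom in enumerate(chrom_order)}
    return sorted(intervals, key=lambda iv: (rank[iv[0]], iv[1], iv[2]))


intervals = [("chr11", 5, 10), ("chr2", 1, 4), ("chr1", 7, 9)]

# Plain tuple sorting compares chromosome names as strings,
# so "chr11" lands before "chr2".
lexicographic = sorted(intervals)

# Sorting against an explicit chromosome list keeps chr2 before chr11,
# matching the order a genome file would declare.
genome_order = sort_by_genome_order(intervals, ["chr1", "chr2", "chr11"])

print(lexicographic)
print(genome_order)
```

The two results differ only in where chr11 falls, which is exactly the difference that matters when the sorted output has to agree with a header written in genome-file order.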

Changes:

Notes: I had to change the chromsizes_to_file helper function to remove sorting, since it imposed lexicographic sort order on any given chromsizes (and also on downloaded genomes). As it was, it was impossible to specify the correct sort order without creating the genome file manually (i.e. without help from pybedtools). I can't think why this would cause problems in general, but it does mean the output of the tool would change in some circumstances.
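
As a rough illustration of the behaviour change, the sketch below writes a genome file from a dict with and without sorting. This is not the actual chromsizes_to_file implementation: write_genome_file is a hypothetical stand-in, and it assumes a simplified chromsizes mapping of chromosome name to integer size. It relies on dicts preserving insertion order (Python 3.7+).

```python
import os
import tempfile


# Hypothetical stand-in for illustration; not the pybedtools helper.
def write_genome_file(chromsizes, path, sort=False):
    """Write "chrom<TAB>size" lines, optionally sorting chromosome names
    lexicographically first."""
    items = chromsizes.items()
    if sort:
        items = sorted(items)
    with open(path, "w") as fout:
        for chrom, size in items:
            fout.write(f"{chrom}\t{size}\n")


def read_chrom_order(path):
    """Return the chromosome names in the order they appear in the file."""
    with open(path) as fin:
        return [line.split("\t")[0] for line in fin]


# Assumed simplified input: name -> size, inserted in the desired order.
chromsizes = {"chr1": 1000, "chr2": 800, "chr11": 500}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "genome.txt")

    write_genome_file(chromsizes, path)             # keeps caller's order
    unsorted_order = read_chrom_order(path)

    write_genome_file(chromsizes, path, sort=True)  # forces lexicographic order
    sorted_order = read_chrom_order(path)

print(unsorted_order)
print(sorted_order)
```

Without sorting, the file keeps whatever order the caller supplied, so a caller who wants lexicographic order can still sort the mapping beforehand.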

I've added a quick test (which passes in isolation); however, I couldn't get the automatic tests to run on my machine (ModuleNotFoundError: No module named 'pybedtools.cbedtools').

Thanks for looking at this, happy to file a related issue or alter this PR as requested.

daler commented 1 year ago

Sorry it's been so long, but returning to this now I think it's a good change. I'll try merging it into the v0.9.1 branch to see what the other tests think...

Thank you for this!