bxlab / bx-python

Tools for manipulating biological data, particularly multiple sequence alignments
MIT License

How to deal with MAF files compressed with bzip2 or lzop? #69

Closed songtaogui closed 3 years ago

songtaogui commented 3 years ago

Hi,

I have noticed that bx-python supports MAF files compressed with bzip2 or lzop. However, when I tried this in practice, I ran into some errors.

For example, I wanted to build an index from a compressed MAF file with maf_build_index.py. For bzip2, the script says that a bz2t file should be created with bzip-table:

https://github.com/bxlab/bx-python/blob/1731099ac7e358eb2eced5a02bd4c96ee3c366f0/scripts/maf_build_index.py#L32-L33

However, I could not find a tool with the exact name "bzip-table"; the closest match was the seek-bzip-table command in seek-bzip.

After generating the bz2t file with seek-bzip-table, the index-building step failed with the following error:

# > seek-bzip-table test.maf.bz2 >test.maf.bz2t
# > maf_build_index.py test.maf.bz2
Traceback (most recent call last):
  File "/public/home/stgui/.linuxbrew/bin/maf_build_index.py", line 85, in <module>
    main()
  File "/public/home/stgui/.linuxbrew/bin/maf_build_index.py", line 63, in main
    maf_in = TextIOWrapper(maf_in, encoding="ascii")
AttributeError: 'SeekableBzip2File' object has no attribute 'readable'
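
The traceback above is what `io.TextIOWrapper` produces in Python 3 when the object it wraps does not implement the standard binary-stream protocol (`readable()`, `writable()`, `seekable()`, and so on). The following is a minimal sketch of that failure mode and of one common remedy, inheriting from `io.RawIOBase`. The classes `BareReader` and `RawReader` are hypothetical stand-ins invented for illustration; they are not bx-python's `SeekableBzip2File`, and this is not necessarily how the bug was fixed in bx-python.

```python
import io


class BareReader:
    """Hypothetical file-like object that only offers read(), nothing else."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0

    def read(self, size=-1):
        if size is None or size < 0:
            size = len(self._data) - self._pos
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk


class RawReader(io.RawIOBase):
    """Same data source, but inheriting the io.RawIOBase protocol."""

    def __init__(self, data: bytes):
        super().__init__()
        self._inner = BareReader(data)

    def readable(self):
        return True

    def readinto(self, buffer):
        # Fill the caller's buffer and report how many bytes were written.
        chunk = self._inner.read(len(buffer))
        buffer[:len(chunk)] = chunk
        return len(chunk)


def try_wrap(binary_file):
    """Return a text wrapper, or the AttributeError raised while wrapping."""
    try:
        return io.TextIOWrapper(binary_file, encoding="ascii")
    except AttributeError as exc:
        return exc


data = b"##maf version=1\na score=0\n"
failed = try_wrap(BareReader(data))   # an AttributeError instance
wrapped = try_wrap(RawReader(data))   # a working text stream
first_line = wrapped.readline()
```

Inheriting from `io.RawIOBase` supplies default implementations for the rest of the protocol (`writable()`, `seekable()`, `closed`, `flush()`), so only `readable()` and `readinto()` need to be written by hand.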

I could not figure out how to solve this, so I moved on to testing an lzo-compressed MAF file, but I also got stuck when generating the lzop offset table with lzop_build_offset_table.py:

# > bzip2 -dc test.maf.bz2 | lzop > test.maf.lzo
# > lzop_build_offset_table.py <test.maf.lzo >test.maf.lzot
Traceback (most recent call last):
  File "/public/home/stgui/.linuxbrew/bin/lzop_build_offset_table.py", line 98, in <module>
    main()
  File "/public/home/stgui/.linuxbrew/bin/lzop_build_offset_table.py", line 47, in main
    assert magic == MAGIC, "Not LZOP file"
AssertionError: Not LZOP file

It seems that the header of my lzo file did not match the MAGIC string in lzop_build_offset_table.py, even though lzop itself handles test.maf.lzo without problems.
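
For reference, lzop files begin with a fixed 9-byte signature, so the header can be checked independently of the script. The sketch below is only an illustration: the helper name `has_lzop_magic` is hypothetical, and the last lines show one plausible cause of such a mismatch under Python 3 (comparing text with bytes), which is an assumption and is not confirmed anywhere in this thread.

```python
import io

# The 9-byte signature that starts every lzop file.
LZOP_MAGIC = b"\x89LZO\x00\r\n\x1a\n"


def has_lzop_magic(binary_stream):
    """Read the first bytes of a binary stream and compare with the signature."""
    header = binary_stream.read(len(LZOP_MAGIC))
    return header == LZOP_MAGIC


# In-memory stand-ins for real files, so the sketch runs without any input data.
fake_lzop = io.BytesIO(LZOP_MAGIC + b"rest of header and blocks")
not_lzop = io.BytesIO(b"BZh91AY&SY")

is_lzop = has_lzop_magic(fake_lzop)
is_bzip2 = has_lzop_magic(not_lzop)

# In Python 3 a str never compares equal to bytes, even with identical content,
# so a header read in text mode fails the check on a perfectly valid file.
text_header = LZOP_MAGIC.decode("latin-1")
mismatch = text_header == LZOP_MAGIC
```

When a script reads the file from standard input, as in the command above, the binary stream in Python 3 is `sys.stdin.buffer`; `sys.stdin` itself is a text stream.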

So how should I deal with compressed MAF files? My real goal is to use maf_extract_ranges_indexed.py to extract subsets from compressed MAF files.
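
As a fallback that needs neither an index nor a bz2t table, a bzip2-compressed MAF file can be scanned sequentially with the Python standard library. The sketch below is not bx-python's API and offers no random access, so it reads the whole file; `iter_maf_blocks` and `overlaps` are hypothetical helpers, the sample data is made up, and the range test assumes the first `s` line of each block is the reference sequence on the `+` strand.

```python
import bz2
import io


def iter_maf_blocks(text_stream):
    """Yield each alignment block as a list of lines, starting at its 'a' line."""
    block = []
    for line in text_stream:
        line = line.rstrip("\n")
        if line == "a" or line.startswith("a "):
            if block:
                yield block
            block = [line]
        elif not line.strip():
            # A blank line terminates the current block.
            if block:
                yield block
            block = []
        elif block:
            block.append(line)
    if block:
        yield block


def overlaps(block, src, start, end):
    """True if the first 's' line is on src and overlaps [start, end)."""
    for line in block:
        if line.startswith("s "):
            fields = line.split()
            s_start, s_size = int(fields[2]), int(fields[3])
            return fields[1] == src and s_start < end and start < s_start + s_size
    return False


# Made-up sample data, compressed in memory so the sketch runs without a file.
sample = (
    "##maf version=1\n\n"
    "a score=10\n"
    "s hg.chr1 100 5 + 1000 ACGTA\n"
    "s mm.chr2 50 5 + 900 ACGTT\n\n"
    "a score=20\n"
    "s hg.chr1 500 4 + 1000 GGCC\n\n"
)
compressed = bz2.compress(sample.encode("ascii"))

with bz2.open(io.BytesIO(compressed), "rt", encoding="ascii") as handle:
    hits = [b for b in iter_maf_blocks(handle) if overlaps(b, "hg.chr1", 102, 110)]
```

`bz2.open` also accepts a file path in place of the in-memory buffer. For large files queried repeatedly, an indexed approach such as maf_extract_ranges_indexed.py remains the better fit, since this scan decompresses everything on each run.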

Best wishes,

Songtao Gui

songtaogui commented 3 years ago

Oops, I just found that I was running Python 3. After switching to Python 2, the bz2 file works just fine.

nsoranzo commented 3 years ago

@songtaogui Python 3 is supposed to work, so this seems to be a bug. Reopening.

nsoranzo commented 3 years ago

@songtaogui Thanks for the detailed bug report, it was very helpful! #70 should fix all the issues you mentioned.

I think bzip-table was developed at https://bitbucket.org/james_taylor/seek-bzip2, but the repo disappeared after Bitbucket discontinued Mercurial support in August.

dannon commented 3 years ago

I restored seek-bzip2 from a backup, converted it to git, and published it on GitHub at https://github.com/galaxyproject/seek-bzip2/