Open blankenberg opened 8 years ago
I think pushing the version is the right solution. Somebody just needs to find time to work on this...
-- jt
On Fri, Jul 29, 2016 at 12:41 PM, Daniel Blankenberg < notifications@github.com> wrote:
One example is the hg38 projected chr1 multiz100way from UCSC: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz100way/maf/chr1.maf.gz (5.9G gz'd, 66G gunzip'd)
maf_build_index.py chr1.maf
Traceback (most recent call last):
File "/path_to/bin/maf_build_index.py", line 83, in <module>
if __name__ == "__main__": main()
File "/path_to/bin/maf_build_index.py", line 80, in main
indexes.write( out )
File "/path_to/lib/python2.7/site-packages/bx/interval_index_file.py", line 332, in write
write_packed( f, ">I", base )
File "/path_to/lib/python2.7/site-packages/bx/interval_index_file.py", line 463, in write_packed
f.write( pack( pattern, *vals ) )
struct.error: 'I' format requires 0 <= number <= 4294967295
One possibility is to up the version number and store unsigned integers as unsigned long long (">Q"), which would max out at 18446744073709551615 vs 4294967295. Would double the packed size though.
Another potential workaround could be to break the MAF up into multiple files, but I haven't tested this.
xref: https://biostar.usegalaxy.org/p/18196/
View it on GitHub: https://github.com/bxlab/bx-python/issues/8
Thanks for looking into this. This issue has been bothering us for several weeks. We hope to have a solution soon.
Any progress on this? I just ran into what seems to be the same problem on a large MAF file (48 GB):
(base) /mnt/e/genemod/better_dNdS_models/drosophila/11_6_2020/cactus_work$ python /home/jodyhey/miniconda3/bin/maf_build_index.py drosophila_cactus.maf drosophila_cactus.mafindex
Traceback (most recent call last):
File "/home/jodyhey/miniconda3/bin/maf_build_index.py", line 82, in <module>
@jodyhey No one is working on this issue, sorry, but pull requests are welcome!
Is there any update on this issue? With .maf files getting bigger lately, this might become a more common problem.
Here I ran the script on a 30 GB .lzo file (85 GB uncompressed):
python3 maf_index.py
Traceback (most recent call last):
File "maf_index.py", line 75, in <module>
main()
File "maf_index.py", line 70, in main
indexes.write(out)
File "/home/pc575/jupyter-env-icelake/lib/python3.7/site-packages/bx/interval_index_file.py", line 351, in write
write_packed(f, ">I", base)
File "/home/pc575/jupyter-env-icelake/lib/python3.7/site-packages/bx/interval_index_file.py", line 486, in write_packed
f.write(pack(pattern, *vals))
struct.error: 'I' format requires 0 <= number <= 4294967295
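For context on the error above, here is a minimal sketch using only Python's standard struct module. It reproduces the failure for a value one past the 32-bit limit and shows that ">Q" accepts the same value at twice the packed size. The pack_uint helper and its wide flag are hypothetical names for illustration; they are not part of bx-python, and the sketch makes no claim about which value in the index overflows.

```python
import struct

MAX_UINT32 = 2**32 - 1  # 4294967295, the largest value ">I" can hold


def pack_uint(value, wide=False):
    """Hypothetical helper (not bx-python API): pack one big-endian unsigned int.

    wide=False uses ">I" (4 bytes); wide=True uses ">Q" (8 bytes).
    """
    pattern = ">Q" if wide else ">I"
    return struct.pack(pattern, value)


# A value that still fits in 32 bits packs into 4 bytes.
assert len(pack_uint(MAX_UINT32)) == 4

# One more and ">I" raises struct.error, as in the traceback.
try:
    pack_uint(MAX_UINT32 + 1)
except struct.error as exc:
    print("32-bit pack failed:", exc)

# ">Q" accepts the same value, at twice the size per integer.
packed = pack_uint(MAX_UINT32 + 1, wide=True)
print(len(packed), struct.unpack(">Q", packed)[0])  # prints: 8 4294967296
```

Any reader of the index would need to know which width was used, which is why the proposal pairs the wider format with a version bump.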
One example is the hg38 projected chr1 multiz100way from UCSC: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz100way/maf/chr1.maf.gz (5.9G gz'd, 66G gunzip'd)
One possibility is to up the version number and store unsigned integers as unsigned long long (">Q"), which would max out at 18446744073709551615 vs 4294967295. Would double the packed size though.
Another potential workaround could be to break the MAF up into multiple files, but I haven't tested this.
xref: https://biostar.usegalaxy.org/p/18196/
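The split-into-multiple-files workaround could look roughly like the sketch below. It is untested against real multiz data and assumes an uncompressed MAF in which leading "#" lines form the header and every alignment block starts with a line beginning with "a". The split_maf function, its parameters, and the 1 GiB default are all hypothetical choices for illustration, not bx-python API. Whether smaller parts actually avoid the struct.error has not been confirmed in this thread.

```python
def split_maf(path, out_prefix, max_bytes=1024**3):
    """Hypothetical sketch: split a MAF file into parts, cutting only between blocks.

    Leading '#' header lines are repeated at the top of every part. A single
    block larger than max_bytes is still written whole, to its own part.
    Returns the list of part filenames written.
    """
    header = []   # leading '#' lines seen before the first block
    parts = []    # names of the part files written so far
    block = []    # lines of the block currently being collected
    out = None    # currently open part file
    written = 0   # bytes written to the current part

    def flush_block():
        nonlocal out, written
        if not block:
            return
        size = sum(len(line) for line in block)
        # Start a new part if none is open or this block would not fit.
        if out is None or written + size > max_bytes:
            if out is not None:
                out.close()
            name = f"{out_prefix}.{len(parts):03d}.maf"
            out = open(name, "wb")
            parts.append(name)
            out.writelines(header)
            written = sum(len(line) for line in header)
        out.writelines(block)
        written += size
        block.clear()

    # Stream line by line so the whole file never has to fit in memory.
    with open(path, "rb") as src:
        for line in src:
            if out is None and not block and line.startswith(b"#"):
                header.append(line)
                continue
            if line.startswith(b"a") and block:
                flush_block()  # an 'a' line opens the next alignment block
            block.append(line)
        flush_block()

    if out is not None:
        out.close()
    return parts
```

Each part would then be indexed separately, for example by running maf_build_index.py on every file in the returned list.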