fatiando / pooch

A friend to fetch your data files
https://www.fatiando.org/pooch

Decompress ".Z" file (LZW compression) #263

Open rabernat opened 3 years ago

rabernat commented 3 years ago

Thanks for the amazingly useful package! ❤️

Description of the problem

I want to download a compressed file with the .Z extension: ftp://ftp.noc.soton.ac.uk/pub/sxj/clim/netcdf/hfns11a.nc.Z

I can download and decompress it fine on the command line using gunzip:

curl -O ftp://ftp.noc.soton.ac.uk/pub/sxj/clim/netcdf/hfns11a.nc.Z
gunzip hfns11a.nc.Z

My best attempt with pooch is:

import pooch

fname = pooch.retrieve(
    'ftp://ftp.noc.soton.ac.uk/pub/sxj/clim/netcdf/hfns11a.nc.Z',
    known_hash='4820bce249ce508642762764fa0daa9c4785d42524fc603fe945d25140c3eaad',
    processor=pooch.Decompress(method='LZMA')
)

Any other method raises an error. However, I don't think LZMA is the right decompressor: this data is compressed using LZW.

I have figured out how to decompress it in Python using the unlzw3 package:

import unlzw3

with open("hfns11a.nc.Z", "rb") as fp:
    uncompressed_data = unlzw3.unlzw(fp.read())
with open("uncompressed.nc", "wb") as fp:
    fp.write(uncompressed_data)

I have verified that the output matches the result of gunzip.

Would there be interest in adding this decompressor to Pooch?

leouieda commented 3 years ago

@rabernat that is definitely something that would be of interest! Is this something you'd want to work on?

Since it requires an extra dependency, it would be best to make it optional. There are plenty of examples of how to implement and test this in the code base (for example, the SFTP and TQDM support), but I'd be happy to provide more specific guidance and help.
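
A minimal sketch of what such an optional processor might look like, assuming pooch's processor call signature `(fname, action, pooch)` and the `.decomp` output naming that `pooch.Decompress` uses. The class name `DecompressLZW` and its `decompressor` argument are hypothetical and not part of Pooch; `unlzw3` is imported lazily so that it stays an optional dependency:

```python
import os


class DecompressLZW:
    """Hypothetical pooch processor for LZW-compressed ".Z" files."""

    def __init__(self, name=None, decompressor=None):
        # name: optional file name for the decompressed output.
        # decompressor: callable mapping compressed bytes to raw bytes.
        # Defaults to unlzw3.unlzw, imported only when it is needed.
        self.name = name
        self.decompressor = decompressor

    def __call__(self, fname, action, pooch):
        # pooch calls processors with the downloaded file path, the action
        # taken ("download", "update", or "fetch"), and the Pooch instance
        # (None when called through pooch.retrieve).
        if self.name is None:
            decompressed = fname + ".decomp"
        else:
            decompressed = os.path.join(os.path.dirname(fname), self.name)

        # Only redo the work if the file is new or the output is missing.
        if action in ("update", "download") or not os.path.exists(decompressed):
            decompressor = self.decompressor
            if decompressor is None:
                try:
                    import unlzw3
                except ImportError as error:
                    raise ImportError(
                        "The 'unlzw3' package is required to decompress .Z files."
                    ) from error
                decompressor = unlzw3.unlzw
            with open(fname, "rb") as compressed:
                data = decompressor(compressed.read())
            with open(decompressed, "wb") as output:
                output.write(data)
        return decompressed
```

With `unlzw3` installed, this could be passed as `processor=DecompressLZW()` in the `pooch.retrieve` call from the original report. Accepting the decompression function as an argument is only a convenience for testing without the extra dependency.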