add support for decompressing files based on filename extension before opening in `read_pdb`

Ruibin-Liu / MolDF

Super lightweight and fast mmCIF/PDB/MOL2 file parser into Pandas DataFrames and backwards writer.

https://moldf.readthedocs.io/en/latest/

MIT License

add support for decompressing files based on filename extension before opening in `read_pdb` #34

Open kamurani opened 1 month ago

kamurani commented 1 month ago

Uses the filename suffix to detect which compression method to apply, so that compressed PDB files can be read without decompressing them first.

Addresses #33
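
A minimal sketch of the suffix-based approach, for illustration only. It is not the code from this PR: the name `open_by_suffix` and the suffix table are hypothetical, and it relies only on the standard-library `gzip`, `bz2`, and `lzma` modules.

```python
import bz2
import gzip
import lzma
from pathlib import Path

# Hypothetical suffix-to-opener table (an assumption, not taken from the PR).
_OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
    ".xz": lzma.open,
}


def open_by_suffix(path, mode="rt", encoding="utf-8"):
    """Open `path` in text mode, decompressing on the fly for known suffixes.

    Only the last suffix is inspected, so "1abc.pdb.gz" is treated as gzip.
    Unknown suffixes fall back to the built-in `open`.
    """
    opener = _OPENERS.get(Path(path).suffix.lower(), open)
    return opener(path, mode, encoding=encoding)
```

The returned object is a regular text file handle, so it can be used in a `with` block and iterated line by line whether or not the file was compressed.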

kamurani commented 1 month ago

I believe there's probably a clever way to incorporate the `open_file` function into `read_pdbx` etc.

Ruibin-Liu commented 1 month ago

Thanks for the PR! There have been some issues in the GitHub Actions runs since I upgraded the pre-commit hooks. At first only the tests on Windows failed, but later some Ubuntu versions failed too. I spent some time investigating the problem but couldn't find the cause (partially because I don't have a Windows machine to test locally). Do the tests pass when you run them locally?

kamurani commented 1 month ago

Hey Ruibin,

This was just a quick PR and I haven't tested it extensively yet.

Not sure if this is related to the pre-commit hook failures, but I just realised that my code doesn't work with `.pdb.gz` files when using `category_names=['_atom_site']` (I was only testing with `category_names=['_seq_res']`, and that works).

For some reason I get an `IndexError`. Let me have a look and see what's wrong.