Rostlab / SeqVec

Modelling the Language of Life - Deep Learning Protein Sequences
http://embed.protein.properties
MIT License

Re-Add npz output and add an option not to sum up the layers #6

Closed konstin closed 4 years ago

konstin commented 4 years ago

If you don't sum up per protein, using a single numpy array will fail because the proteins have different lengths.
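To illustrate the length problem, here is a minimal sketch, not the code from this PR. The protein names, the random data and the shape `(layers, residues, features)` with 3 layers and 1024 features are assumptions made for the example. It shows that per-residue embeddings of different lengths cannot be stacked into one array, while an npz archive with one entry per protein can hold them.

```python
import io

import numpy as np

# Hypothetical per-residue embeddings, shape (layers, residues, features).
# The 3 x L x 1024 layout is an assumption made for this illustration.
rng = np.random.default_rng(0)
embeddings = {
    "protein_a": rng.random((3, 5, 1024), dtype=np.float32),
    "protein_b": rng.random((3, 8, 1024), dtype=np.float32),
}


def can_stack(arrays):
    """Return True if the arrays fit into one regular numpy array."""
    try:
        np.stack(list(arrays))
        return True
    except ValueError:
        # np.stack requires all inputs to have the same shape.
        return False


# Unreduced embeddings differ in the residue dimension, so stacking fails.
stackable_raw = can_stack(embeddings.values())

# Reducing each protein to a fixed-size vector (sum over layers, then
# mean over residues) makes the shapes equal again.
per_protein = {
    name: emb.sum(axis=0).mean(axis=0) for name, emb in embeddings.items()
}
stackable_reduced = can_stack(per_protein.values())

# An npz archive stores one array per key, so ragged lengths are fine.
buffer = io.BytesIO()
np.savez(buffer, **embeddings)
buffer.seek(0)
loaded = np.load(buffer)
loaded_shapes = {name: loaded[name].shape for name in loaded.files}
```

Keying the archive by protein identifier keeps the per-residue information at the cost of one array lookup per protein.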

konstin commented 4 years ago

I've added HDF5 support, which allows embedding larger-than-RAM datasets, and made the get_embeddings function a generator so I can do some post-processing in my code.
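The generator pattern could look roughly like the sketch below. It is an illustration only: `fake_embed` and `iter_embeddings` are hypothetical names, not the functions from this PR, and the tiny feature size is chosen to keep the example small. The point is that each embedding is produced, post-processed and released one at a time, so the full dataset never has to sit in memory. A writer for an on-disk format such as HDF5 would go in the loop body and is not shown here.

```python
from typing import Dict, Iterator, Tuple

import numpy as np


def fake_embed(sequence: str) -> np.ndarray:
    """Hypothetical stand-in for the model.

    Returns an array of shape (3, len(sequence), 4).
    """
    return np.ones((3, len(sequence), 4), dtype=np.float32)


def iter_embeddings(
    sequences: Dict[str, str],
) -> Iterator[Tuple[str, np.ndarray]]:
    """Yield (identifier, embedding) pairs lazily, one protein at a time."""
    for identifier, sequence in sequences.items():
        yield identifier, fake_embed(sequence)


sequences = {"protein_a": "MKT", "protein_b": "MKTAYIAK"}

summed_shapes = {}
for identifier, embedding in iter_embeddings(sequences):
    # Caller-side post-processing: sum over the layer axis only,
    # keeping one vector per residue.
    summed = embedding.sum(axis=0)
    summed_shapes[identifier] = summed.shape
    # At this point the result could be written to disk before the
    # next protein is embedded.
```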

sacdallago commented 4 years ago

again: would be good to have this in https://github.com/sacdallago/bio_embeddings rather than here

sacdallago commented 4 years ago

@konstin