debbiemarkslab / EVcouplings

Evolutionary couplings from protein and RNA sequence alignments
http://evcouplings.org

Fix identifier storage in Alignment class #289

Closed thomashopf closed 1 week ago

thomashopf commented 1 year ago

Currently, the memory used for identifiers is determined by the longest identifier, because identifiers are stored in a numpy array. This can create a large overhead if one header is much longer than the others, and numpy functionality is not that relevant for identifiers anyway.
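A minimal sketch of the overhead, using made-up identifiers rather than the actual Alignment code: numpy stores strings in a fixed-width unicode dtype, so every entry is padded to the length of the longest one.

```python
import numpy as np

# Hypothetical identifiers: two short headers and one very long one
ids = ["seq1", "seq2", "x" * 1000]
arr = np.array(ids)

# Fixed-width unicode dtype: 4 bytes per character, width set by the longest entry
bytes_per_entry = arr.itemsize   # 4 * 1000
total_bytes = arr.nbytes         # 3 entries * 4000 bytes each

print(arr.dtype.kind, bytes_per_entry, total_bytes)
```

Here the two four-character identifiers each occupy as much memory as the 1000-character one.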

Ideally, replace the numpy array with a pd.Series to keep the slicing functionality while making use of pandas' better string memory management.
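A rough sketch of what this could look like, assuming identifiers are held as Python string objects in an object-dtype Series (the identifiers and variable names are made up for illustration, and this is not the final implementation):

```python
import numpy as np
import pandas as pd

# Hypothetical identifiers: 100 short headers and one very long one
ids = [f"seq{i}" for i in range(100)] + ["long_header_" + "x" * 10_000]

as_numpy = np.array(ids)                  # fixed-width unicode, padded to the longest entry
as_series = pd.Series(ids, dtype=object)  # one Python string object per entry

numpy_bytes = as_numpy.nbytes
series_bytes = as_series.memory_usage(deep=True)  # includes the string objects themselves

# Positional and boolean-mask selection still work on the Series
first_three = as_series.iloc[:3]
selected = as_series.iloc[[0, 50, 100]]
short_only = as_series[as_series.str.len() < 10]

print(numpy_bytes, series_bytes, len(short_only))
```

With one long outlier, the Series only pays for that one long string, while the numpy array pays for it once per entry.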

@aaronkollasch

thomashopf commented 1 year ago

Also add an option to the from_file method to split identifiers on the first whitespace (off by default).
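The splitting itself could be as simple as the sketch below. The helper name `split_id_on_whitespace` and the example headers are hypothetical; how the option would be named and wired into from_file is left open here.

```python
def split_id_on_whitespace(header: str) -> str:
    """Return the part of a sequence header before the first whitespace.

    Hypothetical helper for illustration; not part of the EVcouplings API.
    """
    # str.split() without a separator splits on any run of whitespace
    # and ignores leading whitespace; maxsplit=1 avoids splitting the rest
    parts = header.split(maxsplit=1)
    return parts[0] if parts else ""


# Made-up headers: one with a description, one without
headers = ["SEQ1/1-100 hypothetical description text", "SEQ2/5-80"]
short_ids = [split_id_on_whitespace(h) for h in headers]
print(short_ids)
```

Keeping the option off by default preserves the current behaviour for callers that rely on the full header.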