Open: InfiniGeorges opened 3 days ago
Hello, @InfiniGeorges, thank you very much for pointing this out! This definitely squashes the memory footprint and it runs in about the same amount of time. It passes my tests as well.
Since you suggested it, would you like to submit a PR?
https://github.com/IEDB/PEPMatch/blob/70d4ba9c7adb6d3dd5e20dfa55961f3299d26a7d/pepmatch/preprocessor.py#L186C1-L189C78
Dear Developer(s),
PEPMatch is an incredibly useful and well-written tool. Thank you for your hard work!
I've noticed that k-mer generation can lead to high memory usage with large databases. To improve this, I suggest using batch insertion and periodically clearing memory, as shown below. This can help manage memory more efficiently and avoid out-of-memory issues:
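A minimal sketch of the batching idea, not the actual PEPMatch code: it assumes the k-mers are written to a SQLite table, and the names `iter_kmers`, `insert_kmers_batched`, the `kmers` table layout, and the `batch_size` default are all hypothetical. K-mers are generated lazily, buffered in a list, flushed with `executemany` once the buffer reaches `batch_size`, and the buffer is then cleared so that at most one batch is held in memory at a time.

```python
import sqlite3
from typing import Iterable, Iterator, Tuple


def iter_kmers(seqs: Iterable[str], k: int) -> Iterator[Tuple[str, int, int]]:
    """Lazily yield (kmer, sequence index, offset) without building a full list."""
    for seq_idx, seq in enumerate(seqs):
        for pos in range(len(seq) - k + 1):
            yield seq[pos:pos + k], seq_idx, pos


def insert_kmers_batched(
    conn: sqlite3.Connection,
    seqs: Iterable[str],
    k: int,
    batch_size: int = 100_000,  # hypothetical default; tune to available memory
) -> int:
    """Insert k-mers in fixed-size batches and return the number of rows written."""
    # Hypothetical table layout, used only for this illustration.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kmers (kmer TEXT, seq_idx INTEGER, pos INTEGER)"
    )
    batch = []
    total = 0
    for row in iter_kmers(seqs, k):
        batch.append(row)
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO kmers VALUES (?, ?, ?)", batch)
            conn.commit()
            total += len(batch)
            batch.clear()  # release the buffered rows before continuing
    if batch:
        # Flush whatever is left after the last full batch.
        conn.executemany("INSERT INTO kmers VALUES (?, ?, ?)", batch)
        conn.commit()
        total += len(batch)
    return total
```

With this shape, peak memory is bounded by `batch_size` rows rather than by the total number of k-mers in the database; a larger batch trades memory for fewer commits.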