In FASTA parsing, sequence names were stored in a std::vector, and each new sequence name was compared against every previous name with std::find. This took O(n^2) time in total, where n is the number of sequences, which made parsing extremely slow in my use case, where the index is a set of reads.
In this patch, I add a std::unordered_set of sequence names for average O(1) name lookup, reducing the expected running time to O(n).
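
For reviewers, here is a minimal sketch of the idea, not the actual code in this patch. The struct `SequenceNames` and its members `names`, `seen`, and `add` are hypothetical names chosen for illustration. It assumes the vector is still needed to preserve the order of the sequences, so the set is kept alongside it purely for membership tests.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical container for illustration; not the project's real class.
struct SequenceNames {
    std::vector<std::string> names;        // keeps sequences in file order
    std::unordered_set<std::string> seen;  // average O(1) duplicate check

    // Returns true if the name was new and has been recorded,
    // false if it duplicates an earlier name.
    bool add(const std::string& name) {
        // insert() reports via .second whether the element was actually added,
        // so a single hash lookup replaces the linear std::find scan.
        if (!seen.insert(name).second) {
            return false;
        }
        names.push_back(name);
        return true;
    }
};
```

The trade-off is extra memory for a second copy of each name. The O(1) bound is an average: std::unordered_set lookups can degrade to linear time when many keys collide.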