The algorithm does not scale well especially in the singleton_finder. Luckily this part of the code could be easily multithreaded and some heuristics could be added to improve speed.
At this point in the program we have found all of our patterns but we haven't clustered anything. If we clustered we could remove repeats which were perfect substrings of other repeats, since many of these substrings would be caused by partial matching repeats. This would reduce the number of patterns that would need to be searched against.
We could also multithread the function, since we never modify the pattern list or the read that is being searched against. We could split the pattern list into as many parts as there are threads, then make a separate tree for each thread and then pass the read in to all threads. We would just need to make sure that only one thread is allowed to add in a read into the primary structure and we deal with instances where two or more threads return a match for the same read.
The algorithm does not scale well especially in the singleton_finder. Luckily this part of the code could be easily multithreaded and some heuristics could be added to improve speed.
At this point in the program we have found all of our patterns but we haven't clustered anything. If we clustered we could remove repeats which were perfect substrings of other repeats, since many of these substrings would be caused by partial matching repeats. This would reduce the number of patterns that would need to be searched against.
We could also multithread the function, since we never modify the pattern list or the read that is being searched against. We could split the pattern list into as many parts as there are threads, then make a separate tree for each thread and then pass the read in to all threads. We would just need to make sure that only one thread is allowed to add in a read into the primary structure and we deal with instances where two or more threads return a match for the same read.