trikiamine23 closed this issue 4 years ago
What is the alphabet size and the average sequence length?
Alphabet size: 255. Average sequence length: 12.
It is likely that your system is reaching its computational limit, which causes the process to hang. The current SGT2.x implements an algorithm that is efficient if sequence length < alphabet size. The next version will add another algorithm that is efficient for your case. It will be released in a few months. For now, I suggest breaking the 800k dataset into chunks and applying SGT to each chunk.
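A minimal sketch of the chunking suggestion, in plain Python so it does not depend on any particular SGT call. The names `chunk_bounds`, `apply_in_chunks`, and `embed_fn` are hypothetical helpers for illustration, not part of the sgt package; `embed_fn` stands in for whatever embedding call you already use.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def chunk_bounds(n_rows: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) index pairs that cover range(n_rows) in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_rows, chunk_size):
        yield start, min(start + chunk_size, n_rows)


def apply_in_chunks(
    rows: Sequence,
    embed_fn: Callable[[Sequence], List],  # hypothetical: your embedding call
    chunk_size: int,
) -> List:
    """Run embed_fn on one slice of rows at a time and collect the results."""
    results: List = []
    for start, stop in chunk_bounds(len(rows), chunk_size):
        # Only one chunk is handed to embed_fn at a time, so peak memory
        # depends on chunk_size rather than on the full dataset size.
        results.extend(embed_fn(rows[start:stop]))
    return results
```

With a pandas DataFrame, the same bounds could be used as `corpus.iloc[start:stop]`, passing each slice to the SGT transform and concatenating the per-chunk outputs afterwards. This assumes the embedding of one sequence does not depend on the other sequences in the chunk; a chunk size of around 50,000 rows is only a starting guess to tune against available memory.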
Thank you very much, I will wait for the next release! Your work is very interesting.
@trikiamine23 thank you for your note!
I am having an issue with the size of the dataframe. I have 800,000 different sequences. The multiprocessing works fine at first, but then it stops and stays unresponsive. Is this related to SGT or to pandarallel?