Closed: jon-xu closed this issue 2 years ago
Hi Jon,
Thanks for using tailfindr.
Sorry to hear about the performance issues. Unfortunately, tailfindr gets slow on big datasets at the moment. I would recommend splitting the big dataset into, let's say, four folders and then running tailfindr on each folder individually (see the sketch below). Not an optimal solution, I understand, sorry for that. Improving performance for large datasets is on my to-do list.
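Something along these lines should work; the paths and chunk count are placeholders, and the find_tails() arguments follow the README example, so please double-check ?find_tails for your installed version:

```r
# Rough sketch: split the fast5 files into four subfolders and run
# tailfindr on each subfolder in turn.
library(tailfindr)

fast5_dir <- '/path/to/all_fast5'       # placeholder input folder
split_dir <- '/path/to/split_fast5'     # placeholder output root
n_chunks  <- 4

fast5_files <- list.files(fast5_dir, pattern = '\\.fast5$',
                          recursive = TRUE, full.names = TRUE)
chunks <- split(fast5_files,
                cut(seq_along(fast5_files), n_chunks, labels = FALSE))

for (i in seq_len(n_chunks)) {
  chunk_dir <- file.path(split_dir, paste0('chunk_', i))
  dir.create(chunk_dir, recursive = TRUE, showWarnings = FALSE)
  file.copy(chunks[[i]], chunk_dir)     # symlinks would also save disk space
  find_tails(fast5_dir    = chunk_dir,
             save_dir     = chunk_dir,
             csv_filename = paste0('tails_chunk_', i, '.csv'),
             num_cores    = 16)
}
```

You can then concatenate the per-chunk CSV files at the end.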
Best, Adnan
Have you considered adapting tailfindr to work with SLOW5 files? Nanopolish recently added support for this and it improves performance substantially.
It's on my to-do list; I'll look into it when I get time.
Thanks Adnan!
We ended up subsampling the fast5 files (roughly along the lines of the sketch below).
Will try SLOW5 when we have a chance!
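For reference, the subsampling was roughly like this; the paths and fraction are just placeholders:

```r
# Rough sketch: copy a random fraction of the fast5 files into a smaller
# folder and run tailfindr on that instead of the full dataset.
set.seed(42)
fast5_dir  <- '/path/to/all_fast5'      # placeholder
subset_dir <- '/path/to/fast5_subset'   # placeholder
fraction   <- 0.1

fast5_files <- list.files(fast5_dir, pattern = '\\.fast5$',
                          recursive = TRUE, full.names = TRUE)
picked <- sample(fast5_files, ceiling(length(fast5_files) * fraction))

dir.create(subset_dir, recursive = TRUE, showWarnings = FALSE)
file.copy(picked, subset_dir)
```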
Cheers,
Jon
Thanks Adnan!
It’s still manageable - I’ll just let it run and keep you updated!
Cheers,
Jon
Hi Adnan,
Thanks again for the tool! I have been able to detect polyA tails in a few datasets. Now I am working on a relatively large cDNA dataset, about 350GB in size.
I have tried increasing num_cores from 16 to 32 and then 64, with 20GB per core, but I didn't notice a big improvement in speed; based on the progress in the log file, it will still need more than 160 hours for our dataset.
Do you have any advice on making it even faster, please? Thanks! Jon
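For reference, the kind of call I am running looks roughly like this; the paths are placeholders, and argument names other than num_cores are taken from the tailfindr README example:

```r
library(tailfindr)

# Roughly the call used on the ~350GB cDNA dataset (placeholder paths).
df <- find_tails(fast5_dir    = '/path/to/cdna_fast5',
                 save_dir     = '/path/to/output',
                 csv_filename = 'cdna_tails.csv',
                 num_cores    = 32)   # also tried 16 and 64, ~20GB per core
```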