Speeding up generation of the consensus output

skyungyong commented 2 years ago

Hi,

I have generated all the outputs from the pipelines and am now trying to generate the final output with:

reasonaTE -mode pipeline -projectFolder workspace -projectName testProject

My genome is about 1 Gb, and this step never finishes. I let the software run for ~190 hours, but it did not produce the final outputs. I am now rerunning the same job, but it is progressing very slowly. It has been about 90 hours, and these are the latest lines the software printed:

seq1 cluster13861 Iteration 1 6 0 / 6
...
seq1 cluster46826 Iteration 1 1 0 / 1
...

Is there a way to speed up this process? How long do you expect it to run?

Thank you!

DerKevinRiehl commented 2 years ago

Dear Skyungyong, first of all, thank you very much for your interest in our software.

Assumptions: I assume your genome is around 1 Gbp (i.e., 1,000,000,000 bp) and that you have a large computing cluster.

Suggestions: Depending on your hardware setup, you may simply need to be a little more patient. Another suggestion: split the genome into several parts and treat each part as a separate project. Running these projects in parallel can make the whole process faster.
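A minimal Python sketch of the splitting step, assuming a plain (uncompressed) FASTA input. The function names `read_fasta` and `split_fasta` and the `part_N.fasta` file names are hypothetical and not part of reasonaTE. Records are assigned largest-first to whichever part currently holds the fewest bases, so the parts end up roughly equal in size; note that the whole genome is held in memory while splitting.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) tuples from a plain FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(path, n_parts, out_dir, width=60):
    """Distribute records over up to n_parts files with roughly equal total length.

    Greedy largest-first assignment: each record goes to the part that
    currently has the fewest bases. Returns the paths of the files written.
    """
    records = sorted(read_fasta(path), key=lambda r: len(r[1]), reverse=True)
    parts = [[] for _ in range(n_parts)]
    sizes = [0] * n_parts
    for header, seq in records:
        i = sizes.index(min(sizes))  # part with the fewest bases so far
        parts[i].append((header, seq))
        sizes[i] += len(seq)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i, part in enumerate(parts):
        if not part:
            continue  # fewer records than requested parts
        out_path = out_dir / f"part_{i + 1}.fasta"
        with open(out_path, "w") as out:
            for header, seq in part:
                out.write(f">{header}\n")
                # Wrap sequence lines at `width` characters.
                for start in range(0, len(seq), width):
                    out.write(seq[start:start + width] + "\n")
        written.append(out_path)
    return written
```

For example, `split_fasta("genome.fasta", 8, "parts")` writes up to eight files, each of which could then be set up as its own reasonaTE project. Setting `n_parts` to the number of sequences gives one sequence per file.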

Background from my side: When developing the tool, I had access to a large computing cluster and annotated these genomes in parallel. In total, it took around three weeks (~504 h), and toward the end I was only waiting for the longer genomes. The following table lists the genomes I have run reasonaTE on (not all of them are reported in the paper):
[Image: table of genomes annotated with reasonaTE]

Request for updates: Please let me know once the job has completed and tell me about your experience with the runtime. I would be happy to hear it, so that I can take it into account in the next software update.

Best regards, Kevin Riehl

skyungyong commented 2 years ago

Hi @DerKevinRiehl,

Thank you for the information! I guess I will have to let this run for quite some time, then. At least now I know how long it can take, so I will stay patient! I will report back once the jobs are done.

Thank you!

skyungyong commented 2 years ago

Hi @DerKevinRiehl,

The process ran for about two weeks, but then an issue with the computer terminated the jobs :(. The software cannot resume the work from where it stopped, right?

DerKevinRiehl commented 2 years ago

Dear Skyungyong, sorry to hear that. Unfortunately, there is currently no such option. I hope you are able to restart the job. Are you using your own computer or a cluster? If it is a cluster, you may want to talk to the administrator to make sure it runs more reliably.

Best regards, Kevin

skyungyong commented 2 years ago

Hi @DerKevinRiehl,

There was an unexpected outage due to the weather :(. In the end, I split the FASTA file and processed each sequence separately, in parallel. I think it took about a day!
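A minimal sketch of running the parts in parallel with Python's `concurrent.futures`, assuming each part has already been set up as its own project in the same project folder and that the pipeline step is invoked as in the command from the first post. The project names and the `done_dir` marker directory are hypothetical: the markers are written by this script, not by reasonaTE, and only record which parts finished, so a rerun after an outage skips completed parts instead of starting everything over.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def pipeline_command(project_folder, project_name):
    """Build the pipeline-mode command from the first post for one project."""
    return ["reasonaTE", "-mode", "pipeline",
            "-projectFolder", project_folder,
            "-projectName", project_name]


def run_parts(project_folder, project_names, done_dir=".parts_done",
              max_workers=4, runner=subprocess.run):
    """Run the pipeline step for every project, several at a time.

    Returns {project_name: return_code}; parts skipped because an earlier
    run already finished them are reported with return code 0.
    """
    done = Path(done_dir)
    done.mkdir(parents=True, exist_ok=True)

    def run_one(name):
        marker = done / name  # hypothetical marker file, one per finished part
        if marker.exists():
            return name, 0  # finished in an earlier run, skip it
        result = runner(pipeline_command(project_folder, name))
        if result.returncode == 0:
            marker.touch()  # only mark parts that exited successfully
        return name, result.returncode

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_one, name) for name in project_names]
        for future in as_completed(futures):
            name, code = future.result()
            results[name] = code
    return results
```

For example, `run_parts("workspace", ["part_1", "part_2", "part_3"], max_workers=3)`. Each worker thread only waits on an external process, so threads are enough here; a sensible `max_workers` depends on the cores and memory available.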

DerKevinRiehl / transposon_annotation_reasonaTE

Speeding up generation of the consensus output #6