HingeAssembler / HINGE

Software accompanying "HINGE: Long-Read Assembly Achieves Optimal Repeat Resolution"
http://genome.cshlp.org/content/27/5/747.full.pdf+html?sid=39918b0d-7a7d-4a12-b720-9238834902fd

The 'postprocessing' step is too slow. #137

Closed nottwy closed 6 years ago

nottwy commented 6 years ago

Dear developers,

I'm now at the 'postprocessing' step, running the command shown below, and it is taking far too long.


Run postprocessing hinge clip ecoli.edges.hinges ecoli.hinge.list


My input is an 855 Gb .las file. Do you have any suggestions?

ilanshom commented 6 years ago

Hi @nottwy,

The hinge clip step doesn't use the .las files, it only uses the .edges.hinges and the hinge.list files. How big are those files for you?

Typically this step is fast compared to the rest of the pipeline, so I'm surprised that it's taking that long. Is there any output to the console?

govinda-kamath commented 6 years ago

Also, could you share ecoli.edges.hinges and ecoli.hinge.list with us? These files contain no sequence information (in case that's a concern for you).

nottwy commented 6 years ago

The edges.hinges file is 113 M and the hinge.list file is 27 M. There has been no output so far (>1 week), though the program appears to be running normally. The last entry in git log is:

commit 4c8b36b45878411ba25c4dae74bc99ae923b9b44
Author: Fei Xia <xf1280@gmail.com>
Date: Tue Oct 24 17:00:36 2017 -0700

    Update run.sh

Looking forward to your reply!

ilanshom commented 6 years ago

These are very large edge/hinges files. Does your genome have telomeres/centromeres? I suspect that the graph could be very dense in these repetitive parts. Do you have del_telomeres = 1; in your nominal.ini?

Also, I remember that a while back you were trying to use the devG3 branch (see #129). Were you able to use it? Setting aggressive_pruning = true in your nominal.ini could also help.
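To confirm which of these options are actually set, one can scan the config file for them. The sketch below is a hypothetical helper, not part of HINGE. It assumes only that nominal.ini uses `key = value` lines, optionally ending in `;`, and it makes no assumption about which section the keys belong to. The sample text is made up for illustration.

```python
import re

def find_options(ini_text, keys=("del_telomeres", "aggressive_pruning")):
    """Return {key: value or None} for each requested key in ini_text.

    Hypothetical helper: matches `key = value` lines, ignoring a trailing
    `;` and anything after a `#`. Section headers are skipped, not parsed.
    """
    found = {k: None for k in keys}
    for raw in ini_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop `#` comments
        m = re.match(r"^([A-Za-z_]\w*)\s*=\s*(.*?)\s*;?\s*$", line)
        if m and m.group(1) in found:
            found[m.group(1)] = m.group(2)
    return found

# Made-up sample text; in practice, pass the contents of your nominal.ini.
sample = "del_telomeres = 1;\n# aggressive_pruning = true\n"
for key, value in find_options(sample).items():
    print(key, "->", value if value is not None else "not set")
```

A key reported as "not set" is either missing or commented out, so the program would fall back to whatever default it uses.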

nottwy commented 6 years ago

Yes, I remember, and I still plan to try it. For now, though, I just want to get your software to run successfully. I'll report back once I make progress.

And if I want to try del_telomeres, which step should I start from?

nottwy commented 6 years ago

This step has now been running for ~20 days. Could you please make a small change to the 'postprocessing' step so that the program reports its progress?

ilanshom commented 6 years ago

Yes, we can do that. But I think it would be helpful if we could have your edges.hinges and hinge.list files. Could you write an email to ilanshom@gmail.com, so that we can coordinate a way for you to send us the files? Thanks!

nottwy commented 6 years ago

@ilanshom, have you received my email? Looking forward to your reply!

ilanshom commented 6 years ago

Yes, thank you. We got your files and are working on it. Sorry for the delay. We'll keep you posted.

ilanshom commented 6 years ago

Hi @nottwy, we made some changes to hinge clip, and it now scales well to large datasets. We were able to run it on the files you sent us in under 1 hour. Could you please check out the latest commit and try again? Also, we now write many status messages so that you can tell us how far it got in case it gets stuck.
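For readers wondering what such status messages can look like, here is a minimal sketch of periodic progress reporting in a long-running Python loop. It is not the code from the HINGE commit. The function `process_edges`, the `every` interval, and the message format are all hypothetical.

```python
import io
import sys
import time

def process_edges(edges, every=100000, out=sys.stderr):
    """Iterate over edges, writing a status line every `every` items.

    Hypothetical illustration: the per-edge work is a placeholder.
    """
    start = time.time()
    processed = 0
    for _edge in edges:
        # ... real per-edge work would go here ...
        processed += 1
        if processed % every == 0:
            elapsed = time.time() - start
            # flush=True so the message appears even if the job hangs later
            print(f"[{elapsed:.0f}s] processed {processed} edges",
                  file=out, flush=True)
    print(f"done: {processed} edges in {time.time() - start:.0f}s",
          file=out, flush=True)
    return processed

# Small demonstration, writing to an in-memory buffer instead of stderr.
buf = io.StringIO()
count = process_edges(range(250), every=100, out=buf)
print(buf.getvalue(), end="")
```

Flushing each message matters for jobs that run for days: without it, buffered output may never reach the console or log file before the process stalls.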

nottwy commented 6 years ago

Great work! The clip step now takes only 3 hours to finish.