Closed nottwy closed 6 years ago
Hi @nottwy,
The hinge clip step doesn't use the .las files, it only uses the .edges.hinges and the hinge.list files. How big are those files for you?
Typically this step is fast compared to the rest of the pipeline, so I'm surprised that it's taking that long. Is there any output to the console?
Also, could you share ecoli.edges.hinges and ecoli.hinge.list with us? These contain no sequence information (in case that's a concern for you).
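A quick way to answer the size question is to report the byte size and line count of both files. The following is only a minimal sketch: the `summarize` helper is hypothetical (not part of HINGE), and it assumes the files are line-oriented text and sit in the current directory under the names used in this thread.

```python
import os


def summarize(path):
    """Return (size_in_bytes, line_count) for a line-oriented file."""
    size = os.path.getsize(path)
    lines = 0
    # Read in binary mode so unusual encodings cannot break the count.
    with open(path, "rb") as handle:
        for _ in handle:
            lines += 1
    return size, lines


if __name__ == "__main__":
    # File names taken from the thread; adjust the prefix for other runs.
    for name in ("ecoli.edges.hinges", "ecoli.hinge.list"):
        if os.path.exists(name):
            size, lines = summarize(name)
            print(f"{name}: {size / 1e6:.1f} MB, {lines} lines")
        else:
            print(f"{name}: not found")
```

Posting both numbers makes it easier to judge whether the graph is unusually dense for the genome size.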
The size of edges.hinges is 113 M and of hinge.list is 27 M. There has been no output so far (>1 week), but as far as I can tell the program is still running normally. The last record of git log is:
commit 4c8b36b45878411ba25c4dae74bc99ae923b9b44
Author: Fei Xia xf1280@gmail.com
Date: Tue Oct 24 17:00:36 2017 -0700
Update run.sh
Looking forward to your reply!
These are very large edge/hinges files. Does your genome have telomeres/centromeres? I suspect that the graph could be very dense in these repetitive parts. Do you have del_telomeres = 1; in your nominal.ini?
Also, I remember that a while back you were trying to use the devG3 branch (see #129). Were you able to use it? Setting aggressive_pruning = true in your nominal.ini could also help.
Yes, I still remember that and plan to try it. For now, though, I just want to get your software to run successfully. I'll let you know if I make any progress.
And if I want to try del_telomeres, which step should I start from?
This step has now taken ~20 days. Could you please make a small change to this step ('postprocessing') so that the program reports its progress?
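For illustration, the kind of progress reporting requested here can be sketched as a small wrapper around any long loop. This is not HINGE's code: the `with_progress` name, its parameters, and the message format are all assumptions made for the example.

```python
import sys
import time


def with_progress(items, every=100000, label="postprocessing", out=sys.stderr):
    """Yield items unchanged, writing a status line every `every` items."""
    start = time.time()
    count = 0
    for count, item in enumerate(items, start=1):
        if count % every == 0:
            elapsed = time.time() - start
            out.write(f"[{label}] processed {count} items in {elapsed:.0f} s\n")
            out.flush()  # make the message visible even when output is redirected
        yield item
    out.write(f"[{label}] done: {count} items\n")
    out.flush()
```

Wrapping a loop as `for edge in with_progress(edges): ...` would then show how far the step has come, which also helps to tell a slow run from a stuck one.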
Yes, we can do that. But I think it would be helpful if we could have your edges.hinges and hinge.list files. Could you write an email to ilanshom@gmail.com, so that we can coordinate a way for you to send us the files? Thanks!
@ilanshom, have you received my email? Looking forward to your reply!
Yes, thank you. We got your files and are working on it. Sorry for the delay. We'll keep you posted.
Hi @nottwy,
We made some changes to hinge clip and it now scales to large datasets well. We were able to run it on the files you sent us in under 1 hour. Could you please check out the latest commit and try it again? Also, we now write many status messages so that you can tell us how far it went in case it gets stuck.
Great work! Now it takes only 3 hours to finish the clip step.
Dear developers,
I'm now at the 'postprocessing' step. The command is shown below, but it is taking too much time.
Run postprocessing hinge clip ecoli.edges.hinges ecoli.hinge.list
My data size is 855 Gb (the .las file). Do you have any suggestions?