HingeAssembler / HINGE

Software accompanying "HINGE: Long-Read Assembly Achieves Optimal Repeat Resolution"
http://genome.cshlp.org/content/27/5/747.full.pdf+html?sid=39918b0d-7a7d-4a12-b720-9238834902fd

The parameters of Module 'draft-path' #129

Open nottwy opened 7 years ago

nottwy commented 7 years ago

Dear developer,

I want to confirm some of the parameters that the 'draft-path' module uses.

hinge draft-path $PWD $db hsy50x.G2.graphml

  1. I use the name of the daligner database here. Is that correct?
  2. Should I use G2 or G3?

In your manual, you write G2, but in issue 'https://github.com/HingeAssembler/HINGE/issues/77' you write G3. Is there any difference?

Thank you.

govinda-kamath commented 7 years ago

Yes, use the name of the daligner database as $db. And you should use G2. G3 is still an experimental module for larger genomes.

nottwy commented 7 years ago

@govinda-kamath, I'm assembling a large genome, and I don't see a G3 file. I ran the pipeline given in issue 'https://github.com/HingeAssembler/HINGE/issues/77', but no G3 file was produced. Did I do something wrong?

govinda-kamath commented 7 years ago

Which commit of the software are you running?

nottwy commented 7 years ago

I installed it on Jun 28, 2017 with git clone. Which file should I look at to see the commit?

govinda-kamath commented 7 years ago

git log

should give you the commit that you installed.

But it's weird that there is no G3.
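For reporting the installed commit in a thread like this one, a small helper can pull the hash out of the `git log` output. This is only a sketch: the function names `first_commit_hash` and `installed_commit` are made up for illustration, and it assumes `git` is on the PATH and that the HINGE checkout is a normal git clone.

```python
import re
import subprocess

# Matches the "commit <40 hex chars>" line that starts each `git log` entry.
COMMIT_RE = re.compile(r"^commit ([0-9a-f]{40})\b", re.MULTILINE)


def first_commit_hash(log_text):
    """Return the first full commit hash found in `git log` output, or None."""
    match = COMMIT_RE.search(log_text)
    return match.group(1) if match else None


def installed_commit(repo_dir):
    """Run `git log -1` inside repo_dir and return the checked-out commit hash."""
    result = subprocess.run(
        ["git", "log", "-1"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return first_commit_hash(result.stdout)
```

Calling `installed_commit("path/to/HINGE")` (a placeholder path) should return the same hash that appears on the first line of `git log`.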

nottwy commented 7 years ago

The first few rows look like this:

commit 459e3096a4bd0b6d41c6ca153dd8a2e74e7a14ad
Author: Govinda Kamath govinda.kamath@gmail.com
Date: Wed Jun 21 16:20:54 2017 +0530

govinda-kamath commented 7 years ago

This should actually still be producing a G3 file. Can you confirm the size of the G2 file?

nottwy commented 7 years ago

464K Sep 12 14:29 hsy50x.G2c.graphml
418K Sep 12 14:29 hsy50x.G2s.graphml
457K Sep 12 14:29 hsy50x.Gc.graphml
411K Sep 12 14:29 hsy50x.Gs.graphml
756K Sep 12 14:29 hsy50x.G1.graphml
756K Sep 12 14:29 hsy50x.G2.graphml
4.5M Sep 12 14:29 hsy50x.G0.graphml
6.7M Sep 12 14:28 hsy50x.G00.graphml

govinda-kamath commented 7 years ago

It looks like something went wrong in the clip run. G0, G1, G2, G3 are usually of similar sizes.

Can you post the STDOUT from the draft-path run?
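The size comparison described above can be automated with a short script. The sketch below assumes the `<prefix>.<stage>.graphml` naming seen in the listings in this thread (for example `hsy50x.G2.graphml`); the function names `graph_report` and `missing_stages` are hypothetical and not part of HINGE.

```python
from pathlib import Path


def graph_report(run_dir, prefix, stages=("G0", "G1", "G2", "G3")):
    """Map each stage to its .graphml size in bytes, or None if the file is missing."""
    report = {}
    for stage in stages:
        path = Path(run_dir) / f"{prefix}.{stage}.graphml"
        report[stage] = path.stat().st_size if path.is_file() else None
    return report


def missing_stages(report):
    """Return the stages whose graph file was not found."""
    return [stage for stage, size in report.items() if size is None]


if __name__ == "__main__":
    # Assumed layout: run from the directory that holds the clip output.
    report = graph_report(".", "hsy50x")
    print(report)
    print("missing:", missing_stages(report))
```

A stage that is reported as missing, or whose size differs sharply from the others, is a hint that the clip step did not finish normally.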

nottwy commented 7 years ago

command: $hinge clip hsy.edges.hinges hsy.hinge.list 50x

output:

Tue Sep 12 16:49:22 CST 2017
0 bad coverage reads.
0 bad self aligned reads.
Tue Sep 12 16:50:43 CST 2017

Files generated:

756K Sep 12 16:50 hsy50x.G2.graphml
409K Sep 12 16:50 hsy50x.G2s.graphml
415K Sep 12 16:50 hsy50x.Gs.graphml
4.5M Sep 12 16:50 hsy50x.G0.graphml
756K Sep 12 16:50 hsy50x.G1.graphml
6.7M Sep 12 16:50 hsy50x.G00.graphml

govinda-kamath commented 7 years ago

Can you give us the hsy.edges.hinges and hsy.hinge.list files? These files contain no sequence information, in case you're worried about privacy.

nottwy commented 7 years ago

I can provide them. But while preparing the data I made a mistake and deleted 'hsy.hinge.list'. I know you want to rerun this on your local machine, so it is useless if I only send you 'hsy.edges.hinges'.

It must be a bug in your program. For now, I want to know: is it OK if I use the G2 file?

govinda-kamath commented 7 years ago

Sure, you can. It looks like the code crashed somewhere there, though, so I'm not sure whether the results will be kosher.

nottwy commented 7 years ago

It will take one or two days to regenerate the file 'hsy.hinge.list'. I will mail it to you after I rerun the program. Can you give me your email address?

nottwy commented 7 years ago

Another question: do you think HINGE is suited to a large genome with relatively high repeat content? The size of our genome is ~1 Gb.

govinda-kamath commented 7 years ago

Please send it to gkamath@stanford.edu and feixia@stanford.edu.

govinda-kamath commented 7 years ago

How large a genome are you interested in?

nottwy commented 7 years ago

~1 Gb

govinda-kamath commented 7 years ago

HINGE should work for a 1 Gb genome.

nottwy commented 7 years ago

Have you received my email?

govinda-kamath commented 7 years ago

Yes. We'll get back to you soon.

ilanshom commented 7 years ago

There were a few small issues with hinge clip that were preventing it from creating the G3 graph. We created a new branch called devG3, which should fix that.

In order to have the G3 graph produced, you should make sure that aggressive_pruning = true is in your nominal.ini file (we added it in devG3), and you should call hinge clip with the path to the ini file as the fourth argument. So your command should look like:

hinge clip hsy.edges.hinges hsy.hinge.list test path-to-ini/nominal.ini

Let us know if this works.
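Before rerunning, it may help to confirm that the ini file really sets the flag and to assemble the four-argument call in one place. The sketch below is an assumption-laden illustration, not HINGE code: `ini_flag_enabled` and `clip_command` are hypothetical helpers, the file is scanned line by line without regard to which section the key sits in, and a trailing `;` after the value is tolerated.

```python
from pathlib import Path


def ini_flag_enabled(ini_path, key="aggressive_pruning"):
    """Return True if the first `key = value` line for `key` has the value true."""
    for raw in Path(ini_path).read_text().splitlines():
        name, sep, value = raw.partition("=")
        if sep and name.strip() == key:
            # Tolerate an optional trailing semicolon after the value.
            return value.strip().rstrip(";").strip().lower() == "true"
    return False


def clip_command(edges, hinge_list, suffix, ini_path):
    """Build the `hinge clip` argument list with the ini file as fourth argument."""
    return ["hinge", "clip", str(edges), str(hinge_list), suffix, str(ini_path)]


if __name__ == "__main__":
    # Placeholder path, mirroring the command quoted above.
    ini = "path-to-ini/nominal.ini"
    print(" ".join(clip_command("hsy.edges.hinges", "hsy.hinge.list", "test", ini)))
```

The returned list can be passed to `subprocess.run` once `ini_flag_enabled` returns True for the ini file in use.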

nottwy commented 7 years ago

OK, I'll install the devG3 branch and try what you suggested. I'll reply once I have a result.

nottwy commented 7 years ago

Before I report my result, I want to mention one other thing: your tool is really difficult to compile. I'm serious. Back to the topic: I haven't managed to install the devG3 branch yet, so I tried running the clip step with the old version of hinge. It gives me an error message like this:

bad coverage reads.
0 bad self aligned reads.
couldn't finish sparsification10328
couldn't finish sparsification10382

I will try to install the devG3 branch and run it again. I hope this error message helps you solve other problems in hinge. I'll report my progress with devG3 in this issue later.

govinda-kamath commented 7 years ago

These are just reports on the graph about visualisation, not error messages. You should be able to continue on the pipeline.