ComparativeGenomicsToolkit / hal

Hierarchical Alignment Format

H5::DataSetIException when use halAddToBranch & halReplaceGenome #95

Open Secretloong opened 4 years ago

Secretloong commented 4 years ago

After using halAddToBranch or halReplaceGenome to add new genomes to an existing alignment, I always run into one of two errors from hal2maf:

  1. HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 0:
    #000: H5Dio.c line 159 in H5Dread(): selection+offset not within extent
    major: Dataspace
    minor: Out of range
    terminate called after throwing an instance of 'H5::DataSetIException'
    Aborted
  2. Floating point exception

Does anyone have any idea what causes these?

diekhans commented 4 years ago

Can you give us the commands you are running, along with the newick trees (from halStats)?

If you are really ambitious, a test case as a script to reproduce the problem would be really helpful.

A command like this:

halRandGen --seed 0 --testRand test.hal

can create a test HAL file, which can then be diced up and put back together. That would save a lot of time.

Secretloong commented 4 years ago

Hi Mark,

The newick trees are below (each is part of the whole tree):

existing tree: ((A:1,B:1)Anc0:1,Outgroups:1)root:1
adding tree (A, A1, A2 and A3 are different subgroups of the same species; A1, A2 and A3 are the new genomes): (A:1,A1:1,A2:1,A3:1)Anc_new:1
expected tree: (((A:1,A1:1,A2:1,A3:1)Anc_new:1,B:1)Anc0:1,Outgroups:1)root:1
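To make the intended topology explicit: the goal is to replace the leaf A in the existing tree with the whole Anc_new subtree. A minimal sketch of that graft as plain string manipulation on the newick text is shown here. The helper `graft` is hypothetical and is not part of the HAL tools; it only illustrates the expected tree, not how halAddToBranch or halReplaceGenome work internally.

```python
import re


def graft(existing: str, leaf: str, subtree: str) -> str:
    """Replace the leaf `leaf` (name plus branch length) in `existing` with `subtree`."""
    # A leaf starts right after '(' or ',' and is followed by ':<branch length>'.
    # Internal node labels such as Anc0 follow ')' and are therefore never matched.
    pattern = re.compile(r"(?<=[(,])" + re.escape(leaf) + r":[0-9.eE+-]+")
    # A function is used as the replacement so the subtree text is inserted literally.
    result, count = pattern.subn(lambda match: subtree, existing)
    if count != 1:
        raise ValueError(f"expected exactly one leaf named {leaf!r}, found {count}")
    return result


existing_tree = "((A:1,B:1)Anc0:1,Outgroups:1)root:1"
adding_tree = "(A:1,A1:1,A2:1,A3:1)Anc_new:1"
expected_tree = "(((A:1,A1:1,A2:1,A3:1)Anc_new:1,B:1)Anc0:1,Outgroups:1)root:1"

grafted_tree = graft(existing_tree, "A", adding_tree)
```

Note that the graft changes A from a child of Anc0 into a grandchild, which is the situation the strategies in this comment have to deal with.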

I have tried 6 strategies, using these files:

existing.hal: existing alignments (existing tree)
adding.hal: adding alignments (adding tree)
new.hal: new partial alignment (((A:1,A1:1,A2:1,A3:1)Anc_new:1,B:1)Anc0:1)
top.hal: top alignments ((Anc_new:1,B:1)Anc0:1)

1st: Segmentation fault. I think the reason is that the alignment being added (Anc_new is added as Anc0's child) should not contain an existing species as a child (A is Anc_new's child, so after the addition A becomes Anc0's grandchild).

halReplaceGenome copy_of_existing.hal Anc0 --topAlignmentFile existing.hal --bottomAlignmentFile new.hal

2nd: Doesn't work; no genome is added. It seems that the existing alignment still needs one child below the insertion node to anchor the alignments.

halRemoveGenome existing.hal A
halRemoveGenome existing.hal B
halReplaceGenome copy_of_existing.hal Anc0 --topAlignmentFile existing.hal --bottomAlignmentFile new.hal

3rd: Only Anc_new is added to HAL's "gene trees", and hal2maf fails with a floating point exception. I think the hal2maf error suggests that only the direct child of the insertion node can be added to the new HAL tree. The added Anc_new has 4 children of its own, which are not added to the HAL tree (as reported by halStats), so the resulting HAL file would not validate.

halRemoveGenome existing.hal A
halReplaceGenome copy_of_existing.hal Anc0 --topAlignmentFile existing.hal --bottomAlignmentFile new.hal

4th: Replacing works, and hal2maf works. But I don't know how to add (A,A1,A2,A3) to this new HAL alignment. Based on the previous test, replacing Anc_new with (A,A1,A2,A3)Anc_new will not work.

halRemoveGenome existing.hal A
halReplaceGenome copy_of_existing.hal Anc0 --topAlignmentFile existing.hal --bottomAlignmentFile top.hal

5th: Now switching to halAddToBranch. hal2maf fails with H5::DataSetIException. I think this is the same problem as in the 3rd test, so I performed the 6th test.

halAddToBranch copy_of_existing.hal adding.hal top.hal Anc0 Anc_new A A1 0.5 0.5

6th: Like the 5th, but adding A1, A2 and A3 one at a time. Before each run, adding.hal is trimmed with halRemoveGenome to match. However, hal2maf already fails with H5::DataSetIException after the first run.

halAddToBranch copy_of_existing.hal adding_ChildrenOnlyAandA1.hal top.hal Anc0 Anc_new A A1 0.5 0.5

I hope this makes sense. Can you find the key problem? I think the 4th test is the breakthrough, but I cannot go on to add adding.hal to its result; doing so gives a segmentation fault.

Thanks

diekhans commented 4 years ago

Thanks for the info. I will not be able to look at it until the middle of next week.

Secretloong commented 4 years ago

Thanks for your follow-up. I have some more findings to show you.

I have used halAppendSubtree to achieve my goal.

7th: Appending works, and hal2maf works. But halValidate fails.

halRemoveGenome existing.hal A
halAppendSubtree copy_of_existing.hal new.hal Anc0 Anc_new --bridgeFile top.hal

8th: Appending works, and hal2maf works. halValidate has not finished yet, but I believe it will pass (the test on the region of the new alignment that previously failed passes, and the intermediate result, copy_of_top.hal, validates).

halAppendSubtree copy_of_top.hal adding.hal Anc_new Anc_new --merge
halRemoveGenome existing.hal A
halRemoveGenome existing.hal B
halAppendSubtree copy_of_existing.hal copy_of_top.hal Anc0 Anc0 --merge

Finally, I'd like to report what I suppose is the core error. Running halValidate on the results of the 4th, 6th and 7th strategies gives:

hal exception caught: Child 0 index 5919505 of segment 0 out of range in genome B

or

hal exception caught: Child 0 with index 800589 and start position 181968301 and sequence 31 has length 18 but parent with index 0 and start position 0 in sequence Anc0refChr2530 has length 5 
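Read literally, the two messages describe two broken invariants between a parent segment and the child segment it points to: the child index must exist in the child genome, and the two segments must have the same length. The sketch below checks those two conditions on a deliberately simplified model. The classes `TopSegment` and `BottomSegment`, the `NULL_INDEX` marker and the function `check_parent_child` are all hypothetical names for illustration; this is an assumption about what the messages mean, not halValidate's actual implementation.

```python
from dataclasses import dataclass
from typing import List

NULL_INDEX = -1  # hypothetical marker: parent segment has no aligned child segment


@dataclass
class TopSegment:
    """Simplified segment in a child genome."""
    start: int
    length: int


@dataclass
class BottomSegment:
    """Simplified segment in the parent genome, pointing at one child segment."""
    start: int
    length: int
    child_index: int  # index into the child's segment list, or NULL_INDEX


def check_parent_child(parent: List[BottomSegment], child: List[TopSegment]) -> List[str]:
    """Return a description of every violated parent/child invariant."""
    problems = []
    for i, bottom in enumerate(parent):
        if bottom.child_index == NULL_INDEX:
            continue  # nothing aligned, nothing to check
        if not 0 <= bottom.child_index < len(child):
            problems.append(f"segment {i}: child index {bottom.child_index} out of range")
            continue
        top = child[bottom.child_index]
        if top.length != bottom.length:
            problems.append(
                f"segment {i}: child has length {top.length} but parent has length {bottom.length}"
            )
    return problems


# Toy data loosely mirroring the two reported messages.
parent_segments = [BottomSegment(0, 5, 0), BottomSegment(5, 5, 7), BottomSegment(10, 5, NULL_INDEX)]
child_segments = [TopSegment(181968301, 18)]
problems = check_parent_child(parent_segments, child_segments)
```

In this toy input, the first parent segment points at a child of a different length and the second points past the end of the child's segment list, so both kinds of problem are reported.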

When I checked all the intermediate HAL alignments, I found that many sites of B are aligned to Anc0 at the wrong place (for example "sequence 31" above). This error occurs whenever B (the sibling of the insertion node) is present in the existing HAL alignment, whichever tool I use (halAddToBranch, halReplaceGenome or halAppendSubtree). According to your instructions, the parent, the inserted node and its future siblings should all be aligned to bridge the appending or replacing. Could you check whether there is an error in these tools?

And a final question, which I hope you can help with: in your new preprint, you show that adding new genomes can miss some alignments with distant outgroups, and you suggest:

Re-inferring the ancestral genomes on the path from newly added genomes to the root should address this issue if it appears.

Does "re-inferring" here mean running ancestorsML (which is used to re-estimate ancestral base calls)?

Many thanks

diekhans commented 4 years ago

Hi @Secretloong, would it be possible to make your HAL files for these cases available? It might make it faster to debug.

Thanks

Secretloong commented 4 years ago

Sorry, I can't give you the original file; it is huge. I will try to extract the parts with errors for you, but it will take a little while.

diekhans commented 4 years ago

Thanks; I tried a small example and couldn't reproduce it. I will try some others as well.
