ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

ERROR: bad integers or strand in MAF (strand must be + for reference sequence) #1320

Closed chun-he-316 closed 5 months ago

chun-he-316 commented 6 months ago

Hi,

I ran multiple whole-genome alignments with cactus to generate a HAL file, then ran "cactus-hal2maf js_hal2maf evolver27species.hal evolver27species.maf.gz --refGenome xxxx --chunkSize 500000" to convert it to MAF. When I then ran "phyloFit -i MAF evolver27species.maf", I got the error "ERROR: bad integers or strand in MAF (strand must be + for reference sequence) --". I do not know the cause.

Can you tell me how to resolve this problem? Thank you.

Best,

Chun

glennhickey commented 6 months ago

You need to use --dupeMode single to make sure there's at most one row / genome / maf block.
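To check whether a MAF really has at most one row per genome in each block, something like the following sketch can help. It is not part of cactus; the function names are made up for illustration, and it assumes each sequence name has the form genome.contig, so the genome is whatever precedes the first dot.

```python
import gzip
from collections import Counter


def iter_blocks(lines):
    """Yield each MAF block as a list of whitespace-split 's' rows."""
    block = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "a":      # an 'a' line starts a new alignment block
            if block:
                yield block
            block = []
        elif fields[0] == "s":    # s src start size strand srcSize text
            block.append(fields)
    if block:
        yield block


def blocks_with_duplicates(lines):
    """Return (block_index, genome, row_count) for genomes with >1 row in a block.

    Assumption: src looks like 'genome.contig', so the genome name is the
    part before the first dot.
    """
    hits = []
    for i, block in enumerate(iter_blocks(lines)):
        counts = Counter(row[1].split(".", 1)[0] for row in block)
        hits.extend((i, genome, n) for genome, n in counts.items() if n > 1)
    return hits


def scan_file(path):
    """Run the duplicate check on a plain or gzip-compressed MAF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return blocks_with_duplicates(handle)
```

Calling scan_file("evolver27species.maf.gz") should return an empty list if every block has a single row per genome.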

chun-he-316 commented 6 months ago

I used "cactus-hal2maf js_hal2maf evolver27species.hal evolver27species.maf.gz --refGenome xxxx --chunkSize 500000 --dupeMode single", but it still returns "ERROR: bad integers or strand in MAF (strand must be + for reference sequence) --". What can I do?

glennhickey commented 6 months ago

If you can find a block in evolver27species.maf.gz where the reference genome xxxx is on the negative strand then that's definitely a bug in cactus-hal2maf. If that is the case (please let me know) you can correct it with mafStrander (included in cactus). But I suspect there's maybe a naming issue between the MAF and tree you are giving to phyloFit or something like that.
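One way to look for such a block is to scan the MAF for blocks whose first row is on the negative strand, and to list the genome names so they can be compared with the names in the tree. The sketch below is not a cactus tool; the function names are hypothetical, and it assumes the reference is the first "s" row of each block and that sequence names have the form genome.contig.

```python
import gzip


def negative_first_rows(lines):
    """Return (line_number, src) for every block whose first 's' row is on '-'.

    Assumption: the reference sequence is the first 's' row after each 'a' line.
    """
    hits = []
    expect_first = False
    for n, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "a":
            expect_first = True
        elif fields[0] == "s" and expect_first:
            expect_first = False
            # MAF 's' row: s src start size strand srcSize text
            if fields[4] == "-":
                hits.append((n, fields[1]))
    return hits


def genome_names(lines):
    """Return the set of genome names seen in 's' rows.

    Assumption: src looks like 'genome.contig'; compare this set by eye
    against the leaf names in the tree passed to phyloFit.
    """
    names = set()
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "s":
            names.add(fields[1].split(".", 1)[0])
    return names


def scan_file(path):
    """Run the strand check on a plain or gzip-compressed MAF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return negative_first_rows(handle)
```

Any tuple returned by scan_file gives the line number of a first row on the negative strand, which is the kind of block worth sharing in a report.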

Marh32 commented 6 months ago

Hi, I had the same problem and used mafStrander to correct it. However, the corrected MAF file is about 10 GB larger than the original MAF file. Is this normal?

glennhickey commented 5 months ago

I'm still unsure about how this problem can happen. If someone can share a hal file, cactus-hal2maf command, and block with a reverse reference in the first row, I'd very much like to try to reproduce.

Marh32 commented 5 months ago

Thank you so much for your reply. My HAL file comes from this URL: https://cgl.gi.ucsc.edu/data/cactus/241-mammalian-2020v2.hal. I used "halExtract --root fullTreeAnc112 241-mammalian-2020v2.hal 43primates.hal" to extract the HAL file for the primates. I then tried to convert it with hal2maf, but a segmentation fault occurred, so I extracted again with "halExtract 43primates.hal 43primates.fixed.hal". Finally, I ran "cactus-hal2maf ./jobs 43primates.fixed.hal 43primates.fixed.cactus.maf --refGenome Homo_sapiens --noAncestors --dupeMode single --chunkSize 500000 --filterGapCausingDupes" to generate the MAF file. Is there any problem with these steps?

Marh32 commented 5 months ago

[Image: Screenshot 2024-03-29 at 23 17 18]

glennhickey commented 5 months ago

Thanks, I guess it'll take a bit but I'm rerunning this now. I'm using cactus v2.8.0 -- which version did you use?

Marh32 commented 5 months ago

Thank you so much. My cactus version is v2.7.2.

glennhickey commented 5 months ago

Thanks, I confirm that I can reproduce the problem. Will fix it asap.

Marh32 commented 5 months ago

OK, thank you very much. Could you please tell me which step causes this error?
