CshlSiepelLab / RPHAST

RPHAST: Phylogenetic Analysis with Space/Time Models in R
Other
8 stars 3 forks source link

Clarification on "Ordered representation of alignment" #2

Closed lokeyCEU closed 2 years ago

lokeyCEU commented 2 years ago

I am attempting to run rphast and have run into the following error

Error in .Call.rphast("rph_phastCons", msa$externalPtr, modList, rho,  :
 ERROR: Ordered representation of alignment required.
Calls: phastCons -> phastCons.call -> .Call.rphast
Execution halted

I also got this warning message in a call before the error message above.

        In .Call.rphast("rph_msa_read", filename, format, features$externalPtr,  :
          warning: maf_read: MAF file must be sorted with respect to reference sequence if store_order=TRUE.  Ignoring out-of-order blocks

I can't find anything on this error and was hoping for some clarification. My MAF file is sorted and fits the standard formatting (created with multiz; see below). Does ordered representation here mean that the MAF should have each species ordered the same as the phylogenetic tree or?

##maf version=1 scoring=maf_project_simple
a score=-4441.0
s Chromosome22         0 31 +  123708158 TTCCACCGCTGAATTCCTCCGTCCTGATTC-------------------------------------------------C
s asm73873   1755911 31 - 1030881721 TTCCACCACTGAATTCCTCAGTCCTGCTTC-------------------------------------------------C
s catUst       628392471 31 + 1131616530 TTTCAACACCAATTCCCTCAGTTTTACTTC-------------------------------------------------C
s hirRus       541443114 49 + 1105955550 ------------------------------NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN-

a score=33340.0
s Chromosome22         0 408 +  123708158 TTCCACCGCTGAATTCCTCCGTCCTGATTCCAAGAATTTCTTGAATACCACGGGGCTTTTCTCAAGATTTACCCCCACAACCTTCCCTACGTGCCCACTGCTCACCACCTCATTTTTCCCTCAATGTCTTCGTTCTCCACCCCAGAGAGTTTGTCAAGTGTTGCCTCCAAAATGTTCCCCCTCTCTCCACTACTGAGTTCCCCATTTTGCCTTCAAAATATTC$
s asm73873     475288353 410 - 1030881721 TTCCACCACTGAATTCCTCAGTCCTGCTTCCAAGGATTTCTTGAATACCACGGGGCTTTTCTCAACATTTACCCCCACGACCTTCCCTACGTGCCCACTGCTCATCACCTCATTTTTCCCTCAATGTCTTCGTTCTCCACCCCAGGGAGTTTGTCAAGTGTTGCCCCCAAAATGTTCCCCATCTGTCCACTACTGAGTTCCCCATTTTGCCTCCAAAATATTC$

a score=30578.0
s Chromosome22         0 454 +  123708158 TTCCACCGCTGAATTCCTCCGTCCTGATTCCAAGAATTTCTTGAATACCACGGGGCTTTTCTCAAGATTTACCCCCACAACCTTCCCTACGTGCCCACTGCTCACCACCTCATTTTTCCCTCAATGTCTTCGTTCTCCACCCCAGAGAGTTTGTCAAGTGTTGCCTCCAAAATGTTCCCCCTCTCTCCACTACTGAGTTCCCCATTTTGCCTTCAAAATATTC$
s asm73873     157078911 376 - 1030881721 TTCCACCACTGAATTCCTCAGTCCTGCTTCCAAGGATTTCTTGAATACCACGGGGCTTTTCTCAAGATTTACCCCCACAACCTTCCCTACGTGCCCACTGCTCATCACCTCATTTTTCCCTCAATGTGCTCGTTCTCCACCCCA----------------------------------------------------------------------------TTC$

a score=3603.0
s Chromosome22         0 45 +  123708158 TTCCACCGCTGAATTCCTCCGTCCTGATTCCAAGAATTTCTTGAA
s asm73873     142014501 45 - 1030881721 TTCCACCACTGAATTCCTCAGTCCTGCTTCCAAGGATTTCTTGAA

Thanks for any help

mjhubisz commented 2 years ago

The error means that your alignment object does not contain coordinate information. The reason for this, in your case, looks to be that the MAF contains overlapping regions of the reference sequence (Chromosome22). All three blocks of your MAF start with coordinate 0. You will need to produce a MAF file with single coverage along the reference sequence (and sorted by reference sequence coordinate). I think that should solve your issues.

lokeyCEU commented 2 years ago

Thanks Melissa I will try that and update this.

lokeyCEU commented 2 years ago

Circling back to this because I solved the issue. The issue was an embarrassingly minor bug. My code structure was as follows; align <- read.msa("msa.maf") feats <- read.feat("ref.gtf") align4d <- get4d.msa(align, feats) Tree <- ("") neutralMod <- phyloFit(align4d, tree=Tree, subst.mod="REV") pc <- phastCons(align4d, neutralMod)

The error was caused by trying to test phastCons() on align4d (the 4-fold degenerate sites only). Simply switching to phastCons(align, neutralMod) (the entire msa.maf) it runs without error (could've sworn I tested this :/ ).

Hopefully this helps anyone else who runs into this issue. Also phastCons() does allow overlapping alignment blocks. I think having overlapping blocks causes the warning mentioned in the original post.