glennhickey / progressiveCactus

Distribution package for the Progressive Cactus multiple genome aligner. Dependencies are linked as submodules.

ambiguous=IUPAC, no IUPAC in alignment #100

Closed: stsmall closed this issue 6 years ago

stsmall commented 6 years ago

Hi, I don't know if I am interpreting this option in the config file correctly: with ambiguous=IUPAC, should I expect IUPAC nucleotides in the alignment if the original FASTA files contain ambiguity codes? Thanks, stsmall

joelarmstrong commented 6 years ago

Hi,

That part of the config file tells LASTZ (the local alignment program we use) to treat any IUPAC ambiguity character (e.g. R, Y, D) as an N. But currently, other parts of the pipeline will complain and exit when faced with a non-N IUPAC character. So it is best by far to just replace any R, Y, etc. characters with Ns before starting the alignment: the extra information from e.g. the R character isn't used, and any non-ACTGN character will probably cause the pipeline to exit.
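For anyone who wants to do that replacement up front, here is a minimal Python sketch. It is not part of progressiveCactus; the function name `mask_ambiguous` is made up for illustration. It assumes plain FASTA input, leaves header lines (starting with `>`) untouched, and turns every sequence character other than A, C, G, T or N into an N. Lowercase characters become lowercase `n`, on the assumption that you want to keep any soft-masking in the input. Note that this also converts gap characters such as `-`, so it is only meant for unaligned sequence.

```python
import re

# Any character on a sequence line that is not A, C, G, T or N (either case).
_NON_ACGTN = re.compile(r"[^ACGTNacgtn]")


def mask_ambiguous(lines):
    """Yield FASTA lines with non-ACGTN sequence characters replaced by N/n.

    Hypothetical helper for illustration; header lines pass through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            # Header line: never rewrite sequence names or descriptions.
            yield line
            continue
        # Split off the line ending so it is preserved exactly as in the input.
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        # Keep case: lowercase ambiguity codes become "n", everything else "N".
        masked = _NON_ACGTN.sub(
            lambda m: "n" if m.group().islower() else "N", body
        )
        yield masked + ending
```

To use it on a file, open the input and output FASTA files and write the result with something like `out.writelines(mask_ambiguous(inp))`; the generator processes one line at a time, so large genomes do not need to fit in memory.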

stsmall commented 6 years ago

OK, thank you!