Closed yazhinia closed 7 years ago
Can you show the head of your alignment? Are you using "-" for gaps or "."? I think I had a similar problem when there were "-" in my alignments for gaps.
And what is the command that you are running? Is it with .AGCU?
ABGY01008196.1/259-62 CACCCU---U--C-U------C-G-G--C-C------U-C-U----U------------- ------------------------------G--G--C-U--A-----A-G---A-U--C- --AA------------UUU-------G------U-----A-----G-----U-----A-- -------C---C----U--G----U----U-------C------U-----U--A--U--C ---A-G----C--G-U---G----A----U---A--G-----------------------
AE014186.2/1461495-1461298 CACCCU---U--C-U------C-G-G--C-C------U-C-U----U-------------
The command is plmc -a .AGUC file.fasta
Should I replace "-" with "."?
It seems that it will do the work:
plmc -a -AGUC alignment/pistol.mfa
654 valid sequences out of 654
110 sites
Effective number of samples: 94.3 (80% identical neighborhood = 1.000 samples)
iter time cond fx -loglk ||h|| ||e||
1 1.0 19.96 7523.9 7349.8 32.8 1.3
2 1.5 15.20 7010.5 6284.1 32.9 2.7
^Z
Is your input file in FASTA format with the "-" character representing gaps? If so, why do I get an error?
Yes, plmc has to know the alphabet.
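A quick way to see whether an alignment matches the alphabet passed with -a is to count the characters that fall outside it. The snippet below is only a minimal sketch: `read_fasta` and `unexpected_characters` are hypothetical helper names, not part of plmc, and it assumes (based on this thread, not on the plmc source) that characters outside the alphabet are what make sequences count as invalid.

```python
from collections import Counter


def read_fasta(lines):
    """Parse FASTA lines into (header, sequence) pairs, joining wrapped lines."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def unexpected_characters(records, alphabet):
    """Count every character that is not in the given alphabet string."""
    allowed = set(alphabet)
    counts = Counter()
    for _, sequence in records:
        counts.update(c for c in sequence if c not in allowed)
    return counts
```

For example, calling `unexpected_characters(read_fasta(open("alignment/pistol.mfa")), ".AGUC")` on a file that uses "-" for gaps would report the "-" characters, which suggests either changing the alphabet to -AGUC or replacing the gap character in the file.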
head alignment/pistol.mfa
>AMWB01035575.1/108-175
AUUCGUCAU-GGCGAAU-UAAAACAG-GGUAU-UAAGCCAUG-AGCG-GAGGAGAU--------------AA
AA--------------------AUCUCCUC-AU-UACC
>UnmappedStool_Broad_C253000097/192-264
UGUCGACCA-GGCGACA-UAAAAUA--GCCUC-UAAGCCUGG-UGCG-UGCUAUACAU------------UU
UCAC----------------AUGUAUAGCG-GC-UGGU
>RUMENNODE_4196916_1/1307-1382
CGUCGGUUU-GGCGACG-AUAAAGA--GGUUUUUAGGCCAAA-CGCG-GCAGCAUGC-------------AG
UAUCUAGA-------------GCGUGCUGC-GG-AACA
>BMHBC_2_5701378/29-103
0 valid sequences out of 208
229 sites
Effective number of samples: 0.0 (80% identical neighborhood = 1.000 samples)
Gradient optimization: No detected error
I am getting the same error again, regardless of which input file I use. I am very perplexed. Could it be due to an installation problem?
Can you type the command line that you are using for this?
Thank you for your response. I figured it out. plmc needs a multiple sequence FASTA file with each sequence on a single line and "." as the gap character. Now it is working fine.
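For anyone hitting the same problem, the conversion described above can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in this thread (wrapped sequence lines, "-" as the original gap character); `to_single_line_fasta` is a hypothetical helper name, not something shipped with plmc.

```python
def to_single_line_fasta(lines, old_gap="-", new_gap="."):
    """Join wrapped sequence lines and swap the gap character.

    Returns a list of output lines: each header line followed by
    exactly one sequence line.
    """
    output, chunks = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Write out the sequence collected for the previous header.
            if chunks:
                output.append("".join(chunks))
                chunks = []
            output.append(line)  # headers are copied unchanged
        else:
            chunks.append(line.replace(old_gap, new_gap))
    if chunks:
        output.append("".join(chunks))
    return output
```

To use it on a file, read the input with `open(path)`, pass the file object to `to_single_line_fasta`, and write the returned lines joined by newlines to a new file. Headers are left untouched, so a "-" inside a sequence name such as AMWB01035575.1/108-175 is not replaced.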
Glad that this could work out and thanks for the pointers, @mmagnus! Closing for now.
I am trying to run plmc on RNA sequences. The dataset contains 11870 sequences (derived from the Rfam database). The input file is in FASTA format. When running plmc, I always get an error saying "0 valid sequences". What could cause this error, given that the MSA contains divergent sequences? Please suggest how to fix this issue.