debbiemarkslab / plmc

Inference of couplings in proteins and RNAs from sequence variation
MIT License
101 stars, 37 forks

0 valid sequences #2

Closed yazhinia closed 7 years ago

yazhinia commented 7 years ago

I am trying to run plmc on an RNA alignment. The dataset contains 11870 sequences (derived from the Rfam database), and the input file is in FASTA format. Whenever I run plmc, I get an error saying "0 valid sequences". What could cause this error, given that the MSA contains divergent sequences? Please suggest how to fix it.

mmagnus commented 7 years ago

Can you show the head of your alignment? Are you using - or . for gaps? I think I had a similar problem when there were - characters for gaps in my alignments.

And what is the command that you are running? With .AGCU as the alphabet?

yazhinia commented 7 years ago

ABGY01008196.1/259-62
CACCCU---U--C-U------C-G-G--C-C------U-C-U----U-------------
------------------------------G--G--C-U--A-----A-G---A-U--C-
--AA------------UUU-------G------U-----A-----G-----U-----A--
-------C---C----U--G----U----U-------C------U-----U--A--U--C
---A-G----C--G-U---G----A----U---A--G-----------------------

AE014186.2/1461495-1461298
CACCCU---U--C-U------C-G-G--C-C------U-C-U----U-------------

The command is plmc -a .AGUC file.fasta

Should I replace "-" with "."?

mmagnus commented 7 years ago

This seems to work:

plmc -a -AGUC alignment/pistol.mfa
654 valid sequences out of 654
110 sites
Effective number of samples: 94.3   (80% identical neighborhood = 1.000 samples)
iter    time    cond    fx  -loglk  ||h||   ||e||
1   1.0 19.96   7523.9  7349.8  32.8    1.3
2   1.5 15.20   7010.5  6284.1  32.9    2.7
^Z

yazhinia commented 7 years ago

Is your input file in FASTA format, with the "-" character representing gaps? If so, why do I get an error?

mmagnus commented 7 years ago

Yes, plmc has to know the alphabet.
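One way to check an alignment against the alphabet before running plmc is sketched below. It assumes that plmc treats a sequence as invalid when it contains a character outside the string passed to -a; that is an assumption about plmc's behaviour, not something confirmed in this thread. The function names and the two example records are hypothetical.

```python
from collections import Counter


def parse_fasta(text):
    """Return a list of (header, sequence) pairs; wrapped sequence lines are joined."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def out_of_alphabet(records, alphabet):
    """Map each header to a Counter of its characters that are not in `alphabet`."""
    allowed = set(alphabet)
    report = {}
    for header, seq in records:
        bad = Counter(c for c in seq if c not in allowed)
        if bad:
            report[header] = bad
    return report


# Hypothetical two-record alignment: one uses "-" for gaps, the other ".".
example = """>seq1
AUUCG-CAU
>seq2
AUUCG.CAU
"""
records = parse_fasta(example)
report = out_of_alphabet(records, "-AGUC")  # same alphabet string as `plmc -a -AGUC`
for header, bad in report.items():
    print(header, dict(bad))
```

If every record shows up in the report, the alphabet string and the gap character in the file probably do not match.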

head alignment/pistol.mfa
>AMWB01035575.1/108-175
AUUCGUCAU-GGCGAAU-UAAAACAG-GGUAU-UAAGCCAUG-AGCG-GAGGAGAU--------------AA
AA--------------------AUCUCCUC-AU-UACC
>UnmappedStool_Broad_C253000097/192-264
UGUCGACCA-GGCGACA-UAAAAUA--GCCUC-UAAGCCUGG-UGCG-UGCUAUACAU------------UU
UCAC----------------AUGUAUAGCG-GC-UGGU
>RUMENNODE_4196916_1/1307-1382
CGUCGGUUU-GGCGACG-AUAAAGA--GGUUUUUAGGCCAAA-CGCG-GCAGCAUGC-------------AG
UAUCUAGA-------------GCGUGCUGC-GG-AACA
>BMHBC_2_5701378/29-103

yazhinia commented 7 years ago

0 valid sequences out of 208
229 sites
Effective number of samples: 0.0 (80% identical neighborhood = 1.000 samples)
Gradient optimization: No detected error

I am getting the same error again, regardless of which input file I use. I am very perplexed. Could it be an installation problem?

mmagnus commented 7 years ago

Can you type the command line that you are using for this?

yazhinia commented 7 years ago

Thank you for your response. I figured it out: plmc needs a multiple-sequence FASTA file with each sequence on a single line and "." as the gap character. It is working fine now.
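A minimal sketch of that conversion, for anyone with the same problem: it joins wrapped sequence lines into one line per record and swaps the gap character. The function name, its parameters, and the example records are hypothetical; only the target format (single-line FASTA with "." for gaps) comes from this thread.

```python
def unwrap_fasta(text, old_gap="-", new_gap="."):
    """Join wrapped sequence lines and replace the gap character.

    Header lines (starting with ">") are copied unchanged, so a "-" inside
    a sequence name such as a coordinate range is left alone.
    """
    out = []
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Emit the previous record's sequence as a single line.
            if chunks:
                out.append("".join(chunks))
                chunks = []
            out.append(line)
        else:
            chunks.append(line.replace(old_gap, new_gap))
    if chunks:
        out.append("".join(chunks))
    return "\n".join(out) + "\n"


# Hypothetical wrapped alignment with "-" gaps.
wrapped = """>seq1/10-24
AUUCG-CAU
GG--AC
>seq2/3-17
AUUCGUCAU
GGCCAC
"""
single_line = unwrap_fasta(wrapped)
print(single_line, end="")
```

The result can be written to a file and passed to plmc with the matching alphabet, as in the command quoted earlier in the thread (plmc -a .AGUC file.fasta).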

jingraham commented 7 years ago

Glad that this could work out and thanks for the pointers, @mmagnus! Closing for now.