debbiemarkslab / plmc

Inference of couplings in proteins and RNAs from sequence variation
MIT License

protein nucleic acid coupling #22

Open kamesh2026 opened 1 month ago

kamesh2026 commented 1 month ago

Hi, I am interested in studying protein-nucleic acid couplings, such as those of transcription factors and RNA-binding proteins. I was looking at plmc and was wondering how I should specify the alphabet argument if I would like to include both protein and DNA/RNA alphabets. I was planning to make a multiple sequence alignment in which the covarying protein and RNA sequences are concatenated.

Example

DYR_ECOLI/1-159 MISLIAALAVDRVIGMENAMPWNLPADLAWFKRNTLNKPVIMGRHTWESIG---RPLPGRKNIILSSQPGTD--UUCUUAUCAAGAGCGGUGGAGGGAUCGGCCCAGUGAAGCCCAGCAG--CGGAGCGCAAGUUCUA----UGCUAAUUCCGACAGAAG.

Given an MSA like the one above containing both protein and RNA alphabets, would the alphabet argument "-a -ACDEFGHIKLMNPQRSTVWYU" work to model both protein and RNA alphabets? Note that I included U to model uracil. The other nucleotide characters A, T, C and G are shared between proteins and RNA/DNA. If the above argument wouldn't work, would I have to change something in plmc to model the RNA and protein alphabets distinctly, given that I am interested in characterizing their co-complex?
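A minimal Python sketch of what a combined alphabet string implies, assuming each aligned character is simply mapped to its position in the string. plmc does its own encoding in C, so this is illustrative only; `encode`, `column_states` and the toy sequence are hypothetical. Shared letters such as A collapse to the same state code in protein and RNA columns, but in a Potts model the fields and couplings are indexed by column, so alanine at a protein column and adenine at an RNA column would still correspond to separate parameters.

```python
from __future__ import annotations

ALPHABET = "-ACDEFGHIKLMNPQRSTVWYU"  # the proposed -a string (22 symbols)


def encode(seq: str, alphabet: str = ALPHABET) -> list[int]:
    """Map each aligned character to its position in the alphabet string.

    Raises ValueError for characters the alphabet does not cover.
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    unknown = sorted(set(seq) - set(index))
    if unknown:
        raise ValueError(f"characters not in alphabet: {unknown}")
    return [index[ch] for ch in seq]


def column_states(seq: str, boundary: int) -> tuple[set[str], set[str]]:
    """Split a concatenated sequence at a (hypothetical, known) protein/RNA
    junction and return the symbols observed on each side."""
    return set(seq[:boundary]), set(seq[boundary:])


# Toy concatenated sequence: 4 protein columns followed by 4 RNA columns.
toy = "MKA-GCUA"
codes = encode(toy)
print(codes)  # [11, 9, 1, 0, 6, 2, 21, 1]

# 'A' is alanine at 0-based column 2 and adenine at column 7, yet both get code 1;
# only the column position tells them apart.
print(codes[2] == codes[7])  # True

protein_syms, rna_syms = column_states(toy, 4)
print(sorted(protein_syms), sorted(rna_syms))  # ['-', 'A', 'K', 'M'] ['A', 'C', 'G', 'U']
```

The unknown-character check is also a cheap way to find symbols in a real alignment that the chosen alphabet does not cover.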

Hope my question makes sense!

aaronkollasch commented 1 month ago

Hello, it's possible that -a -ACDEFGHIKLMNPQRSTVWYU will work for this purpose, and I would try that first.
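A rough way to gauge what the extra U symbol costs, assuming a standard Potts parameterization with one field per column and state plus a q x q coupling matrix per column pair. plmc's stored parameter count may differ in detail; `potts_parameter_count` and the 100-column alignment are hypothetical.

```python
def potts_parameter_count(num_columns: int, num_states: int) -> int:
    """Count fields (L*q) plus pairwise couplings (q*q for each of L*(L-1)/2 column pairs)."""
    fields = num_columns * num_states
    pairs = num_columns * (num_columns - 1) // 2
    return fields + pairs * num_states ** 2


# Hypothetical 100-column concatenated alignment, without and with U.
for alphabet in ("-ACDEFGHIKLMNPQRSTVWY", "-ACDEFGHIKLMNPQRSTVWYU"):
    print(len(alphabet), potts_parameter_count(100, len(alphabet)))
# 21 2185050
# 22 2398000
```

In this sketch the extra symbol grows the parameter count by roughly 10%, and in RNA columns most of the 22 states would never be observed.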