debbiemarkslab / plmc

Inference of couplings in proteins and RNAs from sequence variation
MIT License

protein nucleic acid coupling #22

Open kamesh2026 opened 4 months ago

kamesh2026 commented 4 months ago

Hi, I am interested in studying protein-nucleic acid couplings, such as those of transcription factors and RNA-binding proteins. I was looking at the plmc script and was wondering how I should go about specifying the alphabet argument if I would like to include both protein and DNA/RNA alphabets. I was planning to make a multiple sequence alignment with the covarying protein and RNA sequences concatenated.

Example

DYR_ECOLI/1-159 MISLIAALAVDRVIGMENAMPWNLPADLAWFKRNTLNKPVIMGRHTWESIG---RPLPGRKNIILSSQPGTD--UUCUUAUCAAGAGCGGUGGAGGGAUCGGCCCAGUGAAGCCCAGCAG--CGGAGCGCAAGUUCUA----UGCUAAUUCCGACAGAAG.

Given an MSA like the one above, containing both protein and RNA alphabets, would the alphabet argument "-a -ACDEFGHIKLMNPQRSTVWYU" model both protein and RNA alphabets? Note that I included U to model uracil. The other nucleotide characters (A, T, C and G) are shared between proteins and RNA/DNA. If the above argument wouldn't work, would I have to change something in the plmc script to model the RNA and protein alphabets distinctly, given that I am interested in characterizing their co-complex?

Hope my question makes sense!

aaronkollasch commented 4 months ago

Hello, it's possible that -a -ACDEFGHIKLMNPQRSTVWYU will work for this purpose, and I would try that first.
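
Before trying that, it may help to sanity-check the concatenated alignment against the proposed alphabet. The Python sketch below is only an illustration and is not part of plmc: the helper names, the shortened example row, and the split position are all hypothetical. It reports characters the alphabet does not cover, and the letters that appear on both the protein and RNA sides of a row (for those letters, a single shared alphabet uses the same symbol for an amino acid and a nucleotide).

```python
# Hypothetical helpers for checking a concatenated protein+RNA alignment
# against the alphabet string passed to plmc via -a. Not part of plmc.

ALPHABET = "-ACDEFGHIKLMNPQRSTVWYU"  # alphabet proposed in this thread


def uncovered_symbols(sequences, alphabet=ALPHABET):
    """Return the characters used in the sequences that the alphabet lacks."""
    allowed = set(alphabet)
    used = {ch for seq in sequences for ch in seq.upper()}
    return used - allowed


def shared_symbols(protein_part, rna_part, gap="-"):
    """Return the non-gap letters that occur in both halves of a row."""
    return (set(protein_part.upper()) & set(rna_part.upper())) - {gap}


# Shortened, made-up row in the style of the example above.
# The split index (end of the protein columns) is an assumption.
row = "MISLIAALAV--UUCUUAUCAAGAGCGG"
split = 12
protein_part, rna_part = row[:split], row[split:]

print("not in alphabet:", sorted(uncovered_symbols([row])))
print("shared letters:", sorted(shared_symbols(protein_part, rna_part)))
```

If the first list is non-empty, the alignment contains characters outside the alphabet and is worth cleaning up before running plmc. The second list shows which letters carry two meanings in a shared alphabet. Since the protein and RNA parts occupy different columns, this may not matter in practice, which is why trying the combined alphabet first seems reasonable.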