Hi,
I have had this lying around for some time and wanted to finally open a draft request for it.
I added a script containing functions that read in a FASTA file and compute the Shannon entropy and KL divergence per sequence, based on the sequences in that file. It always builds a dict containing the amino acid frequencies to work with. These frequencies are computed over all sequences in the file and are not based on an alignment. This was a deliberate choice: I think it's much faster, and I don't think aligning multiple millions of sequences is practicable :D
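For reference, a minimal sketch of what the per-sequence computation could look like. The function names here are illustrative and not necessarily the ones in the script. It assumes both metrics are measured in bits (log base 2) and that the KL divergence compares each sequence's residue distribution against the overall distribution of the file.

```python
import math
from collections import Counter


def read_fasta(path):
    """Return a list of (header, sequence) tuples from a FASTA file."""
    records, header, chunks = [], None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def frequencies(residues):
    """Relative frequency of each residue in a string."""
    counts = Counter(residues)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def shannon_entropy(freqs):
    """Shannon entropy in bits of a frequency dict."""
    return sum(-p * math.log2(p) for p in freqs.values() if p > 0)


def kl_divergence(p, q):
    """KL divergence D(p || q) in bits; q must cover every residue in p."""
    return sum(pi * math.log2(pi / q[aa]) for aa, pi in p.items() if pi > 0)


def per_sequence_metrics(records):
    """Map each header to (entropy, KL divergence vs. overall frequencies)."""
    # Overall (alignment-free) frequencies across all sequences in the file.
    background = frequencies("".join(seq for _, seq in records))
    metrics = {}
    for header, seq in records:
        p = frequencies(seq)
        metrics[header] = (shannon_entropy(p), kl_divergence(p, background))
    return metrics
```

Because the overall frequencies are built from the same sequences, every residue in a sequence has a non-zero overall frequency, so the KL divergence stays finite without pseudocounts.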
Some old commits are shown as not integrated, I think because they were merged into a single commit last time. I kept everything as is, because there are some changes in the scripts folders (unifying scripts and script.py).
I also planned to write a function that takes the UniProt accession from the FASTA files and fetches the AF2 PPL metrics from Google Cloud, but I did not have an example FASTA file.
Best, Max