Frost-group / Nornour

An open-source project in computational modelling and design of antimicrobial and anticancer peptides.

Curate an initial peptide database #1

Open jarvist opened 2 months ago

jarvist commented 2 months ago

https://github.com/Frost-group/Nornour/blob/067016bd4aef46746e1a12cb2dad012116e996e2/0003-DRAMP-database/download.sh

# The DRAMP database offers perfectly formatted downloads of the data
# http://dramp.cpu-bioinfor.org/downloads/
#
# *Citation*:
# Shi G, Kang X, Dong F, Liu Y, Zhu N, Hu Y, Xu H, Lao X, Zheng H. DRAMP 3.0:
# an enhanced comprehensive data repository of antimicrobial peptides. Nucleic
# Acids Res. 2022 Jan 7;50(D1):D488-D496. PMID: 34390348

# (づ ᴗ _ ᴗ) づ ♡ - I love a good simple URL download
# wget "http://dramp.cpu-bioinfor.org/downloads/download.php?filename=download_data/DRAMP3.0_new/Antibacterial_amps.txt" -O Antibacterial_amps.txt

jarvist commented 2 months ago

NB: the data are still quite unclean! Characters such as '24..spacerO()hxwlUgimvfJyqtZ' are all turning up.
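
One way to screen these out, as a minimal sketch only: it assumes one sequence per line and that only the 20 canonical one-letter amino-acid codes should survive (so the non-canonical codes O, U, J and Z, lowercase letters, digits and annotations such as "(spacer)" are all rejected). The function names `split_clean` and `stray_characters` are hypothetical and are not part of the repository.

```python
# The 20 canonical amino acids, one-letter codes.
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")


def split_clean(sequences):
    """Partition sequences into (clean, rejected) using the canonical alphabet."""
    clean, rejected = [], []
    for seq in sequences:
        seq = seq.strip()
        # Keep a sequence only if it is non-empty and every character is canonical.
        if seq and set(seq) <= CANONICAL:
            clean.append(seq)
        else:
            rejected.append(seq)
    return clean, rejected


def stray_characters(sequences):
    """Return the sorted list of characters that fall outside the canonical alphabet."""
    return sorted(set("".join(sequences)) - CANONICAL)


# Made-up example lines, not taken from DRAMP.
example = ["RWRWRW", "KKLL(spacer)KK", "rwrw", "AUGZ2"]
clean, rejected = split_clean(example)
print("kept:", clean)
print("rejected:", rejected)
print("stray characters:", stray_characters(example))
```

Rejecting whole sequences, rather than deleting the offending characters, avoids silently turning a modified peptide into a different one; whether that is the right trade-off for this dataset is a judgment call.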

jarvist commented 2 months ago

OK, this is now ready to use with an LSTM etc., such as this super slick JavaScript interface: https://cs.stanford.edu/people/karpathy/recurrentjs/
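
For models that want integer inputs rather than raw text lines, a character-level encoding might look like the sketch below. This is an illustration under stated assumptions, not what the repository does: the helper names `build_vocab` and `encode` are hypothetical, index 0 is reserved for padding by choice, and sequences longer than the chosen length are simply truncated.

```python
def build_vocab(sequences):
    """Map each character seen in the data to an integer; 0 is reserved for padding."""
    chars = sorted(set("".join(sequences)))
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(seq, vocab, length):
    """Turn a sequence into a fixed-length list of indices, truncating or zero-padding."""
    ids = [vocab[ch] for ch in seq[:length]]
    return ids + [0] * (length - len(ids))


# Made-up sequences for illustration.
peptides = ["RW", "WRR", "RWRW"]
vocab = build_vocab(peptides)
batch = [encode(p, vocab, length=4) for p in peptides]
print(vocab)
print(batch)
```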

jarvist commented 2 months ago

The RW Lexicon dataset from the paper has also been added: https://github.com/Frost-group/Nornour/blob/0c92669f98355e71841e7aaf7adf86361cdf4575/0003b-RW-Lexicon/download.sh#L1-L8

KamDB commented 1 month ago

DRAMP ^^^ seems like a great start: 30260 entries. http://dramp.cpu-bioinfor.org/downloads/ Sequence, activity and haemolytic activity are all in one file for antibacterial and anticancer peptides.

RW Lexicon ^^^ is simple, but perhaps the most straightforward to handle and model initially. https://pubmed.ncbi.nlm.nih.gov/34021253/ 256 peptides made up of just arginine and tryptophan; sequence, activity and haemolytic activity are all in one file.

APD3 https://aps.unmc.edu/home https://academic.oup.com/nar/article/44/D1/D1087/2503090 Antimicrobial Peptide Database (as of today, 4028 antibacterial peptides, 304 with anticancer activity, mostly natural). The downloads section has a file of one-letter amino acid sequences, but the file has no associated IC values; it seems you have to click through the database search manually to find potency (IC values are given for a range of cell lines, and the cell lines differ between peptides).

dbAMP https://awi.cuhk.edu.cn/~dbAMP/download2024.php 35600 entries. Has sequence data in one file; I can't find associated IC values in the files, but they can be searched in the database. https://awi.cuhk.edu.cn/~dbAMP/analyze.php lists various machine learning algorithms for different aspects of antimicrobial peptide discovery. HemoFinder (https://awi.cuhk.edu.cn/~dbAMP/HemoFinder.php) seems particularly useful: it can predict the haemolytic activity and half-life of peptides.

DBAASP lets users search for peptide activities by particular target species and returns the search results as a ranked list of activity values. Gram-positive 17049 entries, Gram-negative 17810 entries, cancer 3778 entries. It also has a property and activity calculator for peptides: https://dbaasp.org/tools?page=property-calculation Interestingly, it also has a calculator that predicts synergy between antibacterial peptides and conventional antibiotics: https://dbaasp.org/tools?page=synergy-prediction

InverPep https://ciencias.medellin.unal.edu.co/gruposdeinvestigacion/prospeccionydisenobiomoleculas/InverPep/public/home_en Specialised database of AMPs from invertebrates, 774 entries. Not super useful: it doesn't have any straightforward lists, but it can still be used for cross-referencing if needed.

CAMPR3 http://www.camp3.bicnirrh.res.in/index.php Again doesn't seem to have a straightforward list, but has a fairly large collection of AMPs (8164 AMP sequences) and, notably, a patent database (2083 entries). Also has a fair few machine learning tools for AMP prediction.

BaAMPs http://baamps.it/ Interesting to consider:

> In the majority of chronic infections, microorganisms are rarely found as planktonic form. Rather, they gather in biofilm communities. A biofilm is constituted of single or multiple organism species, such as fungi, bacteria, and viruses, typically attached to biotic (e.g. tissues) or abiotic sites and encased in a self-secreted extracellular matrix. The treatment for biofilm infections is particularly challenging because bacteria in these conditions become refractory to antibiotic drugs.

237 peptides, but you have to search through the database; it looks like it has no text file.

CancerPPD http://crdd.osdd.net/raghava/cancerppd/index.php 3491 peptide entries. Cancer-specific. Has text files with one-letter amino acid sequences using both natural and unnatural amino acids; there are no associated IC values in the files, but they can be looked up in the database with the corresponding cell lines.

Cybase https://www.cybase.org.au/?page=assays Specific focus on cyclic peptides. Small range of peptides, but has sequence data as well as assay data with antibacterial, cancer and haemolytic activity.

Nice review on most of these antimicrobial peptide databases - https://academic.oup.com/database/article/doi/10.1093/database/baac011/6550847

Pore-forming peptides (antibacterial and anticancer): https://doi.org/10.1021/acs.jmedchem.4c00912 52 sequences were run through MD to get interaction energies, and some were tested in various bacteria.
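
Since several of the databases above overlap and some are mainly useful for cross-referencing, a rough sketch of how sequence lists from different sources could be merged while recording where each sequence came from. The function name `cross_reference` and the toy inputs are hypothetical; case is deliberately left untouched, on the assumption that some sources may use lowercase letters to mean something different from the uppercase code.

```python
from collections import defaultdict


def cross_reference(sources):
    """Map each sequence to the sorted list of source names that contain it.

    `sources` is a dict of {source name: iterable of sequence strings}.
    """
    seen = defaultdict(set)
    for name, sequences in sources.items():
        for seq in sequences:
            seq = seq.strip()
            if seq:  # skip blank lines
                seen[seq].add(name)
    return {seq: sorted(names) for seq, names in seen.items()}


# Toy data, not real database contents.
merged = cross_reference({
    "db_a": ["RWRW", "KKLL"],
    "db_b": ["RWRW ", "", "GLFD"],
})
shared = [seq for seq, names in merged.items() if len(names) > 1]
print(merged)
print("present in more than one source:", shared)
```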