brennanaba / PLAbDab

The Patent and Literature Antibody Database (PLAbDab): an evolving reference set of functionally diverse, literature-annotated antibody sequences and structures
BSD 3-Clause "New" or "Revised" License

Code used to create PLAbDab available? #7

Closed shatz01 closed 5 months ago

shatz01 commented 5 months ago

Hi, first off, thanks for the amazing database!

It seems like this repo only contains code to query the already-built database, right?

Is the code used to create this from querying NCBI via Entrez available anywhere?

Thanks!

shatz01 commented 5 months ago

?

brennanaba commented 5 months ago

Hi Daniel,

Thank you for your kind words about PLAbDab.

You are right, this repo only contains the code to query the database.

The code used to create the database is not currently publicly available. However, to extract data from the NCBI database using Entrez, you can use the following code:

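from Bio import Entrez

# NCBI asks E-utilities users to identify themselves; replace with your own address.
Entrez.email = "you@example.com"

plabdab_ID = "AKW39254"

# Fetch the GenBank record for this protein accession as XML.
with Entrez.efetch(db="protein", id=plabdab_ID, rettype="gb", retmode="xml") as handle:
    entries = Entrez.read(handle)

# Take the first (only) record and upper-case its amino acid sequence.
sequence = entries[0]["GBSeq_sequence"].upper()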

I hope that helps!!

All the best,

Brennan

shatz01 commented 5 months ago

Gotcha, thanks! A bit sad that the code to create the database is not public, but it's OK.

What is plabdab_ID? Does it literally correspond to the same plabdab database that we can query using this repo?

brennanaba commented 5 months ago

Hi Daniel,

For entries scraped from the NCBI database, the plabdab ID will correspond to the LOCUS code used in NCBI (or a combination of them if the heavy and light chains come from different NCBI entries). For example, the entry with plabdab ID QTW11010 was scraped from here: https://www.ncbi.nlm.nih.gov/protein/QTW11010.

All the best,

Brennan
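
Since an entry's heavy and light chains can come from different NCBI entries, it may be useful to fetch several accessions at once and key the sequences by LOCUS name. The following is only a rough sketch, not the code used to build PLAbDab: the function names `fetch_records` and `sequences_by_locus` are made up for illustration, and how a combined plabdab ID should be split into individual accessions is not specified in this thread, so the sketch takes a list of accessions as input.

```python
from typing import Dict, Iterable, List


def fetch_records(accessions: Iterable[str], email: str) -> List[dict]:
    """Fetch GenBank protein records for one or more NCBI accessions."""
    # Imported here so the parsing helper below can be used without Biopython.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks callers to identify themselves
    # efetch accepts a comma-separated list of IDs in a single request.
    with Entrez.efetch(db="protein", id=",".join(accessions),
                       rettype="gb", retmode="xml") as handle:
        return list(Entrez.read(handle))


def sequences_by_locus(records: Iterable[dict]) -> Dict[str, str]:
    """Map each record's LOCUS name to its upper-cased sequence."""
    return {rec["GBSeq_locus"]: rec["GBSeq_sequence"].upper() for rec in records}


# Hypothetical usage (needs network access and Biopython). The two accessions
# are the ones mentioned in this thread, used only as example IDs; they are
# not claimed to be a paired heavy/light chain.
# records = fetch_records(["QTW11010", "AKW39254"], email="you@example.com")
# print(sequences_by_locus(records))
```

Keeping the parsing step separate from the network call means it can be checked against hand-written records without contacting NCBI.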