brennanaba / PLAbDab

The Patent and Literature Antibody Database (PLAbDab): an evolving reference set of functionally diverse, literature-annotated antibody sequences and structures
BSD 3-Clause "New" or "Revised" License

Request for new feature: full sequence of mAb #5

Closed partrita closed 9 months ago

partrita commented 9 months ago

Thank you for sharing the great database. I have a request: is there a way to obtain the full sequence of each antibody or specifically the Fc portion?

Thanks in advance.

brennanaba commented 9 months ago

Hi Taeyoon Kim,

Thank you for your kind words about PLAbDab.

The majority of entries in PLAbDab are a curated version of data available in the NCBI database. For these entries, you can retrieve the full sequence from NCBI using the PLAbDab ID. For example, for the PLAbDab entry with ID AKW39254, the following script extracts its full sequence:

from Bio import Entrez

plabdab_ID = "AKW39254"

with Entrez.efetch(db="protein", id=plabdab_ID, rettype="gb", retmode="xml") as handle:
    entries = Entrez.read(handle)

full_sequence = entries[0]["GBSeq_sequence"].upper()

I hope that helps!!

All the best,

Brennan
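
For repeated lookups, the script above could be wrapped in a small function. This is only a sketch: the names extract_sequence and fetch_full_sequence are hypothetical and not part of PLAbDab or Biopython, and it assumes the ID is a valid NCBI protein accession. The email argument is there because NCBI asks Entrez users to identify themselves.

```python
def extract_sequence(entries):
    """Return the uppercase sequence of the first record in a parsed efetch result."""
    # GenBank XML stores the sequence under "GBSeq_sequence".
    return entries[0]["GBSeq_sequence"].upper()


def fetch_full_sequence(accession, email):
    """Fetch one NCBI protein record by accession and return its full sequence.

    Hypothetical helper; requires Biopython and network access.
    """
    # Imported here so extract_sequence can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # lets NCBI contact you instead of blocking you
    with Entrez.efetch(db="protein", id=accession, rettype="gb", retmode="xml") as handle:
        return extract_sequence(Entrez.read(handle))


# Example call (needs network access):
# fetch_full_sequence("AKW39254", "you@example.org")
```

Keeping the parsing step in its own function makes it possible to check it on a saved or hand-built record without contacting NCBI.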

partrita commented 9 months ago

It works!

from Bio import Entrez

Entrez.email = 'partrita@gmail.com'
plabdab_ID = "AKW39254"

with Entrez.efetch(db="protein", id=plabdab_ID, rettype="gb", retmode="xml") as handle:
    entries = Entrez.read(handle)

full_sequence = entries[0]["GBSeq_sequence"].upper()
print(full_sequence)
MAWISLILSLLALSSGAISQAVVTQESALTTSPGETVTLTCRSSTGAVTTSNYANWVQEKPDHLFTGLIGGTNNRAPGVPARFSGSLIGDKAALTITGAQTEDEAIYFCALWYSNHWVFGGGTKLTVLGQPKSSPSVTLFPPSSEELETNKATLVCTITDFYPGVVTVDWKVDGTPVTQGMETTQPSKQSNNKYMASSYLTLTARAWERHSSYSCQVTHEGHTVEKSLSRADCS

But what if I want "Therapeutic Antibodies" sequences, such as Abagovomab?

For that ID, Entrez.efetch gives me a 404 bad request error.

Thank you.
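
One plausible explanation, stated here as an assumption rather than something confirmed in this thread, is that a name such as Abagovomab is not an NCBI protein accession, so efetch has nothing to return. Biopython surfaces such failures as urllib.error.HTTPError, so a lookup can be guarded as sketched below. The helper name fetch_or_none is hypothetical, and fetch stands for any function that takes an ID and returns a sequence.

```python
import urllib.error


def fetch_or_none(fetch, identifier):
    """Call fetch(identifier); return None if the server answers with an HTTP error.

    `fetch` is any callable taking an ID, for example a wrapper around
    Entrez.efetch. Hypothetical helper, not part of PLAbDab or Biopython.
    """
    try:
        return fetch(identifier)
    except urllib.error.HTTPError as err:
        # Report the failing ID and status code, then carry on.
        print(f"{identifier}: HTTP {err.code} ({err.reason}); "
              "possibly not an NCBI protein accession")
        return None
```

With a guard like this, a loop over many PLAbDab IDs can skip the entries that do not resolve in NCBI instead of stopping at the first failure.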