brennanaba / PLAbDab

The Patent and Literature Antibody Database (PLAbDab): an evolving reference set of functionally diverse, literature-annotated antibody sequences and structures
BSD 3-Clause "New" or "Revised" License

Any plans to provide Fc sequences? #6

partrita closed this issue 8 months ago

partrita commented 8 months ago

Are there any plans to provide information on the Fc sequence of antibodies? Recently, engineered antibodies with modified Fc regions have also been approved as pharmaceuticals. Providing sequence information on various aspects of antibodies would likely be helpful for research purposes.

Previously, I used a Biopython script to obtain sequence information from NCBI using the PlabDab_ID. However, I couldn't retrieve information for antibodies such as Abagovomab from their INN names.
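For context, a lookup of this kind can be sketched with the Python standard library alone, calling the NCBI E-utilities `efetch` endpoint directly in place of Biopython. This is only a rough sketch: it assumes the ID is a GenBank protein accession, and the helper names (`build_efetch_url`, `parse_fasta`, `fetch_sequences`) are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="protein"):
    """Build an efetch URL that returns FASTA text for one accession.

    Assumption: the ID is a GenBank accession in the given database.
    """
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def parse_fasta(text):
    """Parse FASTA text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def fetch_sequences(accession):
    """Download and parse the FASTA record(s) for one accession (needs network)."""
    with urlopen(build_efetch_url(accession), timeout=30) as resp:
        return parse_fasta(resp.read().decode("utf-8"))
```

Calling `fetch_sequences("<accession>")` with a real accession returns the deposited sequence keyed by its FASTA header. A lookup like this only works with an accession, not with an INN name.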

Thank you in advance.

brennanaba commented 8 months ago

Hi Taeyoon,

For PLAbDab, we scrape therapeutic antibodies from Thera-SAbDab, a manually curated database of therapeutic antibodies. As Thera-SAbDab does not contain Fc sequences, we cannot scrape them from there.

I am not aware of an automated way to extract the Fc sequence for these entries. If you have any suggestions for how to do this easily, I would be happy to look into adding them to the database.

All the best,

Brennan

partrita commented 8 months ago

Hey @brennanaba. I took some time to do some web scraping, using two different sites: https://drugs.ncats.io/ and https://gsrs.ncats.nih.gov/. The attached CSV file contains the results and should be easy to understand once opened.

web_scrap_result.csv

It looks like manual work is still needed for the missing sequences, though.
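To see which entries still need manual curation, something like the following could list the rows with an empty sequence field. It is a minimal sketch using only the standard library. The column names `inn_name` and `sequence` are assumptions, since the actual header of the CSV is not shown here, and `rows_missing_sequence` is a hypothetical helper.

```python
import csv
import io


def rows_missing_sequence(csv_text, name_col="inn_name", seq_col="sequence"):
    """Return the names of rows whose sequence field is empty or absent.

    The column names are assumptions; adjust them to the real CSV header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = []
    for row in reader:
        seq = (row.get(seq_col) or "").strip()
        if not seq:
            missing.append(row.get(name_col) or "")
    return missing


# Placeholder data, not taken from the attached file.
sample = "inn_name,sequence\nantibody_a,EVQLVESGG\nantibody_b,\nantibody_c,   \n"
print(rows_missing_sequence(sample))
```

For the real file, the text could be read with `open("web_scrap_result.csv", newline="").read()` and passed to the same function.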

Best regards, Taeyoon.