WeiliWw / VirHostMatcher-Net

VirHostMatcher-Net: A network-based computational tool for predicting virus-host interactions.

How did you collect the GCF IDs of hosts from NCBI? #21

Closed Achuan-2 closed 2 years ago

WeiliWw commented 2 years ago

You can check it on NCBI, like this one: https://www.ncbi.nlm.nih.gov/nuccore/NC_004722.1.

Achuan-2 commented 2 years ago

Oh, thanks. I found a method that uses Biopython:

from Bio import Entrez

# NCBI asks for a contact address with every E-utilities request.
Entrez.email = "you@example.com"  # placeholder; replace with your own address

def search_assembly(term):
    """Search the NCBI assembly database for a given search term.

    Args:
        term: search term, usually an organism name
    """
    handle = Entrez.esearch(db="assembly", term=term, retmax=200)
    record = Entrez.read(handle)
    handle.close()
    return record["IdList"]

def get_assembly_summary(assembly_id):
    """Get the esummary record for an Entrez assembly ID."""
    handle = Entrez.esummary(db="assembly", id=assembly_id, report="full")
    record = Entrez.read(handle)
    handle.close()
    return record

Achuan-2 commented 2 years ago

But I still have a problem to solve: for some host names, such as Streptomyces scabiei RL-34, no genome assembly can be found. How would you solve this?

The solution I came up with is: if no assembly can be found for a host name, fall back to species- or genus-level data. Does that sound reasonable? Also, how did you get the taxonomy lineage?
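
A minimal sketch of that fallback idea, not the method VirHostMatcher-Net itself uses. It assumes the host name is laid out as "Genus species strain", so the first two words are the species and the first word is the genus; names such as "Candidatus ..." would break that assumption. The helper names `fallback_terms` and `search_with_fallback` are hypothetical, and the search function is passed in as a parameter so the logic can be tried without network access.

```python
from typing import Callable, List, Optional, Tuple


def fallback_terms(host_name: str) -> List[str]:
    """Return search terms from most to least specific.

    Assumes a "Genus species strain" layout: full name, then the
    first two words (species), then the first word (genus).
    """
    words = host_name.split()
    terms: List[str] = []
    for n in (len(words), 2, 1):
        term = " ".join(words[:n])
        # Skip empty terms and duplicates (e.g. a name with only two words).
        if term and term not in terms:
            terms.append(term)
    return terms


def search_with_fallback(
    host_name: str,
    search: Callable[[str], List[str]],
) -> Tuple[Optional[str], List[str]]:
    """Try each fallback term in order; return the first term with hits.

    `search` is any function mapping a search term to a list of IDs.
    Returns (None, []) when no term produces a hit.
    """
    for term in fallback_terms(host_name):
        ids = search(term)
        if ids:
            return term, list(ids)
    return None, []
```

With the `search_assembly` function from the earlier comment, this could be called as `search_with_fallback("Streptomyces scabiei RL-34", search_assembly)`. The returned term records the level at which the match was found, which matters because a genus-level hit is a much weaker stand-in for the host than a strain-level one.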