epam / NGB

New Genome Browser (NGB) - a Web-based NGS data viewer with unique Structural Variations (SVs) visualization capabilities, high performance, scalability, and cloud data support
MIT License

[Parasite Targets] Patent search #1191

Open mzueva opened 6 months ago

mzueva commented 6 months ago

Background

Approach

Other options

okolesn commented 6 months ago

Currently we get protein sequences by sequence ID from NCBI. For parasite targets we usually don't have NCBI data, so we get protein sequences from data registered locally:

  1. protein FASTA files associated with references registered in NGB,
  2. amino acid sequences reconstructed from the CDS nucleotides of reference genomes.

Therefore, the approach is:

  1. if the protein sequence already has a field with amino acids (baseString in the example below), we don't have to call the server API; we already have enough information

    Example
    "sequences": [
        {
          "protein": {
            "name": "MCP9258909.1",
            "length": 554,
            "baseString": "MANILKEIPRAVINIFQYTSIHVFTDGPPKHMQRQFLCETESDNVLDFCQIKRSPIKTMAIPKLELLAILIGVRAAQFVIKQLEFENAQVILWSDSRCALHWIQNHSRLLQRFIQNRVEEIRKAKFAYRYIPSECNPANIATKAISPSDLANLTLCDNEETIDEEREQVVVTAIQEATKTSIRFVDANRFSNWSRMVRTTGIQITPYEYEFAVELLLRQAQSEGLSVEEITKRNLYYVMGLWKFKGRLQFPSSGSCISYLTYLPRHNRITEIIIQTYHEKIHHGDIPHTISELRRLYWIPKERAEVKKKAKSFKLPPMPDYHDSRTVRSKIFARIGLDYLGPVTAKTEVGMAKKSFLTALRKFVARRDCPELILSDNASQFHLIYRTIKKQESQLSNFLTSKGIIWKYITQKAPWSGGIYERIVGITKGAFRKAVDEYLNSLRERTQIEHKSPRGAITRSPSLGLINEPHIPRGMWKLAKINKLNKSSDGNVRSVQIELPFGKLLNRQVNMLYPLEAEQEDQPEDSVTELMDAKDEEPIARRTQVQQRSYELQL"
          }
        }
      ]
  2. if there are no proteins for the specified gene, but there are mRNA sequences without IDs

Example
{
  "geneId": "vbb35395.1",
  "reference": {
    "id": "34",
    "name": "GCA_900537255"
  },
  "sequences": [
    {
      "mrna": {
        "name": "vbb35395.1",
        "begin": 8,
        "end": 689,
        "strand": "NEGATIVE",
        "featureFileId": 34,
        "chromosomeId": 6223
      }
    },
    {
      "mrna": {
        "name": "vbb35395.1",
        "begin": 861,
        "end": 964,
        "strand": "NEGATIVE",
        "featureFileId": 34,
        "chromosomeId": 6223
      }
    }
  ]
}

then we should use the POST /restapi/sequence/local endpoint with the request body

{
  "database": "PROTEIN",
  "referenceId": ?,
  "featureFileId": ?,
  "chromosomeId": ?,
  "begin": ?,
  "end": ?
}

where referenceId = reference.id, and featureFileId, chromosomeId, begin, and end are taken from the fields of the same name in the mrna entry
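
The two cases above could be sketched roughly as follows. This is only an illustration of the field mapping, not NGB client code: the function name `local_protein_requests` is hypothetical, and it assumes that one request body is built per mrna entry and that any protein entry carrying a `baseString` means no server call is needed for that gene.

```python
from typing import Any

# Endpoint named in the comment above; the helper below only builds request bodies.
LOCAL_SEQUENCE_ENDPOINT = "/restapi/sequence/local"


def local_protein_requests(gene: dict[str, Any]) -> list[dict[str, Any]]:
    """Build request bodies for POST /restapi/sequence/local (hypothetical helper).

    Returns an empty list when the gene already carries amino acids.
    """
    sequences = gene.get("sequences", [])

    # Case 1: a protein entry already has its amino acids, so no call is needed.
    # Assumption: a single such entry is enough to skip the call for the gene.
    for seq in sequences:
        protein = seq.get("protein")
        if protein and protein.get("baseString"):
            return []

    # Case 2: no proteins, only mRNA entries; map their fields to the request body.
    # reference.id is passed through unchanged (it is a string in the example).
    reference_id = gene["reference"]["id"]
    bodies = []
    for seq in sequences:
        mrna = seq.get("mrna")
        if mrna is None:
            continue
        bodies.append({
            "database": "PROTEIN",
            "referenceId": reference_id,
            "featureFileId": mrna["featureFileId"],
            "chromosomeId": mrna["chromosomeId"],
            "begin": mrna["begin"],
            "end": mrna["end"],
        })
    return bodies
```

For the vbb35395.1 example above, this would produce two request bodies, one per mrna entry, each with referenceId "34", featureFileId 34, and chromosomeId 6223. Whether the server expects one request per entry or a single combined range is not stated here, so that part is a guess.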