NCBI-Hackathons / seqr

Creative Commons Zero v1.0 Universal

index command #29

Closed lianyi closed 8 years ago

lianyi commented 8 years ago

Not sure if I was looking in the right place.

seqr/src/main/java/gov/nih/nlm/ncbi/seqr/solr/SeqrController.java

    public SolrDocumentList index(String proteinSequence) {
        return null;
    }

Is this index method overridden or implemented somewhere that is able to index the JSON and/or FASTA files? I did see samples in the unit tests where we are able to index JSON files, though.

averagehat commented 8 years ago

We didn't finish this at the hackathon. I am working on adding the functionality now; it's very simple. I probably should have had it raise an error, but I didn't think of that at the time.
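
A minimal sketch of the "raise an error" alternative mentioned above, not the project's actual code. The class name `UnfinishedIndexer` is hypothetical, and `Object` stands in for SolrJ's `SolrDocumentList` so that the snippet compiles without the Solr dependency:

```java
public class UnfinishedIndexer {
    // Hypothetical stand-in for SeqrController.index.
    // Object replaces SolrDocumentList so no SolrJ import is needed.
    public static Object index(String proteinSequence) {
        // Fail loudly instead of returning null from an unfinished method.
        throw new UnsupportedOperationException("index is not implemented yet");
    }
}
```

A caller then sees an explicit exception rather than a null that only fails later.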

averagehat commented 8 years ago

I am planning to just add a SolrInputDocument like so:

{ "sequence" : "ACGT...",
  "defline" : ">gi|foo|bar"...
}
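
As a rough illustration of that plan, here is a sketch that turns one FASTA record into the two fields shown above. The class `FastaFields` and method `toFields` are hypothetical names, and a plain `Map` stands in for the Solr document so the code runs without SolrJ. With SolrJ on the classpath, each pair could presumably be passed to `SolrInputDocument.addField(name, value)` instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FastaFields {
    // Turns one FASTA record (a defline followed by sequence lines)
    // into "defline" and "sequence" field/value pairs.
    public static Map<String, String> toFields(String record) {
        String[] lines = record.trim().split("\\R");
        if (!lines[0].startsWith(">")) {
            throw new IllegalArgumentException("FASTA record must start with '>'");
        }
        // Sequence data may be wrapped over several lines; join them.
        StringBuilder sequence = new StringBuilder();
        for (int i = 1; i < lines.length; i++) {
            sequence.append(lines[i].trim());
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("defline", lines[0]);
        fields.put("sequence", sequence.toString());
        return fields;
    }
}
```

For example, `FastaFields.toFields(">gi|foo|bar\nACGT\nTTGA")` yields a defline of `>gi|foo|bar` and a sequence of `ACGTTTGA`.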

lianyi commented 8 years ago

Got it, thanks @averagehat !

Also, here is a list of the fields that are predefined at the moment, where id is a required int field:

{
    "id": 20140829,
    "nrgi": "15676503",
    "acxn": "Q9K0J8",
    "pig": 129513,
    "taxid": "122586",
    "blastname": "b-proteobacteria",
    "sciname": "Neisseria meningitidis MC58",
    "lineage": "/root/cellular organisms/Bacteria/Proteobacteria/Betaproteobacteria/Neisseriales/Neisseriaceae/Neisseria/Neisseria meningitidis/Neisseria meningitidis serogroup B/Neisseria meningitidis MC58",
    "preftaxid": "64",
    "prefaxname": "Betaproteobacteria",
    "origdefline": "RecName: Full=Maf-like protein NMB0598 [Neisseria meningitidis MC58]",
    "defline": "RecName: Full=Maf-like protein NMB0598 ",
    "seqlen": "202",
    "sequence": "MNTLYLGSNSPRRMEILTQLGYRVIQLPAGIDESVKAGETPFAYVQRMAEEKNRTALTLFCETNGTMPDFPLITADTC"
}
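
Since id is required and must be an int, a document could be checked before it is sent for indexing. The following is only a sketch under that assumption: `RequiredId` and `requireId` are hypothetical names, and the document is modeled as a plain `Map` rather than any Solr type.

```java
import java.util.Map;

public class RequiredId {
    // Returns the id if it is present and an Integer; otherwise throws.
    public static int requireId(Map<String, Object> doc) {
        Object id = doc.get("id");
        if (!(id instanceof Integer)) {
            throw new IllegalArgumentException(
                "\"id\" is required and must be an int, got: " + id);
        }
        return (Integer) id;
    }
}
```

A missing id, or one supplied as a string such as "20140829", is rejected up front by this check.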

averagehat commented 8 years ago

Where does the "id" field come from?

lewisg-ncbi commented 8 years ago

Mike,

It's the standard NCBI integer identifier for a sequence, normally called a gi.

Best, Lewis


lianyi commented 8 years ago

Code merged.