cew821 / nasa_patent_parser

A tool to extract patent information from technology.nasa.gov

process (unstructured) patent descriptions into key concepts [NLP] #1

Open danhammer opened 9 years ago

danhammer commented 9 years ago

Hi Charles!

This is awesome. Maybe worth adding a small layer on top to process the entries into key concepts to find out if there are connections to be made between the NASA centers (Ames, JPL, Johnson, etc.). Jeff Chen is keen on this sort of interactive network graph. Action items:

  1. Process the CSV into a hosted elasticsearch instance.
  2. Apply some sort of NLP layer (like Alchemy API) to grab key words
  3. Serve it out again as a RESTful API so that Jeff (or others) can grab what they need for visualizations.
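As a rough illustration of step 1, the sketch below turns CSV text into the newline-delimited JSON payload that Elasticsearch's bulk API accepts (an action line followed by a source line per document). The helper name `csv_to_bulk`, the column names, and the second sample row are assumptions made for illustration; the real export may use different columns, and older Elasticsearch versions also expect a `_type` in the action line, which this sketch omits.

```python
import csv
import io
import json


def csv_to_bulk(csv_text, index_name, id_column):
    """Convert CSV text into an Elasticsearch bulk-API payload (NDJSON)."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Action line: names the target index and the document ID.
        action = {"index": {"_index": index_name, "_id": row[id_column]}}
        lines.append(json.dumps(action))
        # Source line: the document itself, one CSV row as a JSON object.
        lines.append(json.dumps(row))
    # The bulk API expects every line, including the last, to end in a newline.
    return "".join(line + "\n" for line in lines)


# Hypothetical two-row export; the column names are assumed, and the
# second row is a placeholder rather than a real patent.
sample = (
    "reference_number,center,title\n"
    "ARC-14661-3,ARC,Selective functionalization of carbon nanotubes\n"
    "EXAMPLE-0001,JPL,Placeholder title\n"
)

payload = csv_to_bulk(sample, "tech_transfer", "reference_number")
print(payload, end="")
```

The resulting text could then be POSTed to the cluster's `_bulk` endpoint with any HTTP client; that step is left out here because it depends on how the hosted instance is set up.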

What do you think?

cew821 commented 9 years ago

Hey Dan - Sounds like a good idea. I don't have any specific knowledge about this subject matter domain, but I'd be happy to help implement something if you had something specific in mind.

I kept hearing that it was difficult or impossible to create an export of the NASA patent database, given the current contract or lack of funds or some other reason, and eventually I became frustrated enough with that answer that I did it in an afternoon. Hopefully this helps demonstrate to the powers that be that if they are hearing this type of work is difficult or expensive, they are not being told the truth.

danhammer commented 9 years ago

It's a great project! I was just hoping to add a couple more features. I'll submit a PR when I get time. Hopefully it will help you make your case that, in fact, it just takes a little "get up and go."

danhammer commented 9 years ago

notably, as soon as I sit down to start this project, their internal API is down. forever loading...

cew821 commented 9 years ago

Weird. Does the site itself still work?

On Thursday, December 11, 2014, Dan <notifications@github.com> wrote:

> notably, as soon as I sit down to start this project, their internal API is down. forever loading...


danhammer commented 9 years ago

nope. down. down. down.

danhammer commented 9 years ago

It is forever loading ... [screenshot: 2014-12-11 12 39 32]

danhammer commented 9 years ago

A first attempt, which will be wrapped up RESTfully. I was able to add concept tagging and some cleanup. I've only dumped in a few of the entries because it is unclear how to call ElasticSearch from App Engine. Once I figure this out, I'll harvest all of the patents -- so that they are discoverable (searchable and concept-tagged). I'll add you to the repo, @cew821, if you are interested.

https://85453cfb9bd1d1cf07507dd2cdff0469-us-east-1.foundcluster.com:9243/tech_transfer/patent/_search?q=Temperature

{
  "category": "materials and coatings",
  "client_record_id": "patent_ARC-14661-3",
  "trl": "3 - Proof-of-concept",
  "concepts": {
    "0": "Metric space",
    "1": "Fundamental physics concepts",
    "2": "Temperature",
    "3": "Carbon",
    "4": "Gas",
    "5": "Carbon nanotube",
    "6": "Ionizing radiation",
    "7": "Topology"
  },
  "center": "ARC",
  "eRelations": [],
  "reference_number": "ARC-14661-3",
  "expiration_date": "2025-11-05 00:00:00",
  "abstract": "Method and system for functionalizing a collection of carbon nanotubes (CNTs). A selected precursor gas (e.g., H.sub.2 or NH.sub.3 or NF.sub.3 or F.sub.2 or CF.sub.4 or C.sub.nH.sub.m) is irradiated to provide a cold plasma of selected target particles, such as atomic H or F, in a first chamber. The target particles are directed toward an array of CNTs located in a second chamber while suppressing transport of ultraviolet radiation to the second chamber. A CNT array is functionalized with the target particles, at or below room temperature, to a point of saturation, in an exposure time interval no longer than about 30 sec. The predominant species that are deposited on the CNT array vary with the distance d measured along a path from the precursor gas to the CNT array; two or three different predominant species can be deposited on a CNT array for distances d=d1 and d=d2>d1 and d=d3>d2.",
  "title": "Selective functionalization of carbon nanotubes based upon distance traveled",
  "contact": {
    "email": "Trupti.D.Sanghani@nasa.gov",
    "address": "Mail Stop 202A-3, Moffett Field, CA 94035",
    "name": "Trupti D. Sanghani",
    "office": "Technology Partnerships Division",
    "facility": "NASA Ames Research Center"
  },
  "publication": [],
  "patent_number": "7767270",
  "serial_number": "11/387,503",
  "_id": "53f65b3d5904da2c9fc3008f",
  "id": "patent_ARC-14661-3",
  "innovator": [
    {
      "lname": "Khare",
      "mname": "N.",
      "company": "SETI Institute",
      "order": "1",
      "fname": "Bishun"
    },
    {
      "lname": "Meyyappan",
      "company": "NASA Ames Research Center",
      "order": "2",
      "fname": "Meyya"
    }
  ]
}
},
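Records shaped like the one above carry both a `center` and a set of `concepts`, which is enough to look for the connections between NASA centers mentioned at the top of the thread. The sketch below is one possible way to derive graph edges from shared concepts, not the approach actually used in the project; the function names are hypothetical, the first sample record is trimmed from the one above, and the JPL record is invented purely for illustration.

```python
from collections import defaultdict
from itertools import combinations


def concepts_by_center(records):
    """Group the concept labels of each record under its NASA center."""
    grouped = defaultdict(set)
    for record in records:
        concepts = record.get("concepts", {})
        # Concepts appear as an index-keyed mapping in the record above;
        # a plain list is accepted as well.
        labels = concepts.values() if isinstance(concepts, dict) else concepts
        grouped[record["center"]].update(labels)
    return dict(grouped)


def shared_concept_edges(records):
    """Return {(center_a, center_b): shared concepts} for pairs that overlap."""
    grouped = concepts_by_center(records)
    edges = {}
    for a, b in combinations(sorted(grouped), 2):
        shared = grouped[a] & grouped[b]
        if shared:
            edges[(a, b)] = sorted(shared)
    return edges


records = [
    # Trimmed from the record above.
    {"center": "ARC",
     "concepts": {"0": "Metric space", "2": "Temperature", "5": "Carbon nanotube"}},
    # Hypothetical record, for illustration only.
    {"center": "JPL", "concepts": ["Temperature", "Spectroscopy"]},
]

print(shared_concept_edges(records))
```

Each key in the result is a pair of centers and each value lists the concepts they have in common, which maps directly onto the edges and edge labels of a network graph.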