adsabs / ADSImportPipeline

Data ingest pipeline for ADS classic->ADS+
GNU General Public License v3.0

wrong data coming in from the pipeline #162

Closed romanchyla closed 6 years ago

romanchyla commented 7 years ago

A recent example:

 {"alternate_bibcode": ["2003adass..12..283B"], "doctype_facet_hier": ["0/Article", "1/Article/Proceedings Article"], "pubdate": "2003-00-00", "first_author": "Blecksmith,
 E.", "abstract": "In order to support regular operations, the Chandra Data Archive Operations Group has developed a database that records and monitors the user activities that affect the archive servers. This da
tabase provides information on the number of users that are connected at a given time, what archive interfaces they use (we have several), and how much and what type of data is being downloaded. The database cons
ists of three tables populated by a set of four scripts that parse the archive server logs, the ftp logs and the login logs. User activity can be tracked through each of those logs, making information from a give
n connection easily accessible. With this tool, the Archive Operations Group will be able to gather statistics and monitor trends, which will improve the accessibility of Chandra data.", "links_data": ["{\"access
\": \"open\", \"instances\": \"\", \"title\": \"\", \"type\": \"gif\", \"url\": \"http://articles.adsabs.harvard.edu/full/2003ASPC..295..283B\"}", "{\"access\": \"open\", \"instances\": \"\", \"title\": \"\", \"t
ype\": \"article\", \"url\": \"http://articles.adsabs.harvard.edu/full/2003ASPC..295..283B?defaultprint=YES\"}"], "date": "2003-01-01T00:00:00.000000Z", "year": "2003", "id": "1401492", "bibcode": "2003ASPC..295.
.283B", "bibgroup": ["CXC", "CfA"], "author_facet_hier": ["0/Blecksmith, E", "1/Blecksmith, E/Blecksmith, E.", "0/Paltani, S", "1/Paltani, S/Paltani, S.", "0/Rots, A", "1/Rots, A/Rots, A.", "0/Winkelman, S", "1/W
inkelman, S/Winkelman, S."], "author": ["Blecksmith, E.", "Paltani, S.", "Rots, A.", "Winkelman, S."], "aff": ["-", "-", "-", "-"], "orcid_pub": ["-", "-", "-", "-"], "email": ["-", "-", "-", "-"], "bibgroup_face
t": ["CXC", "CfA"], "bibstem_facet": "ASPC", "pub": "Astronomical Data Analysis Software and Systems XII", "pub_raw": "Astronomical Data Analysis Software and Systems XII ASP Conference Series, Vol. 295, 2003 H. 
E. Payne, R. I. Jedrzejewski, and R. N. Hook, eds., p.283", "volume": "295", "author_count": 4, "first_author_norm": "Blecksmith, E", "property": ["OPENACCESS", "ADS_OPENACCESS", "ARTICLE", "NOT REFEREED"], "data
base": ["astronomy"], "bibstem": ["ASPC", "ASPC..295"], "doctype": "inproceedings", "page": ["283"], "first_author_facet_hier": ["0/Blecksmith, E", "1/Blecksmith, E/Blecksmith, E."], "title": ["Chandra Data Archi
ve Download and Usage Database"], "identifier": ["2003adass..12..283B"], "author_facet": ["Blecksmith, E", "Paltani, S", "Rots, A", "Winkelman, S"], "author_norm": ["Blecksmith, E", "Paltani, S", "Rots, A", "Wink
elman, S"]} | {"unverified": ["-", "-", "0000-0003-2377-2356", "-"], "bibcode": "2003ASPC..295..283B", "authors": ["Blecksmith, E.", "Paltani, S.", "Rots, A.", "Winkelman, S."]} | {"read_count": 3, "bibcode": "20
03ASPC..295..283B", "downloads": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 3, 0, 0, 2, 1, 0, 0, 1, 3], "citation_count": 1, "norm_cites": 1505, "reads": [0, 0, 0, 0, 0, 0, 0, 9, 3, 4, 4, 5, 4, 5, 1, 0, 3, 1, 1, 0, 
4, 7], "citations": ["2006ASPC..351...93W"], "authors": ["Blecksmith, E", "Paltani, S", "Rots, A", "Winkelman, S"], "boost": 0.15000000596046448, "id": 6556962} |          | 2017-09-09 04:54:34.546038 | 2017-09-1
1 02:45:28.884952 | 2017-09-07 21:48:31.626193 |                  | 2017-09-07 21:48:31.626581 | 2017-09-11 02:45:28.884952 |           |         | 
(1 row)

The reads, downloads, norm_cites, and citation_count fields should not be here.
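
A minimal sketch of how such a record could be checked for the fields named above. The helper names (`find_unexpected_fields`, `strip_unexpected_fields`) and the `UNEXPECTED_FIELDS` set are hypothetical and not part of ADSImportPipeline; the sample dictionary is copied from the third JSON column of the row above.

```python
# Hypothetical check, not taken from the ADSImportPipeline code base.
# The field names come from the comment above; everything else is an assumption.
UNEXPECTED_FIELDS = {"reads", "downloads", "norm_cites", "citation_count"}


def find_unexpected_fields(record, unexpected=UNEXPECTED_FIELDS):
    """Return the sorted names of keys in `record` that should not be present."""
    return sorted(set(record) & set(unexpected))


def strip_unexpected_fields(record, unexpected=UNEXPECTED_FIELDS):
    """Return a copy of `record` without the unexpected keys."""
    return {key: value for key, value in record.items() if key not in unexpected}


# Third JSON column of the example row, copied from the dump above.
nonbib = {
    "read_count": 3,
    "bibcode": "2003ASPC..295..283B",
    "downloads": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 3, 0, 0, 2, 1, 0, 0, 1, 3],
    "citation_count": 1,
    "norm_cites": 1505,
    "reads": [0, 0, 0, 0, 0, 0, 0, 9, 3, 4, 4, 5, 4, 5, 1, 0, 3, 1, 1, 0, 4, 7],
    "citations": ["2006ASPC..351...93W"],
    "authors": ["Blecksmith, E", "Paltani, S", "Rots, A", "Winkelman, S"],
    "boost": 0.15000000596046448,
    "id": 6556962,
}

if __name__ == "__main__":
    print(find_unexpected_fields(nonbib))
    print(sorted(strip_unexpected_fields(nonbib)))
```

Whether the right fix is to drop these keys or to stop them from being written upstream is not stated in the issue; the sketch only shows how to detect them.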