adsabs / ADSImportPipeline

Data ingest pipeline for ADS classic->ADS+
GNU General Public License v3.0

Have direct ingest send url link #224

Closed spacemansteve closed 5 years ago

spacemansteve commented 5 years ago

Direct ingest must send the URL of the arXiv paper to master. Without this, we risk not having a link on Bumblebee when the myADS email is sent.

spacemansteve commented 5 years ago

Direct ingest must send the URL of the arXiv paper to master. It must be sent so master can merge it with links data from the nonbib pipeline.

Direct ingest could send a DataLinksRow or DataLinksRowList protobuf. Since nonbib does not send this record (it only sends a NonBibRecord), master will know where this message originated. Master could store this field in a new column in the Records table. These links would be merged into the Solr record.
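A rough sketch of that idea, assuming master can tell the two sources apart purely by message type. The dataclasses below are stand-ins for the real protobufs, and their fields (`bibcode`, `url`, `links`) as well as the helpers `message_origin` and `merge_links` are hypothetical names used only for illustration:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class DataLinksRow:
    """Stand-in for the DataLinksRow protobuf; fields are assumed."""
    bibcode: str
    url: str


@dataclass
class NonBibRecord:
    """Stand-in for the NonBibRecord protobuf; fields are assumed."""
    bibcode: str
    links: List[str] = field(default_factory=list)


Message = Union[DataLinksRow, NonBibRecord]


def message_origin(msg: Message) -> str:
    """Infer the sending pipeline from the message type alone."""
    if isinstance(msg, DataLinksRow):
        return "direct_ingest"
    if isinstance(msg, NonBibRecord):
        return "nonbib"
    raise TypeError(f"unexpected message type: {type(msg).__name__}")


def merge_links(messages: List[Message]) -> Dict[str, List[str]]:
    """Collect links per bibcode, keeping arrival order and dropping duplicates."""
    merged: Dict[str, List[str]] = {}
    for msg in messages:
        if message_origin(msg) == "direct_ingest":
            urls = [msg.url]
        else:
            urls = list(msg.links)
        bucket = merged.setdefault(msg.bibcode, [])
        for url in urls:
            if url not in bucket:
                bucket.append(url)
    return merged
```

This only works while each message type has exactly one sender. If a second pipeline started sending the same type, an explicit origin would be needed instead.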

Other options include:

romanchyla commented 5 years ago

if i understand correctly:

direct ingest pipeline is producing information that normally is generated by two distinct pipelines

you say 'master must merge this info with other links coming from nonbib pipeline'

is there another pipeline involved in generating data for this field? If there were, we would clearly need a new 'origin' and wouldn't be able to rely on the type of the protobuf, though this doesn't seem to be the case yet

this would be our preferred way:

new column in the table is not what we want; this datapoint seems not to justify that expense (timestamp etc) - can the decision be made simply using some heuristic rules of nature as