LibraryCarpentry / week-four-library-carpentry--DEPRECATED

Week Four lesson
http://librarycarpentry.github.io/city-november-2015/
8 stars 6 forks source link

Lookup between two data extracts #6

Open EstherArens opened 8 years ago

EstherArens commented 8 years ago

I need to insert a title identifier into an extract of order lines from a library system. An extract of order line data gives me an order ID as unique identifier, and for each order ID an additional extract can give me the title key. How to best?

ostephens commented 8 years ago

From what we talked about in the session I understand the data you start with looks something like:

Order ID: 123456
Order number: PO03453
Order date: 2015-11-01

and

Order ID: 123456
Title ID: 98765

Is that about right?

EstherArens commented 8 years ago

Thanks for picking this up again, Owen. Really appreciated ☺

My first/main file (orders2migrate.txt) has records of the following structure * DOCUMENT BOUNDARY * .ORDR_ID. |aPO-95 .ORDR_LIBR. |aDWL .FISCAL_CYCLE. |a9899 .VEND_ID. |aHOU .ORDR_TYPE. |aFIRM .ORDR_DATE_READY. |a19980818 .ORDR_DATE_MAILED. |a19980826 .ORDR_DATE_CLAIM. |a19981211 .ORDR_DATE_CANCEL. |aNEVER .ORDR_XINFO_BEGIN. .FORMAT. |ahbk .FUNDCODE. |aP .ORDR_XINFO_END. .LINE_ITEM_BEGIN. .LINE_ITEM_ID. |a7504324004 .LINE_UNIT_PRICE. |a£56.95 .LINE_VEND_CURR. |aSTERLING .LINE_EXCHANGE. |a1.0 .LINE_COPIES. |a1 .LINE_DISCOUNT. |aTABLE .LINE_EXTEND_PRICE. |a£56.95 .LINE_MULTI_PARTS. |aN .LINE_RENEWAL_DATE. |aNEVER .LINE_REQ_USER_ID. |a0750854256 .LINE_DATE_MAILED. |a19980826 .LINE_XINFO_BEGIN. .ITEMTYPE. |aNORMAL .NOTE. |aOriginal Libertas order P-440886 .LINE_XINFO_END. .FUND_SEGMENT_BEGIN. .LINE_FUND_ID. |aMICROB .LINE_FUND_COPIES. |a1 .FUND_DATE_PAID. |a19981016 .FUND_AMT_PAID. |a£56.95 .FUND_SEGMENT_END. .DIST_SEGMENT_BEGIN. .HOLDING_CODE. |aM-NORMAL .LINE_DIST_COPIES. |a1 .DIST_DATE_RECEIVED. |a19981013 .DIST_DATE_DELIVERED. |a19981013 .DIST_PACKING_LIST. |aPACKING-676 .DIST_SEGMENT_END. .LINE_ITEM_END.

The other file (keys2insert.txt) has one line per ‘record’ i.e. 3540638172 |1|DWL|9899|PO-95|

The common identifier in both is the order ID (here “PO-95”). Via this I need to take the values of the first two columns of keys2insert.txt and insert them into the matching record in orders2migrate.txt. And these two values should be preceded respectively by .CATALOG#. |a .ORDERLINE_KEY. |a

Caveat: Not all orders (records in the first file) still have title keys & order lines (row in the second file) – these are mostly historic orders that could never be supplied and for which the title has since been removed.

Although I’ve used OpenRefine today (already useful to just look what is in a data set ☺), I’ve not tried it for this – any hints gratefully received. Thanks in advance, Esther

From: Owen Stephens [mailto:notifications@github.com] Sent: 30 November 2015 22:52 To: LibraryCarpentry/week-four-library-carpentry Cc: Arens, Esther A. Subject: Re: [week-four-library-carpentry] Lookup between two data extracts (#6)

From what we talked about in the session I understand the data you start with looks something like:

Order ID: 123456

Order number: PO03453

Order date: 2015-11-01

and

Order ID: 123456

Title ID: 98765

Is that about right?

— Reply to this email directly or view it on GitHubhttps://github.com/LibraryCarpentry/week-four-library-carpentry/issues/6#issuecomment-160787269.

ostephens commented 8 years ago

@EstherArens probably a few approaches, but the first one that springs to mind:

At this point, you should have a line based file with the two new data pieces added. Remove the additional column you added, then export as some kind of delimited file (no delimiter will be used as only one col)

Does that make sense?