digiah / mapaca

Music and the Performing Arts in Colonial American Newspapers
http://dahi.manoa.hawaii.edu/mapaca/index.php

enter dateline #15

Open rcrath opened 9 years ago

rcrath commented 9 years ago

Within stories, the first sentence for out-of-town news is generally a dateline from another paper. That is often followed by the day when the actual event took place, like the following:

 July 18, 1766: Last Monday at Boston...

Would it be possible to put together a Perl script that pulls out the dateline (i.e. the first date)? The second date would be harder, but it may still be possible to get it sometimes via creative regex.
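A minimal sketch of the first half of this (pulling out the first date), written in Python rather than Perl purely for illustration. The names `DATELINE_RE` and `first_dateline` are made up for this example, and the pattern only covers the "Month day[, year]" shapes that appear in the samples in this thread. When the dateline has no year, the sketch falls back to a caller-supplied year (for instance the record's YEAR field), which is an assumption.

```python
import re
from datetime import date

# Map the first three letters of a month name to its number.
MONTHS = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
          "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}

# Hypothetical pattern: month name or abbreviation, day, optional ", year".
DATELINE_RE = re.compile(
    r"\b(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|"
    r"Aug(?:ust)?|Sept?(?:ember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)"
    r"\.?\s+(\d{1,2})(?:,\s*(\d{4}))?\b"
)

def first_dateline(text, default_year):
    """Return the first date found in text, or None if there is none."""
    m = DATELINE_RE.search(text)
    if m is None:
        return None
    month = MONTHS[m.group(1)[:3].lower()]
    # Assumption: a dateline without a year belongs to default_year.
    year = int(m.group(3)) if m.group(3) else default_year
    try:
        return date(year, month, int(m.group(2)))
    except ValueError:
        # Impossible calendar date such as "Feb. 30".
        return None

print(first_dateline("July 18, 1766: Last Monday at Boston...", 1766))
```

The pattern is case-sensitive on purpose, so that the verb "may" followed by a number is not read as a month.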

damg70 commented 9 years ago

doable. have to do some regex research, but a quick survey of stack overflow yields promising results. others have taken a stab at this.

rcrath commented 9 years ago

@damg70, can we do nesting in SQL fields? One thing RDF/NoSQL has is the tuple, n1(n2, n3), or the key-value pair (NoSQL), where a field is a key plus a value and a value can itself be a field, so F(k, v(k, v(...))). Ultimately this would fit what the search needs better. As an example, extract the dates from the following two records:

South Carolina Gazette (Crouch) YEAR: 1766 MONTH: 4 DAY: 15 CITY: Charleston COLONY: SC LOCATION: Philadelphia REGION: NY NJ PA DE

Philadelphia, . . . March 27. On Monday last we received the following most agreeable intelligence, viz. That a vessel was arrived from Cork at Oxford, in Maryland, in forty days passage; the Captain of which brought a Cork news paper, in which was a paragraph taken from one printed in Dublin, containing a letter from a member of Parliament in London, to his friend in Ireland, dated about the last of January, the substance of which was; "that every thing relating to the affairs of America was settled; that the Stamp-Act was repealed; and that requisitions were to be made to the respective colonies, for the support of American establishments." These glad tidings spread a general joy all over the city; our bells were set a ringing; at night bonfires were lighted . . .

4/15/66 refers to 3/27/66, which refers to "forty days" prior, which refers to "no date" (for the Dublin paper), which refers to circa 1/31/66. Bonus: figure out that "at night" refers to Philadelphia, March 27 (I think).

This one is nearly impossible:

South Carolina Gazette (Crouch) YEAR: 1766 MONTH: 10 DAY: 14 CITY: Charleston COLONY: SC LOCATION: Philadelphia REGION: NY NJ PA DE

From the Pennsylvania Journal, Sept. 4, 1766. . . A letter from John Hughes, Esq; to the commissioners of the Stamp-Office in London. Philadelphia, October 12, 1765. . . . [4 lines, I can now give you a] sketch of my own conduct and that of the Presbyterians and proprietary party here relative to the Stamp-office. . . [Notified of his recommendation as chief distributor of stamps; received information that] a mob would be collected by beating muffled drums through the streets, and ringing the state house and church bells muffled, which was accordingly done all the afternoon. . . [2-column narration of encounter, mob demanding his resignation.] Since writing the above, I am informed that Benjamin Shoemaker, Esq; who is one of the people called Quakers, and also an Alderman of this city, met with the drummers as they were alarming the city, and took them to talk, requiring to know by what authority they were endeavouring to raise a mob? They answered, if he would go to the State-house, he might know. He then asked who ordered them to beat about the streets? . . . [9 more lines of discussion. Shoemaker eventually backs off for fear of his own property.] . . . [3 more columns of reports of stamp-act business, challenges to Hughes and his answer:] Gentlemen, I received your of the 4th instant, and cannot but infer from the contents that you are strangers in Pennsylvania, since by the tenor of your letter you seem not to be acquainted with the things that are come to pass in these our days. -- I therefore think it necessary, before I proceed in answer to it, to give you a brief detail of what has happened. 
First then, I am to inform you, that on Saturday the 5th of October last, the State House and Christ-Church bells were rung muffled, and two Negro drummers, one of whom belonged to Alderman Samuel Mifflin, beat through all parts of the city with muffled drums, thereby alarming the inhabitants, . . . [1/2 column, signed] John Hughes.

So this is something like: date 10/14/1766 refers to 9/4/66, which refers to 10/12/65, then "all the afternoon" of 10/12/65 AND the rest of the dates, which would be a total bear to extract any way but manually (crowdsource?).
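One way the "refers to" nesting could be held in plain SQL, sketched with Python's built-in `sqlite3` so it can be run as is: each date gets its own row with a pointer to the date that cites it, and a recursive query walks the chain. The table name `date_ref`, its columns, and the helper `chain` are all hypothetical, not part of the mapaca schema. Only the three dates stated outright in the second record are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: one row per date mention, pointing at the mention
# that cites it (parent_id is NULL for the issue's own publication date).
conn.execute("""
    CREATE TABLE date_ref (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES date_ref(id),
        ref_date  TEXT,   -- ISO date, or NULL when no date can be recovered
        note      TEXT
    )
""")
conn.executemany(
    "INSERT INTO date_ref VALUES (?, ?, ?, ?)",
    [
        (1, None, "1766-10-14", "South Carolina Gazette (Crouch) issue"),
        (2, 1, "1766-09-04", "Pennsylvania Journal dateline"),
        (3, 2, "1765-10-12", "Hughes letter, Philadelphia"),
    ],
)

def chain(conn, root_id):
    """Follow the 'refers to' links from root_id downward, outermost first."""
    return conn.execute("""
        WITH RECURSIVE walk(id, ref_date, note, depth) AS (
            SELECT id, ref_date, note, 0 FROM date_ref WHERE id = ?
            UNION ALL
            SELECT r.id, r.ref_date, r.note, w.depth + 1
            FROM date_ref r JOIN walk w ON r.parent_id = w.id
        )
        SELECT ref_date, note FROM walk ORDER BY depth
    """, (root_id,)).fetchall()

for ref_date, note in chain(conn, 1):
    print(ref_date, note)
```

The point of the design is that the nesting lives in the rows rather than inside a single field, so the ordinary YEAR/MONTH/DAY style of querying still works on every level of the chain.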

damg70 commented 9 years ago

you can do the following:

SELECT * FROM t1 WHERE column1 = (SELECT column1 FROM t2);

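and probably turn the second SELECT into another nested query. (with =, the inner SELECT is expected to return a single value; use IN if it can return more than one row.)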

i don’t think i understand what you’re asking, but i’ll give it a shot.

A) we have three fields (YEAR, MONTH, DAY) with date information that can be acted on by normal SQL operations.

B) we have date-related strings in the CITATION field that can be regexed out.

what magic do you want to perform once you’ve got A in your left hand and B in your right?

On Aug 23, 2015, at 2:21 PM, Rich Rath wrote: [quoted copy of the previous comment omitted]

damg70 commented 9 years ago

It's not a recursion problem, it's a linear filter.

1) Regex that detects dates and date-ish references ("last friday," "previous monday," "last year," "40 days ago")...
2) Convert this set of date-ish references to a set of calendar dates (need a PHP calendar library)
3) Order this set of calendar dates from smallest to largest
4) Calculate the times between all dates
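The four steps can be sketched roughly as follows. This uses Python's `datetime` as a stand-in for the PHP calendar library, and every name in it (`WEEKDAY_RE`, `DAYS_AGO_RE`, `previous_weekday`, `resolve`, `gaps`) is invented for the example. It handles only two kinds of date-ish reference, "last Monday" / "Monday last" and "N days ago", and it assumes each one counts back from a single anchor date such as the dateline, which real stories will not always respect.

```python
import re
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]
_DAY = "|".join(WEEKDAYS)

# Step 1: patterns for two kinds of date-ish reference.
WEEKDAY_RE = re.compile(r"\b(?:last\s+(%s)|(%s)\s+last)\b" % (_DAY, _DAY),
                        re.IGNORECASE)
DAYS_AGO_RE = re.compile(r"\b(\d+)\s+days\s+ago\b", re.IGNORECASE)

def previous_weekday(anchor, name):
    """Most recent day called `name` strictly before `anchor`."""
    back = (anchor.weekday() - WEEKDAYS.index(name.lower()) - 1) % 7 + 1
    return anchor - timedelta(days=back)

def resolve(text, anchor):
    """Steps 1-3: detect references, convert to dates, sort ascending."""
    found = {anchor}
    for m in WEEKDAY_RE.finditer(text):
        found.add(previous_weekday(anchor, m.group(1) or m.group(2)))
    for m in DAYS_AGO_RE.finditer(text):
        # Assumption: "N days ago" counts back from the anchor date.
        found.add(anchor - timedelta(days=int(m.group(1))))
    return sorted(found)

def gaps(dates):
    """Step 4: days between each pair of neighbouring dates."""
    return [(later - earlier).days for earlier, later in zip(dates, dates[1:])]

dates = resolve("Last Monday at Boston, and 40 days ago", date(1766, 7, 18))
print([d.isoformat() for d in dates], gaps(dates))
```

Step 4 here only measures neighbouring dates after sorting; "times between all dates" could also mean every pair, which would be a small change using `itertools.combinations`.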

rcrath commented 9 years ago

@damg70 Yep, this is the ticket. The secondary problem is to attach the dates to locations where possible. Thanks for walking through the linear filter thing. But the principle of easy things first suggests we start by just getting the dateline and attaching it to a location.
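As a rough sketch of that easy first step, here is one way to grab a "Place, date" pair in a single pass, again in Python for illustration only. `PLACE_DATE_RE` and `dateline_with_place` are hypothetical names. The pattern assumes the place (or source paper) is a run of capitalized words followed by a comma, allows an ellipsis between the comma and the date, and falls back to a caller-supplied year when the dateline has none.

```python
import re
from datetime import date

MONTHS = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
          "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}

# Hypothetical pattern: capitalized words, a comma, optional ". . ." filler,
# then month, day and an optional ", year".
PLACE_DATE_RE = re.compile(
    r"\b([A-Z][a-z]+(?:[ -][A-Z][a-z]+)*),[\s.]*"
    r"(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|"
    r"Aug(?:ust)?|Sept?(?:ember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)"
    r"\.?\s+(\d{1,2})(?:,\s*(\d{4}))?\b"
)

def dateline_with_place(text, default_year):
    """Return (place, date) for the first 'Place, date' found, else None."""
    m = PLACE_DATE_RE.search(text)
    if m is None:
        return None
    place, month_name, day, year = m.groups()
    # Assumption: a dateline without a year belongs to default_year.
    return place, date(int(year) if year else default_year,
                       MONTHS[month_name[:3].lower()], int(day))

print(dateline_with_place("Philadelphia, . . . March 27. On Monday last", 1766))
```

Note that on "From the Pennsylvania Journal, Sept. 4, 1766" this returns the paper's name rather than a city, so the first element is better thought of as "dateline source" until it is checked against a list of known places.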

rcrath commented 9 years ago

Also, it would be good if we could get calendar dates for the pubdate (though we would need to keep the separate fields for sorting by month). If we have calendar dates, it should be pretty easy to put in a day-of-the-week lookup for them too, right?
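A small sketch of what that could look like, assuming the YEAR, MONTH and DAY fields hold plain integers (or digit strings). The function names `pubdate` and `day_of_week` are made up for the example. Python's `date` uses the Gregorian calendar throughout, so this is only a fair stand-in for issues dated after the British colonies switched calendars in September 1752; earlier Old Style dates would need converting first.

```python
from datetime import date

# Fixed English names, indexed by date.weekday() (Monday is 0).
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def pubdate(year, month, day):
    """Build a calendar date from the separate YEAR, MONTH, DAY fields."""
    return date(int(year), int(month), int(day))

def day_of_week(d):
    """Look up the weekday name for a calendar date."""
    return DAY_NAMES[d.weekday()]

issue = pubdate(1766, 4, 15)
print(issue.isoformat(), day_of_week(issue))

# Cross-check against the Hughes letter, which calls 5 October 1765 a Saturday.
print(day_of_week(pubdate(1765, 10, 5)))  # Saturday
```

Keeping the three separate fields alongside the calendar date costs little, and sorting by month keeps working exactly as it does now.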