ericleasemorgan / reader

Distant Reader, a tool for using & understanding a corpus
GNU General Public License v2.0

json2txt.sh is appending an 'n' to each line in output #109

Closed rdoughty closed 4 years ago

rdoughty commented 4 years ago

Part of this script parses a JSON file. The JSON includes body_text, bibentries, and back_matter fields. After retrieving the value of each field, we attempt to append a newline. I think that step is what appends the offending 'n' (see the end of each 'paragraph' in the sample output below).

https://github.com/ericleasemorgan/cord-19/blob/master/bin/json2txt.sh#L60
https://github.com/ericleasemorgan/cord-19/blob/master/bin/json2txt.sh#L64
https://github.com/ericleasemorgan/cord-19/blob/master/bin/json2txt.sh#L68

From my testing this is unnecessary.

BODY=$( echo $BODY | sed s"/$/\n/g" )

But perhaps @ericleasemorgan can provide an example of when this is necessary. I'd either remove it or maybe try...

BODY=$( echo -e $BODY | sed -e '$a\' )
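One likely reason the sed step looks unnecessary: command substitution, $( ... ), strips trailing newlines, so a newline appended inside it never reaches the variable. Below is a minimal sketch, assuming the goal is only to separate the fields with blank lines in the final text. The sample values and the STRIPPED and TEXT variables are hypothetical and not taken from json2txt.sh.

```shell
#!/bin/sh
# Hypothetical sample values standing in for fields extracted from the JSON.
BODY='First paragraph.'
BACK='Second paragraph.'

# Trailing newlines produced inside $( ... ) are removed by the shell,
# so STRIPPED ends up identical to BODY.
STRIPPED=$( printf '%s\n\n' "$BODY" )
if [ "$STRIPPED" = "$BODY" ]; then
    printf '%s\n' 'trailing newlines were stripped'
fi

# Join the fields with printf instead of sed. Newlines between the fields
# survive; quoting "$BODY" also keeps its internal whitespace intact.
TEXT=$( printf '%s\n\n%s\n' "$BODY" "$BACK" )

# Add the final newline when writing the output, not when assigning.
printf '%s\n' "$TEXT"
```

Because printf is specified by POSIX, this approach should not depend on which sed is installed.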

Current sample output, using /export/cord/json/de0163ba343bc5ea42399ae8e0b2c76c229a05b6.json:

Varicella (chicken pox) caused by varicella zoster virus (VZV) is an extremely common illness in childhood and usually results in complete recovery and lifelong immunity. Cardiac complications are exceedingly rare [1] , but myocarditis due to VZV is a severe, potentially life-threatening disorder. In immunocompromised children, varicella can be a severe disease and can result in serious complications and death [2] . We report a case of fatal varicella myocarditis in a child with Down syndrome.n A 12-year-old male child with Down syndrome was referred to the Pediatric Emergency Services of Nizwa Hospital, Oman, with history of fever and cough of three-day duration and severe chest discomfort, orthopnea and progressive dyspnea of one-day duration. He had been diagnosed at birth as Down syndrome with severe congenital heart disease (atrioventricular septal defect-AVSD), leading to congestive heart failure by the end of the first week of life. He had improved after treatment for heart failure, and had undergone successful corrective cardiac surgery for AVSD at four months of age at a tertiary cardiology service. Subsequently, with regular follow-up visits, his medications for heart failure had been gradually tapered and withdrawn. By one year of age, his cardiomegaly had disappeared and he had normal hemodynamic status and effort tolerance. During the past 10 years, he had been regularly monitored, whereby his growth parameters were in the normal centiles as per Down syndrome-specific growth charts and his thyroid status was normal. He had intermittent asthma. There was no history of systemic vasculitis or family history of ischemic heart disease or premature death. His immunization status V C The Author [2016] . Published by Oxford University Press. All rights reserved. 
For Permissions, please email: journals.permissions@oup.com was up-to-date, except that he had not received varicella vaccine, as it was not part of the National Immunization Programme (NIP) in Oman until 2011.n On examination, he was afebrile, extremely restless and diaphoretic. He had cool extremities; his heart rate was 120 beats/min, respiratory rate 40 breaths/min and blood pressure 86/44 mm Hg; and the capillary refill time was more than three seconds. On auscultation, S 1 and S 2 were muffled and S 3 gallop was heard at the apex. A Grade 2/6 systolic murmur was audible along the left sternal border with no pericardial rub. There was no wheeze, but a few bilateral basal rales were heard. The liver edge was palpable 5 cm below the right costal margin. After resuscitation in the paediatric emergency room, he was admitted to the paediatric intensive care unit under the differential diagnosis of acute coronary syndrome and acute myocarditis.n

ericleasemorgan commented 4 years ago

Ryan, I am unable to reproduce the "error"; I am unable to get output containing the offending "n". Maybe we are dealing with an operating system specific issue?

rdoughty commented 4 years ago

Could be. That would make sense. I am running this locally on macOS. Are you running this on a *nix system?

ericleasemorgan commented 4 years ago

Yes, I ran my example on our remote cluster, which runs CentOS.

ericleasemorgan commented 4 years ago

I believe we are experiencing an operating system issue, and thus, this is a non-issue.

More specifically, I downloaded json2txt.sh, cord.db, and a sample JSON file (de0163ba343bc5ea42399ae8e0b2c76c229a05b6.json). I then ran:

./json2txt.sh de0163ba343bc5ea42399ae8e0b2c76c229a05b6.json

I got the offending output, and I'm running this on my Macintosh too.

I think sed on my (our) Macintosh is not the same as sed on CentOS. We don't need to make this work on Macintosh (yet).
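The difference can be checked directly. GNU sed (the default on CentOS) treats \n in the replacement text as a newline, while the sed on macOS, as reported in this thread, writes a literal n. The sketch below probes the local sed and then shows a replacement form that POSIX defines: a backslash followed by an actual line break. The PROBE and PORTABLE variables are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Probe how the local sed handles \n in the replacement text.
# With a newline, command substitution strips it again and leaves "x";
# with a literal n, the result is "xn".
PROBE=$( printf 'x\n' | sed 's/$/\n/' )
if [ "$PROBE" = "xn" ]; then
    printf '%s\n' 'this sed writes a literal n'
else
    printf '%s\n' 'this sed treats \n as a newline'
fi

# Portable form: escape a real line break inside the replacement.
PORTABLE=$( printf 'a\nb\n' | sed 's/$/\
/' )

# Each input line is now followed by an empty line.
printf '%s\n' "$PORTABLE"
```

If macOS support is needed later, the portable form or a printf-based approach avoids relying on GNU-specific escape handling.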

If you are comfortable with it, I think we can chalk this up to an operating system specific thing, and we can close this issue.

What do you think?

rdoughty commented 4 years ago

Sounds good. Closing!