Closed CathyHajo closed 9 years ago
The journals dropdown has these items:
>
Birth Control: Hearings Before a Subcommittee of the >
Report of the > , > , London, July 11 to 14, 1922.
There are a few things in the parse logs that look unusual, and I think they might be causing this. In 236494, the log says "Processing title: The Journal of the >". I'm guessing this is from this line in the XML:
<title type="journal">The Journal of the <org>American Medical Association</org></title>
So I think what may be the issue here is that the parser for mentioned entities can't handle nested mentioned things. I'll look into this.
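To illustrate the nested-tag problem, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is not the project's actual parser, which isn't shown in this thread; the helper name title_text is hypothetical. It shows how reading only the element's direct text truncates the title at the nested <org> tag, while collecting text from all descendants recovers the full title.

```python
import xml.etree.ElementTree as ET

# The title line quoted above, with an <org> element nested inside <title>.
SNIPPET = (
    '<title type="journal">The Journal of the '
    '<org>American Medical Association</org></title>'
)


def title_text(element):
    """Return all text inside an element, including text in nested child tags.

    Hypothetical helper for illustration; whitespace runs are collapsed.
    """
    return " ".join("".join(element.itertext()).split())


title = ET.fromstring(SNIPPET)

# .text holds only the text before the first child element,
# so the nested organization name is lost.
shallow = title.text

# itertext() walks the element and all of its descendants in document order.
full = title_text(title)
```

Here shallow stops right before the nested element, which matches the kind of truncated title seen in the log, whereas full contains the complete journal name.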
I have been cleaning up the titles where I can, removing the nested portions from titles. But the drop down list doesn't change after I update them.
I think I fixed this, actually, so it should be able to handle nested tags in titles now. Just make sure you pull in my recent commits to your local copy. Let me know if everything looks OK.
Hi Jon,
I can't tell if it fixed it-- the drop downs look the same.
Cathy
Cathy Moran Hajo, Ph.D. Associate Editor/Assistant Director The Margaret Sanger Papers Project New York University, Division of Libraries 838 Broadway, Suite 504 New York, NY 10003-4218 (212) 998-8666 cathy.hajo@nyu.edu
Visit our website at: http://www.nyu.edu/projects/sanger
That's no good. Could you paste a screenshot of the problem on this issue's GitHub page (https://github.com/JonathanReeve/sanger/issues/84), along with the URL you're looking at?
Here's a screenshot-- I fixed the titles with the curly brackets and the one with the < as a title a long time ago.
Oh, here's one where the title drop down shows.
Hm, I noticed that in some cases there are two copies of XML documents, one in xml_added and one in xml_queue. (See, for instance, https://github.com/JonathanReeve/sanger/blob/master/xml_added/008951.xml and https://github.com/JonathanReeve/sanger/blob/master/xml_queue/008951.xml.) Maybe the parse script is parsing the one from xml_queue, but your corrections were to a file in xml_added?
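One way to check the duplicate-file theory is to list the XML files that exist in both directories but whose contents differ. The following is a rough Python sketch, assuming the two directories sit side by side in a local checkout; the function name differing_duplicates is hypothetical and not part of the repository.

```python
import filecmp
from pathlib import Path


def differing_duplicates(added_dir, queue_dir):
    """List XML filenames present in both directories whose contents differ.

    Hypothetical helper; the directory paths are supplied by the caller.
    """
    added = Path(added_dir)
    queue = Path(queue_dir)
    # Filenames that appear in both directories.
    shared = {p.name for p in added.glob("*.xml")} & {
        p.name for p in queue.glob("*.xml")
    }
    # shallow=False compares file contents byte by byte, not just metadata.
    return [
        name
        for name in sorted(shared)
        if not filecmp.cmp(added / name, queue / name, shallow=False)
    ]
```

Calling differing_duplicates("xml_added", "xml_queue") from the repository root would return the names of any files that were corrected in one directory but not the other. An empty list would suggest the two copies are in sync and the cause lies elsewhere.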
Hi Jon,
I have been adding them to both directories each time. Is that not right?
Cathy
The search by journal drop down box is malfunctioning. It still works on the old site.