Open artydont opened 1 year ago
The Gollobin bibliography is a good initial choice, as it will certainly be useful for many things and will later need to be linked in both directions to citations in Gollobin Notes and to names and topics in the Index.
While waiting for a word-processor OCR file that includes the full bibliography, try doing the next small batch from Maksakovsky. The first small batch above will be renamed "mak.bib.01.txt", and the next batch should be in the same format, with suffixes bib.02.txt and fmt.02.txt for the two output files. With this naming scheme the small numbered batches can easily be combined later. Two digits are used so that a large bibliography can have more than 9 batches.
The starting point for all later processing is extracting the references into a standard .bib file plus the additional detailed format file ("fmt") offered by https://anystyle.io/ (mentioned above), which has full explanations for learning by doing. E.g. start with the next batch after the one above from Maksakovsky, for practice while waiting for Gollobin.
A proper tutorial for anybody to do this should be written once we have some experience of learning how to do it here.
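A rough sketch of the naming scheme, in case it helps when batches pile up. The helper names `batch_names` and `combine_batches` are made up for illustration, and joining files by plain concatenation is only assumed to be safe for the .bib batches; whether the fmt files can be joined the same way depends on their format and has not been checked here.

```python
from pathlib import Path


def batch_names(prefix: str, batch: int) -> tuple[str, str]:
    """Return the (.bib, .fmt) attachment names for one numbered batch."""
    if not 1 <= batch <= 99:
        raise ValueError("the two-digit scheme covers batches 1 to 99")
    return f"{prefix}.bib.{batch:02d}.txt", f"{prefix}.fmt.{batch:02d}.txt"


def combine_batches(folder: Path, prefix: str, kind: str = "bib") -> str:
    """Concatenate all batches of one kind in batch order.

    Zero-padded numbers mean plain alphabetical sorting is also
    numeric sorting, so batch 10 comes after batch 09, not after 01.
    """
    paths = sorted(folder.glob(f"{prefix}.{kind}.[0-9][0-9].txt"))
    texts = [p.read_text(encoding="utf-8").strip() for p in paths]
    return "\n".join(texts) + "\n"
```

For example, `batch_names("mak", 2)` gives the two names wanted for the next Maksakovsky batch, and `combine_batches(Path("."), "mak")` would join whatever `mak.bib.NN.txt` files sit in the current folder.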
Meanwhile first steps are:
Batches can simply be attached to comments on this Issue, like anystyle.bib.txt above (https://github.com/ScientificPublishing/SciPub/files/12818736/anystyle.bib.txt). These attachments will get deleted after we work out a system for proper processing and storing.
Both the person uploading and anybody else can check against the originals by using a word processor to print the formatted bibliography in the same "style" as the original and comparing it with that section of the original. Also note the style, so it can be included in the provenance details for the finalized result of all batches.
Remaining errors can be corrected after importing the batch into a Zotero subcollection or another citation editor. This is better than editing the .bib or detailed format file directly in a normal text editor.
Later the citation editor will be used to add missing fields such as LCCN, ISBN, DOI, hdl, md5 etc. by actually tracking down each item to a public catalog entry and eventually to a URN that can be used to directly access the item for automated retrieval.
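To see which items still need tracking down, something like the sketch below could list the identifier fields each entry lacks. It is a rough regex scan, not a real BibTeX parser: it assumes one field per line, as in typical exported .bib files, and the field names `lccn`, `isbn` and `doi` are assumed spellings (hdl and md5 could be added to `WANTED` once their field names are settled). The function name `missing_fields` is made up.

```python
import re

# Illustrative subset of the identifiers mentioned above (assumed field names).
WANTED = ("lccn", "isbn", "doi")

# "@type{key," at the start of an entry, and "name =" at the start of a line.
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
FIELD_NAME = re.compile(r"^\s*(\w+)\s*=", re.MULTILINE)


def missing_fields(bibtex: str, wanted=WANTED) -> dict[str, list[str]]:
    """Map each citation key to the wanted fields its entry does not have."""
    starts = list(ENTRY_START.finditer(bibtex))
    report = {}
    for i, match in enumerate(starts):
        # An entry's text runs until the next entry starts (or the file ends).
        end = starts[i + 1].start() if i + 1 < len(starts) else len(bibtex)
        body = bibtex[match.end():end]
        present = {name.lower() for name in FIELD_NAME.findall(body)}
        report[match.group(2)] = [f for f in wanted if f not in present]
    return report
```

Run over a combined .bib file, this gives a simple to-do list per citation key. It only checks that a field is present, not that its value is correct.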
All later steps after initial extraction will have to be worked out as we proceed so that they become simple tutorials plus automated software for others to use.
But initial extraction using anystyle.io online is certainly the first step.
It seems that I did receive this, but I didn't see it at the time. I only found it just now by searching for the title "Extracting citations" in Outlook. I don't know if it was in my Inbox (and I missed it) or went to Junk mail. Craig
From: artydont @.> Sent: Tuesday, 24 October 2023 12:11 AM To: ScientificPublishing/SciPub @.> Cc: Ted1307 @.>; Mention @.> Subject: Re: [ScientificPublishing/SciPub] Extracting Citations (Issue #3)
I just googled for the title above and tried out the recommendation to use:
https://anystyle.io/
made in:
https://update.lib.berkeley.edu/2018/02/07/extracting-references-from-an-already-created-bibliography/
and in linked Zotero doc:
https://www.zotero.org/support/kb/importing_formatted_bibliographies
Produced the attached BibTeX .bib file from a sample of the first 8 entries cut and pasted from the Maksakovsky pdf.
Suffix .txt added to filename to attach here.
anystyle.bib.txt
Suggest @DavidMc1948 should try it out on the whole of both bibliographies in Maksakovsky. (Note: I modified some entries, for [no date] and [Moscow no publisher], which it may now handle automatically.)
Also suggest @Ted1307 should do as much as feasible from the large Gollobin bibliography in small batches to upload here, to check there is no problem.
I used a small batch and had to fix the line breaks so it would recognize there were 8 items.
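Fixing the line breaks by hand is fine for 8 items, but for bigger batches a small script might help. This is only a sketch under two assumptions: that the parser wants one reference per line (which is what the experience above suggests), and that the pasted references are separated by blank lines, which may not be true of every pdf. The function name `one_reference_per_line` is made up, and words hyphenated across line breaks are not rejoined.

```python
import re


def one_reference_per_line(pasted: str) -> list[str]:
    """Join wrapped lines so that each reference ends up on a single line.

    Assumes references are separated by one or more blank lines.
    """
    blocks = re.split(r"\n\s*\n", pasted.strip())
    # split() with no arguments collapses newlines, tabs and runs of
    # spaces, so each block becomes one single-spaced line.
    return [" ".join(block.split()) for block in blocks if block.strip()]
```

The number of items returned is a quick check before pasting: for a batch like the one above, `len(one_reference_per_line(text))` should come out as 8.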
PS See also new comment re Joplin I am about to add to Markdown Editor Issue