Open llvll0hsen opened 11 years ago
By 'using pdf files', what exactly do you mean? Can you describe your intended use case and we can help you to reach a solution?
Sorry for the bad explanation. I need to extract references from scientific papers and then parse them. Is that possible with this library?
(Sent by email in reply to Cris Ewing's comment of Wednesday, October 9, 2013.)
Mohsen
ahh, understood.
No, this library does not provide any bibliography parsing. There is a partner library here on GitHub, [bibliograph.parsing](https://github.com/collective/bibliograph.parsing), which does parse bibliographic entries. It returns a data structure suitable for interfacing with [Products.CMFBibliographyAT](https://github.com/collective/Products.CMFBibliographyAT), but parsing entries from raw PDF text is not supported by that library either.
If you are looking to parse bibliographic citations from raw text (that is, you don't have entries in MEDLINE, BibTeX, EndNote or another well-formed structure), I suggest you take a look at some of the other libraries that support this type of work. Given the astonishing variety of citation styles in the wild, reliably parsing raw text entries is a black art. There are a few libraries that claim to support it with some reasonable level of accuracy. A quick Google search for [parsing bibliographic entries](http://www.google.com/search?q=parse+bibliographic+references) shows a bunch of results, some of which point to libraries that support raw text parsing. The links also point to a bunch of articles explaining what a tough problem this actually is, and why 100% accuracy is not possible.
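To give a feel for why this is hard, here is a minimal sketch using only the Python standard library. It is not part of bibliograph.rendering or bibliograph.parsing; the helper names `split_references` and `guess_year` and the sample text are made up for illustration, and it assumes the reference section has already been extracted from the PDF as plain text with entries numbered `[1]`, `[2]`, and so on.

```python
import re

# Hypothetical helpers for illustration only; they assume entries
# start on their own line with a bracketed number such as "[1]".
ENTRY_START = re.compile(r"^\s*\[(\d+)\]\s*", re.MULTILINE)
YEAR = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")


def split_references(text):
    """Split a numbered reference section into (number, entry) pairs."""
    # With one capturing group, re.split returns
    # [preamble, number1, body1, number2, body2, ...].
    parts = ENTRY_START.split(text)
    entries = []
    for i in range(1, len(parts) - 1, 2):
        body = " ".join(parts[i + 1].split())  # undo line wrapping
        entries.append((int(parts[i]), body))
    return entries


def guess_year(entry):
    """Return the first four-digit number that looks like a year, or None."""
    match = YEAR.search(entry)
    return int(match.group(1)) if match else None


sample = """References
[1] A. Author and B. Writer. A study of things.
    Journal of Examples, 12(3):45-67, 2009.
[2] C. Someone. Another title. In Proc. of Something, 1998.
"""

references = split_references(sample)
years = [guess_year(body) for _, body in references]
```

Even this toy breaks quickly: an entry such as `Pages 1901-1920, 2005.` makes `guess_year` return 1901 instead of 2005, and unnumbered or author-year styles are not split at all. That is the kind of variety that makes dedicated citation parsers necessary.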
Thanks Cris. I know there are ParsCit and FreeCite, which seem to be helpful, but I was looking for a Python library. But I guess in the end I cannot escape from writing Perl :)
(Sent by email in reply to Cris Ewing's comment of Wed, Oct 9, 2013 at 8:21 PM.)
Mohsen
This issue can be closed?
Yes, John. Cheers, Mohsen
(Sent by email in reply to John Vandenberg's comment of Thu, Feb 11, 2016 at 5:02 AM.)
Mohsen
Hi,
Can you provide a sample for using PDF files? I really can't figure out how to use the library.
thanks