cldf / pycldf

python package to read and write CLDF datasets
https://cldf.clld.org
Apache License 2.0

Source.text is naïve and not configurable #47

Closed: Anaphory closed this issue 5 years ago

Anaphory commented 6 years ago

The conversion method to string uses the text method from clldutils. That method is quite naïve and not extensible, and it makes sense that it is like that. However, here https://github.com/glottobank/pycldf/blob/aef54564b980006222e8814266ecf46cec680668/pycldf/sources.py#L36 we have already imported pybtex, and if I understand its docs correctly, there is some way to use that package to nicely format bib entries. I don't understand how to do it, but I would suggest that pycldf.sources.Source should overload the text method, giving it an optional argument which is some description of a bibliography style and formatting this source accordingly.

xrotwang commented 6 years ago

The text serialization of BibTeX records implemented in clldutils is supposed to follow the rules proposed by @haspelmath here https://www.frank-m-richter.de/freescienceblog/2015/03/18/how-to-make-linguistics-publication-more-efficient-use-discipline-wide-style-rules/

You are right that once pybtex is in the picture, formatting according to any BibTeX style is possible. But I'm not really sure this is something either pycldf or clld should provide support for. I rather hope people use tools like Zotero, which will give them access to even more formatting styles, not limited to what's implemented for BibTeX.

Anaphory commented 6 years ago

(1) There is a Bib style for the “Unified Style Sheet for Linguistics”, which should then be the default value for that optional argument. (2) I may want to fiddle with the bibstyle to make it (not some other piece of the chain, because it's fine for everything else) deal with sources of the “fieldnotes” genre. This is more transparent with an explicit style (which would even inherit most of its formatting from the ussl) than with overloading a method.
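A minimal sketch of what such an optional style argument could look like, assuming a simplified stand-in for `pycldf.sources.Source`. The `Source` dataclass, the `STYLES` registry, the style names and the `ussl` formatter below are all hypothetical; the formatter is only a rough approximation of a Unified-Style-Sheet-like linearization, not what clldutils or the BibTeX style actually produce.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Source:
    """Hypothetical stand-in for pycldf.sources.Source (not the real class)."""
    genre: str
    id: str
    fields: Dict[str, str] = field(default_factory=dict)

    def text(self, style: str = "ussl") -> str:
        # Look up the formatter by name; an unknown name raises KeyError.
        return STYLES[style](self)


def ussl(src: Source) -> str:
    """Rough approximation of a USSL-like linearization (assumption)."""
    parts = [src.fields.get(key, "") for key in ("author", "year", "title")]
    # Strip trailing periods so initials like "P." do not yield "P..".
    ref = ". ".join(p.rstrip(".") for p in parts if p) + "."
    if src.fields.get("address") and src.fields.get("publisher"):
        ref += " {address}: {publisher}.".format(**src.fields)
    return ref


def ussl_fieldnotes(src: Source) -> str:
    """Reuse `ussl` for everything, special-casing only field notes."""
    ref = ussl(src)
    if src.genre == "fieldnotes" and src.fields.get("url"):
        ref += " {0}".format(src.fields["url"])
        if src.fields.get("urldate"):
            ref += " (Accessed on {0})".format(src.fields["urldate"])
    return ref


# Hypothetical registry mapping style names to formatter functions.
STYLES: Dict[str, Callable[[Source], str]] = {
    "ussl": ussl,
    "ussl-fieldnotes": ussl_fieldnotes,
}
```

With this shape, `source.text()` keeps the default behaviour, while `source.text("ussl-fieldnotes")` changes only the entries of the "fieldnotes" genre and leaves every other genre formatted exactly as the base style does.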

xrotwang commented 6 years ago

I don't really understand where exactly you'd want to serialize BibTeX records differently. Is this within a pure python toolchain, without the option to "shell out" to Zotero for custom serialization? I'm a bit worried by functionality which depends on both

because for both it's somewhat unclear how long they will be around - and re-implementing any of what they do now is clearly out of scope for pycldf.

Anaphory commented 6 years ago

I work on the CLDF-to-CLLD import for LexiRumah, so it is indeed a pure Python chain, from which I wouldn't know how to shell out to Zotero, and in which pycldf is a core component.

I think that an object like Keraf, 1978 or Holton, 2010: Kiraman should display its reference in the usual human-readable format, following USSL.

I am happy to use any reasonable way to do it. Given that you say BibTeX+pybtex is not reasonable, I'm happy to not use that. I am not sure whether Choi, Hannah. 2015. Field notes on Sawila. https://hdl.handle.net/1839/00-0000-0000-001E-2E27-4@view (The Language Archive). (Accessed on 2017-11-01) is actually a good specification for field notes available through TLA (that one isn't yet, the link leads somewhere else).

I would much prefer if

I will use the existing .text() for now, because some of the ideas I have cannot be fulfilled by our current state of bibliography anyway. But I intend to come back to this to resolve it one way or another.

xrotwang commented 6 years ago

Ok. I see. But this use case would be better addressed by fleshing out the USSL support in clldutils, because it seems you'd only need to override to add "better" formatting, not to provide alternative formatting, right?

Anaphory commented 6 years ago

In principle yes – Would you not consider it an issue then that the clldutils implementation of USSL will behave differently for those items than other implementations?

xrotwang commented 6 years ago

Which other implementations?

On 01.11.2017 at 17:31, "Gereon Kaiping" notifications@github.com wrote:

In principle yes – Would you not consider it an issue then that the clldutils implementation of USSL will behave differently for those items than other implementations?


Anaphory commented 6 years ago

https://www.linguisticsociety.org/celxj has the Bibstyle linked above and a CSL version for Zotero etc.

xrotwang commented 6 years ago

I see. But why would your desired behaviour differ? I assumed clldutils was just incomplete for most entry types.

On 01.11.2017 at 17:34, "Gereon Kaiping" notifications@github.com wrote:

https://www.linguisticsociety.org/celxj has the Bibstyle linked above and a CSL version for Zotero etc.


SimonGreenhill commented 6 years ago

Can't you just pass it through pybtex to generate whichever format you want?

xrotwang commented 5 years ago

@Anaphory I think it would make more sense to add this functionality directly to clldutils, even if this means adding pybtex as a dependency. After all, the Unified Style Sheet is something specific to linguistics, and clldutils is supposed to provide Python utils for linguistics.

xrotwang commented 5 years ago

@Anaphory Just tried to replace Source.text with the output of pybtex run with unified.bst. Unfortunately the result is not simply a string linearization of the citation, but a full-blown LaTeX bibliography:

E           AssertionError: assert '\\begin{theb...bliography}\n' == 'Dayley, Jon P...fornia Press.'
E             + Dayley, Jon P. 1985. Tzutujil Grammar. (University of California Publications in Linguistics, 107.) Berkeley: University of California Press.
E             - \begin{thebibliography}{1}
E             - \providecommand{\natexlab}[1]{#1}
E             - \providecommand{\url}[1]{#1}
E             - \providecommand{\urlprefix}{}
E             - \expandafter\ifx\csname urlstyle\endcsname\relax
E             -   \providecommand{\doi}[1]{doi:\discretionary{}{}{}#1}\else
E             -   \providecommand{\doi}{doi:\discretionary{}{}{}\begingroup
E             -   \urlstyle{rm}\Url}\fi
E             - 
E             - \bibitem[{Dayley(1985)}]{Dayley-1985}
E             - Dayley, Jon~P. 1985.
E             - \newblock \emph{Tzutujil grammar}, vol. 107 University of California
E             -   Publications in Linguistics.
E             - \newblock Berkeley: University of California Press.
E             - 
E             - \end{thebibliography}

So using pybtex would require postprocessing its output, which I'd consider out of scope for both pycldf and clldutils.
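To give an idea of what such postprocessing would involve, here is a minimal sketch using only the standard library. It assumes the output has exactly the shape shown in the assertion above; the function name `bibitems_to_text` and the shortened `SAMPLE` are made up for illustration, and real `.bst` output can contain many more LaTeX constructs than the few handled here.

```python
import re
from typing import List

# Shortened copy of the LaTeX bibliography quoted in the assertion above.
SAMPLE = r"""\begin{thebibliography}{1}
\providecommand{\natexlab}[1]{#1}

\bibitem[{Dayley(1985)}]{Dayley-1985}
Dayley, Jon~P. 1985.
\newblock \emph{Tzutujil grammar}, vol. 107 University of California
  Publications in Linguistics.
\newblock Berkeley: University of California Press.

\end{thebibliography}
"""


def bibitems_to_text(latex: str) -> List[str]:
    """Turn each \\bibitem of a thebibliography environment into one line."""
    match = re.search(
        r"\\begin\{thebibliography\}\{[^}]*\}(.*?)\\end\{thebibliography\}",
        latex,
        re.S,
    )
    if not match:
        return []
    entries = []
    # Everything before the first \bibitem is preamble (\providecommand etc.).
    for chunk in re.split(r"\\bibitem", match.group(1))[1:]:
        # Drop the optional [label] and the mandatory {key}.
        chunk = re.sub(r"^(\[.*?\])?\{[^}]*\}", "", chunk, count=1, flags=re.S)
        chunk = chunk.replace("\\newblock", " ")
        # Unwrap \emph{...}; nested braces are not handled in this sketch.
        chunk = re.sub(r"\\emph\{([^{}]*)\}", r"\1", chunk)
        chunk = chunk.replace("~", " ")  # non-breaking space
        entries.append(re.sub(r"\s+", " ", chunk).strip())
    return entries
```

Even after this cleanup, the entry for Dayley 1985 still differs from the string the test expects (title casing, and the series given as "vol. 107 ..." rather than in parentheses), so stripping the LaTeX markup alone would not make the two linearizations agree.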

Anaphory commented 5 years ago

Makes sense!